<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Beyond Data</title>
	<atom:link href="http://blog.vasukikasturi.com/category/bi/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.vasukikasturi.com</link>
	<description>Tempered thoughts on Enterprise Data Management</description>
	<lastBuildDate>Sat, 20 Mar 2010 19:36:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Root Cause for Data Quality Issues</title>
		<link>http://blog.vasukikasturi.com/data-quality/root-cause-data-quality-issues</link>
		<comments>http://blog.vasukikasturi.com/data-quality/root-cause-data-quality-issues#comments</comments>
		<pubDate>Mon, 15 Mar 2010 01:20:49 +0000</pubDate>
		<dc:creator>Vasuki Kasturi</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Data Entry]]></category>
		<category><![CDATA[data governance]]></category>
		<category><![CDATA[Data Integration]]></category>
		<category><![CDATA[master data management]]></category>

		<guid isPermaLink="false">http://blog.vasukikasturi.com/?p=243</guid>
		<description><![CDATA[Data is impacted by numerous processes that bring data into your data environment, most of which affect its quality to some extent. Some processes bring data into your environment, referred to as the inflow, and some processes operate on the data, causing data issues. The fishbone below highlights the different causes for data to decay (in [...]]]></description>
			<content:encoded><![CDATA[<p>Data is impacted by numerous processes that bring data into your data environment, most of which affect its quality to some extent. Some processes bring data into your environment, referred to as the inflow, and some processes operate on the data, causing data issues. The fishbone below highlights the different causes for data to decay (in no set order).</p>
<div id="attachment_256" class="wp-caption alignnone" style="width: 630px"><a href="http://blog.vasukikasturi.com/wp-content/uploads/2010/03/bad_dq.jpg"><img class="size-full wp-image-256 " title="bad_dq" src="http://blog.vasukikasturi.com/wp-content/uploads/2010/03/bad_dq.jpg" alt="Root Cause for Data Quality Issues" width="620" height="410" /></a><p class="wp-caption-text">Root Cause for Data Quality Issues</p></div>
<p>It is difficult to prioritize this list, although philosophically I can say that lack of governance will most definitely lead to bad data. At the same time, the list is neither finite nor complete. Organizational events like mergers &amp; consolidations can also lead to bad data quality. The fins on the upper side are processes that bring data into your system, the inflow. The lower fins are internal processes that cause bad data to persist. Either set of fins can cause data corruption. I have summarized each fin below, without bloating this post.</p>
<ol>
<li>Legacy Migration: Refers to data that is often migrated from a legacy system. In most cases the data structures and data models are inconsistent between the legacy architecture and the new architecture.</li>
<li>System Migration: This is similar to the above, except that these migrations are due to system upgrades. As applications evolve, designs change. New fields get added for which no historical data exists. Or, god forbid, some fields are deprecated or removed, which may lead to serious problems.</li>
<li>Workarounds: This is typical of the business community and packaged applications (ERP/CRM et al). Custom fields are heavily used (often with no documentation), which later leads to misinterpretation.</li>
<li>Manual Data Entry: Mostly happens when systems collect data from users via a &#8220;free text&#8221; field. Common examples include addresses, phone numbers, etc. In the absence of standards, conventions or policies, data entry users want to finish a transaction as quickly as possible rather than worry about the accuracy of the data. If the system is not self-correcting, users will never understand that they are introducing bad data.</li>
<li>Interfaces: These are the connectors from one system to another. For large enterprises, this is how data typically flows &#8211; Campaigns to Opportunity to Quote to Order to Manufacturing to Service. To compound matters, each system is sold/supported by a different vendor with no accountability for data correction.</li>
<li>Process Automation: This is different from the Interface issue discussed above. This is more about how a process within a single system is automated. As existing business processes are re-engineered (due to the dynamic nature of the business), applications get out of sync or new data assumptions are made. If this is not relayed to the IT team that supports the system, there will be some data corruption.</li>
<li>Time Decay: This is especially true for master data (like customer), where the data was good at some point in time but has not been updated since. Consider work email addresses: as customer contacts move from one organization to another, their email changes with the move. The data you once had for this customer contact is no longer accurate.</li>
<li>Data Quality Programs: The irony. Yes, sometimes the data quality programs/initiatives are themselves a cause of bad data. This is mostly because of wrong assumptions about the business data and the rules around it. As a result, data may be cleansed incorrectly, merged too aggressively (in the case of Master Data Management), or purged inappropriately.</li>
<li>Lack of Ownership: Very few organizations have complete ownership of a system (a CRM is often shared by Sales and Marketing); they often share sections of the data. With shared ownership come conflicting business rules and priorities. Concepts like Data Quality Organizations or Data Stewards, which bring accountability to an enterprise, are new to most organizations.</li>
<li>Lack of Governance: Data Governance is a vast discipline that is beyond the scope of this post. It is about arriving at standard definitions for the common data, via metadata management. It is about analyzing, defining and baselining the current quality of the data, so that some of the quality metrics can be monitored. It&#8217;s about MDM and a lot more. Lack of governance means the information management strategy is poorly executed, leading to more data issues.</li>
</ol>
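<p>As an illustration of the Manual Data Entry item above, a system can be made more &#8220;self-correcting&#8221; by normalizing or rejecting free-text input at entry time. The sketch below is hypothetical: the function name <code>normalize_phone</code> and the 10-digit North American format are assumptions chosen for illustration, not a convention prescribed in this post.</p>

```python
import re
from typing import Optional


def normalize_phone(raw: str) -> Optional[str]:
    """Return a phone number as NNN-NNN-NNNN, or None if it cannot be parsed.

    Assumption: numbers are 10-digit North American numbers,
    optionally prefixed with the country code 1.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None  # reject at entry time instead of storing bad data
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Different free-text spellings collapse to a single stored format.
formatted = normalize_phone("(408) 555-0100")
with_country_code = normalize_phone("+1 408.555.0100")
too_short = normalize_phone("555-0100")
```

<p>Returning <code>None</code> for unparseable input lets the entry screen prompt the user immediately, which is when the correction is cheapest.</p>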
<p>In summary, the reasons for bad data quality are many. Before we start looking at cleaning the data, it is prudent that we understand the root causes for bad data. Prioritize and strategize the cleanup activities; devise ongoing monitors to gauge the data and control the inflow of bad data.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vasukikasturi.com/data-quality/root-cause-data-quality-issues/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Impact of Bad Data</title>
		<link>http://blog.vasukikasturi.com/data-quality/impact-of-bad-data</link>
		<comments>http://blog.vasukikasturi.com/data-quality/impact-of-bad-data#comments</comments>
		<pubDate>Sat, 27 Feb 2010 23:33:54 +0000</pubDate>
		<dc:creator>Vasuki Kasturi</dc:creator>
				<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://blog.vasukikasturi.com/?p=202</guid>
		<description><![CDATA[In this Information age, data and information are vital to an organization’s success. And that vitality is created with a fresh supply of clean and unpolluted customer data. The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year. And they weren&#8217;t just talking of the unnecessary printing, [...]]]></description>
			<content:encoded><![CDATA[<p>In this Information age, data and information are vital to an organization’s success. And that vitality is created with a fresh supply of clean and unpolluted customer data. <a href="http://www.tdwi.org" target="_blank">The Data Warehousing Institute</a> estimates that <a class="zem_slink" title="Data quality" rel="wikipedia" href="http://en.wikipedia.org/wiki/Data_quality">data quality</a> problems cost U.S. businesses more than $600 billion a year. And they weren&#8217;t just talking of the unnecessary printing, postage, and staffing costs associated with bad data. When organizations have no grip on the quality of their data, over time the confidence amongst their customer and partner community erodes.</p>
<p>Organizations today use data to generate a multiplicity of information assets (campaigns, operational systems, reports, dashboards, etc.). These assets form the basis for any strategic action the organization may take. So when the incoming data is bad, all the downstream systems and assets are contaminated, thereby jeopardizing the success of the organization. The impact of bad data has been quantified by several vendors and consulting organizations. Below are some facts and figures around the impact of bad data quality (anecdotal &amp; quantified).</p>
<ul>
<li>If customer preferences (opt-outs) are not maintained accurately, enterprises have to fork out serious penalties that increase with each incident.</li>
<li>The ratio of the cost to process a transaction when data is clean versus when it is inaccurate is 1:10. Organizations make millions of transactions a year.</li>
<li>Data Integration and BI projects either fail or are delayed because of bad data.</li>
<li>Inaccurate medical diagnosis can sometimes be fatal.</li>
<li>Lack of a single version of truth (for master data) results in additional spend and/or bad customer service.</li>
</ul>
<p>To summarize, Data Quality impacts range from a pure transaction-level loss up to a catastrophic impact for an enterprise. In the words of Larry English, the cost of bad data may be 10-25 percent of the company&#8217;s total revenues. Marketing folks know the cost of customer acquisition, and the renewal potential of each customer on file. Once an organization loses a loyal customer, all the associated revenue potential goes down the drain. So what exactly are the reasons for poor Customer Data Quality? I shall cover them in a later post; for now, here is the result from the TDWI Data Quality Survey on this question.</p>
<div id="attachment_207" class="wp-caption alignnone" style="width: 310px"><a href="http://blog.vasukikasturi.com/wp-content/uploads/2010/02/SourcesOfDQProblems.jpg"><img class="size-medium wp-image-207" title="SourcesOfDQProblems" src="http://blog.vasukikasturi.com/wp-content/uploads/2010/02/SourcesOfDQProblems-300x150.jpg" alt="" width="300" height="150" /></a><p class="wp-caption-text">Sources of Data Quality Problems</p></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.vasukikasturi.com/data-quality/impact-of-bad-data/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Quality &#8211; Definition</title>
		<link>http://blog.vasukikasturi.com/data-quality/data-quality</link>
		<comments>http://blog.vasukikasturi.com/data-quality/data-quality#comments</comments>
		<pubDate>Wed, 24 Feb 2010 07:43:20 +0000</pubDate>
		<dc:creator>Vasuki Kasturi</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[data governance]]></category>
		<category><![CDATA[Data Profiling]]></category>

		<guid isPermaLink="false">http://blog.vasukikasturi.com/?p=228</guid>
		<description><![CDATA[What is Data Quality? The answer depends on who you ask. And rightly so, because the same data is weighted differently by different users. However in each case, if the data (that is important for a department/group) is fit for intended purposes then it is said to be of good data quality.]]></description>
			<content:encoded><![CDATA[<p>If you were to search Wikipedia for the term &#8220;<a href="http://en.wikipedia.org/wiki/Data_quality" target="_blank">Data Quality</a>&#8221;, you would get varied definitions. Rightfully so. My experience says it is about &#8220;fitness for use&#8221;. Data in its raw form is as useless as binary code; it is the information derived from the data that is the nugget here. Information is used in many systems across the enterprise &#8211; from operational systems to BI frameworks to the KPIs on the dashboard. So when we talk of Data Quality, we are talking about how &#8220;useful&#8221; my data is, so that I can extract valuable information via these systems.</p>
<p>Each system has its own constituency of users, and thus each has its own interpretation of what &#8220;useful&#8221; data is. For the IT organization implementing a BI system, data is useful if it is &#8220;Complete&#8221;. Yet the same data for a Sales organization is useful if it is &#8220;Accurate&#8221;. For a credit card collection agency, this data is useful if it is &#8220;Timely&#8221;. (It is easily conceivable that a department is interested in Complete, Accurate and Timely data.) Thus we have many dimensions along which data can be assessed as good (or bad). The science of assessing how good the data is, for use by a department/enterprise, is usually referred to as Data Discovery.</p>
<p>Before we start on a trip to discovering data, it is important to understand why such a process is needed. Why is it important to know how bad the customer data quality is? In other words, there has to be a business case for Data Quality Initiatives. A business case usually takes the form of &#8220;Lost revenues due to missed shipments&#8221; (incomplete/inaccurate address information), &#8220;Inability to up-sell/cross-sell into existing base&#8221; (no contact data) et al. If you have ever run a marketing campaign (say email), you would know the failure rates of these campaigns and how much of that failure is attributed to bad data. This is anecdotal evidence. So the objective of Data Discovery is to profile the data, so that the anecdotal evidence can be quantified in a scientific manner.</p>
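<p>To make the idea of profiling concrete, here is a minimal sketch that turns two of the dimensions mentioned above, Complete and Timely, into numbers. Everything named here is an assumption for illustration: the <code>profile</code> function, the <code>last_updated</code> field, the sample records and the 365-day freshness threshold. Accuracy is left out because measuring it needs a trusted reference source to compare against.</p>

```python
from datetime import date


def profile(records, required_fields, as_of, max_age_days):
    """Return completeness and timeliness ratios (0.0 to 1.0) for dict records."""
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "timeliness": 0.0}
    # Complete: every required field is present and non-blank.
    complete = sum(
        1 for r in records
        if all(str(r.get(f) or "").strip() for f in required_fields)
    )
    # Timely: the record was updated within the allowed age window.
    timely = sum(
        1 for r in records
        if r.get("last_updated") is not None
        and (as_of - r["last_updated"]).days <= max_age_days
    )
    return {"completeness": complete / total, "timeliness": timely / total}


# Hypothetical customer records used only to exercise the function.
customers = [
    {"name": "Acme", "email": "ops@acme.example", "last_updated": date(2010, 1, 15)},
    {"name": "Globex", "email": "", "last_updated": date(2007, 6, 1)},
]
report = profile(customers, ["name", "email"], as_of=date(2010, 3, 1), max_age_days=365)
```

<p>Tracking ratios like these over time is one way to replace anecdotes with a baseline that a business case can refer to.</p>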
<p>Like most things, too much Data Quality comes at a price, because Data Quality follows the law of diminishing returns. Once the most offending processes have been fixed, it may cost more to fix the remaining processes than the fixes yield in value. This is where Data Governance comes in nicely. Without getting deep into the details, <a class="zem_slink" title="Data governance" rel="wikipedia" href="http://en.wikipedia.org/wiki/Data_governance">Data governance</a> is a set of processes that ensures that important data assets are formally managed throughout the enterprise. So what to measure and how much to measure is dictated by the Data governance board.</p>
<p>To summarize, Data Quality is about &#8220;fitness for use&#8221;, which needs to be measured across many &#8220;dimensions&#8221;. Data discovery or profiling needs to be applied to understand how bad the data symptoms are. Data governance (besides a lot of other things) defines the boundaries for the data quality initiative.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vasukikasturi.com/data-quality/data-quality/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>10 mistakes to avoid for data warehouse project managers</title>
		<link>http://blog.vasukikasturi.com/bi/10-mistakes-to-avoid-for-data-warehouse-project-managers</link>
		<comments>http://blog.vasukikasturi.com/bi/10-mistakes-to-avoid-for-data-warehouse-project-managers#comments</comments>
		<pubDate>Thu, 29 Mar 2007 05:23:32 +0000</pubDate>
		<dc:creator>Vasuki Kasturi</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data-Marts]]></category>
		<category><![CDATA[Data-Warehouse]]></category>

		<guid isPermaLink="false">http://blog.vasukikasturi.com/2007/03/29/10-mistakes-to-avoid-for-data-warehouse-project-managers/</guid>
		<description><![CDATA[This is an excerpt of the complete article on TDWI. The article, with the same title, is for members only. Hope you enjoy. Mistake 1. Failing to Use a Methodology Mistake 2. Ineffective Project Team Structure Mistake 3. Failing to Involve the Business People Mistake 4. Failing to Have Application Releases Mistake 5. Failing to [...]]]></description>
			<content:encoded><![CDATA[<p>This is an excerpt of the complete article on <a href="http://www.tdwi.org" target="_blank">TDWI</a>. The <a href="http://www.tdwi.org/Publications/display.aspx?Id=7545" target="_blank">article</a>, with the same title, is for members only. Hope you enjoy.</p>
<p><strong>Mistake 1. Failing to Use a Methodology<br />
</strong><strong>Mistake 2. Ineffective Project Team Structure<br />
</strong><strong>Mistake 3. Failing to Involve the Business People<br />
</strong><strong>Mistake 4. Failing to Have Application Releases<br />
</strong><strong>Mistake 5. Failing to Have an Active Project Charter<br />
</strong><strong>Mistake 6. Lack of a Readiness Assessment<br />
</strong><strong>Mistake 7. Inadequate Testing<br />
</strong><strong>Mistake 8. Underestimating Data Cleansing Efforts<br />
</strong><strong>Mistake 9. Ignoring Metadata<br />
</strong><strong>Mistake 10. Being a Slave to Project Management Tools</strong></p>
<p>Technorati Tags: <a href="http://technorati.com/tag/Data+Warehouse" rel="tag">Data Warehouse</a>, <a href="http://technorati.com/tag/Project+Managers" rel="tag"> Project Managers</a>, <a href="http://technorati.com/tag/Data+Marts" rel="tag"> Data Marts</a>, <a href="http://technorati.com/tag/Business+Intelligence" rel="tag"> Business Intelligence</a>, <a href="http://technorati.com/tag/Mistakes" rel="tag"> Mistakes</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vasukikasturi.com/bi/10-mistakes-to-avoid-for-data-warehouse-project-managers/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Open Source &amp; Business Intelligence</title>
		<link>http://blog.vasukikasturi.com/bi/open-source-business-intelligence</link>
		<comments>http://blog.vasukikasturi.com/bi/open-source-business-intelligence#comments</comments>
		<pubDate>Thu, 23 Jun 2005 05:22:00 +0000</pubDate>
		<dc:creator>Vasuki Kasturi</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data Integration]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://blog.vasukikasturi.com/?p=4</guid>
		<description><![CDATA[Business Intelligence helps us understand the details about customers that are crucial to the success of any organization. Traditionally these offerings are very expensive due to licensing fees for the several components. Of late, some momentum has been seen around Open Source and maturity of projects suitable for BI. But we have only heard of [...]]]></description>
			<content:encoded><![CDATA[<p>Business Intelligence helps us understand the details about customers that are crucial to the success of any organization. Traditionally these offerings are very expensive due to licensing fees for the several components. Of late, some momentum has been seen around Open Source and maturity of projects suitable for BI. But we have only heard of the adoption of Open Source in the Infrastructure layer &#8211; OS, database, App Servers etc. Very little attention has been paid to the software required to build and deliver Business Intelligence. That is, until now.</p>
<p>Let&#8217;s take a subset of some of the BI offerings, and see how Open Source solutions match up against them.</p>
<ol>
<li>Databases: <a title="PostGres" href="http://www.postgresql.org/">PostgreSQL (or Bizgres)</a> and <a title="mysql" href="http://www.mysql.com/">MySQL</a>.</li>
<li>ETL: <a title="Octopus" href="http://octopus.objectweb.org/">Enhydra Octopus</a> and <a title="Clover" href="http://cloveretl.berlios.de/">Clover</a></li>
<li>OLAP: <a title="Mondrian" href="http://mondrian.sourceforge.net/">Mondrian</a> and <a title="GreenPlum" href="http://www.greenplum.com/">GreenPlum</a></li>
<li>Dashboards (Portal approach): <a title="JBoss" href="http://www.jboss.org/products/jbossportal">JBoss Portal</a>, <a title="JetSpeed" href="http://portals.apache.org/jetspeed-1/">JetSpeed</a>, Liferay and Gluecode (now owned by IBM).</li>
<li>Reporting: <a title="Jasper" href="http://www.jaspersoft.com/jasper_report.php">Jasper Reports</a>, <a title="BIRT" href="http://www.eclipse.org/birt/">BIRT</a></li>
</ol>
<p>If this doesn&#8217;t sound promising, <a title="Pentaho" href="http://www.pentaho.org/">Pentaho </a>is planning an Open Source BI distribution that brings all of the above together and some more. This sounds very exciting for me and for my <a title="Cignex" href="http://www.cignex.com">company</a>, which has been providing Open Source solutions for almost 5 years.</p>
<p>Technorati Tags: <a href="http://technorati.com/tag/Pentaho" rel="tag"> Pentaho</a>, <a href="http://technorati.com/tag/JasperReports" rel="tag"> JasperReports</a>, <a href="http://technorati.com/tag/MySQL" rel="tag"> MySQL</a>, <a href="http://technorati.com/tag/JBoss" rel="tag"> JBoss</a>, <a href="http://technorati.com/tag/Portal" rel="tag"> Portal</a>, <a href="http://technorati.com/tag/Business+Intelligence" rel="tag"> Business Intelligence</a>, <a href="http://technorati.com/tag/BI" rel="tag"> BI</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vasukikasturi.com/bi/open-source-business-intelligence/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

