Beyond Data

Tempered thoughts on Enterprise Data Management

Browsing Posts published in February, 2010

In this Information Age, data and information are vital to an organization’s success, and that vitality depends on a fresh supply of clean, unpolluted customer data. The Data Warehousing Institute (TDWI) estimates that data quality problems cost U.S. businesses more than $600 billion a year. And they were not talking only about the unnecessary printing, postage, and staffing costs associated with bad data. When organizations have no grip on the quality of their data, the confidence of their customers and partners erodes over time.

Organizations today use data to generate a multiplicity of information assets (campaigns, operational systems, reports, dashboards, etc.). These assets form the basis for any strategic action the organization may take. So when the incoming data is bad, all the downstream systems and assets are contaminated, which jeopardizes the success of the organization. The impact of bad data has been quantified by several vendors and consulting organizations. Below are some facts and figures, both anecdotal and quantified, on the impact of bad data quality.

  • If customer preferences (opt-outs) are not maintained accurately, enterprises have to fork out serious penalties that increase with each incident.
  • The ratio of the cost to process a transaction when data is clean to the cost when it is inaccurate is 1:10. Organizations make millions of transactions a year.
  • Data Integration and BI projects either fail or are delayed because of bad data.
  • A medical diagnosis based on inaccurate data can sometimes be fatal.
  • Lack of a single version of truth (for master data) results in additional spend and/or bad customer service.
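
As a rough illustration of the cost-ratio point in the list above, here is a minimal Python sketch of how a 1:10 ratio adds up over a year. It assumes the ratio means that a transaction touching bad data costs about ten times as much to process as one on clean data. The transaction volume, unit cost, and 5% bad-data rate are hypothetical figures chosen only to show the arithmetic, not numbers from any survey.

```python
def processing_cost(transactions, bad_fraction, clean_unit_cost, bad_multiplier=10):
    """Total processing cost when a fraction of transactions touch bad data.

    bad_multiplier=10 encodes the assumed 1:10 clean-to-bad cost ratio.
    """
    bad = transactions * bad_fraction
    clean = transactions - bad
    return clean * clean_unit_cost + bad * clean_unit_cost * bad_multiplier


# Hypothetical inputs: one million transactions a year at 1.00 per clean transaction.
baseline = processing_cost(1_000_000, 0.0, 1.0)   # every transaction is clean
with_bad = processing_cost(1_000_000, 0.05, 1.0)  # 5% of transactions hit bad data
extra = with_bad - baseline

print(f"Baseline cost: {baseline:,.0f}")
print(f"Cost with 5% bad data: {with_bad:,.0f}")
print(f"Extra spend: {extra:,.0f} ({extra / baseline:.0%} over baseline)")
```

Under these assumptions, a small share of bad records drives a disproportionate share of the processing bill, which is the point the ratio is making.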

To summarize, the impact of poor Data Quality ranges from a loss on a single transaction up to a catastrophic impact on the enterprise. In the words of Larry English, the cost of bad data may be 10-25 percent of the company’s total revenues. Marketing folks know the cost of customer acquisition and the renewal potential of each customer on file. Once an organization loses a loyal customer, all the associated revenue potential goes down the drain. So what exactly are the reasons for poor Customer Data Quality? I shall cover them in a later post. For now, here is the result from the TDWI Data Quality Survey on this question.

Sources of Data Quality Problems

If you were to search Wikipedia for the term “Data Quality”, you would get varied definitions. Rightfully so. My experience says it is about “fitness for use”. Data in its raw form is as useless as binary code; it is the information derived from the data that is the nugget here. Information is used in many systems across the enterprise, from operational systems to BI frameworks to the KPIs on the dashboard. So when we talk of Data Quality, we are talking about how “useful” the data is for extracting valuable information via these systems.

Each system has its own constituency of users, and each constituency has its own interpretation of what “useful” data is. For the IT organization implementing a BI system, data is useful if it is “Complete”. Yet the same data is useful to a Sales organization if it is “Accurate”. For a credit card collection agency, the data is useful if it is “Timely”. (It is easily conceivable that a department is interested in Complete, Accurate and Timely data.) Thus there are many dimensions along which data can be assessed as good (or bad). The science of assessing how good the data is, for use by a department or the enterprise, is usually referred to as Data Discovery.

Before we set out to discover data, it is important to understand why such a process is needed. Why is it important to know how bad the customer data quality is? In other words, there has to be a business case for Data Quality initiatives. A business case usually takes the form of “Lost revenues due to missed shipments” (incomplete or inaccurate address information), “Inability to up-sell/cross-sell into the existing base” (no contact data), and so on. If you have ever run a marketing campaign (say, by email), you would know the failure rates of these campaigns and how much of the failure is attributed to bad data. This is anecdotal evidence. The objective of Data Discovery is to profile the data so that the anecdotal evidence can be quantified in a scientific manner.
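
To make the idea of profiling concrete, here is a minimal Python sketch that scores a handful of customer records along the dimensions discussed above. Everything in it is an assumption for illustration: the records, the field names, the e-mail pattern, and the 365-day freshness threshold are hypothetical. Note also that a format check only measures validity, which is a rough proxy for accuracy; confirming that a value is actually correct requires a trusted reference source.

```python
import re
from datetime import date

# Hypothetical customer records; field names are assumptions for illustration.
CUSTOMERS = [
    {"name": "Ann Lee", "email": "ann@example.com", "postcode": "94105",
     "last_verified": date(2010, 1, 15)},
    {"name": "Bob Roy", "email": "bob[at]example.com", "postcode": "",
     "last_verified": date(2007, 6, 1)},
    {"name": "", "email": None, "postcode": "10001",
     "last_verified": date(2009, 11, 30)},
    {"name": "Dee Kay", "email": "dee@example.org", "postcode": "60601",
     "last_verified": None},
]

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_populated(value):
    """A field counts as populated if it is neither None nor an empty string."""
    return value not in (None, "")


def completeness(records, field):
    """Share of all records in which the field is populated."""
    return sum(1 for r in records if is_populated(r.get(field))) / len(records)


def validity(records, field, pattern):
    """Share of populated values that match the expected format."""
    values = [r[field] for r in records if is_populated(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def timeliness(records, field, as_of, max_age_days):
    """Share of all records whose date field is no older than max_age_days."""
    fresh = sum(
        1 for r in records
        if r.get(field) is not None and (as_of - r[field]).days <= max_age_days
    )
    return fresh / len(records)


def profile(records, as_of):
    """Collect the per-dimension scores into one report."""
    return {
        "email_completeness": completeness(records, "email"),
        "postcode_completeness": completeness(records, "postcode"),
        "email_validity": validity(records, "email", EMAIL_PATTERN),
        "timeliness": timeliness(records, "last_verified", as_of, 365),
    }


report = profile(CUSTOMERS, as_of=date(2010, 2, 28))
for dimension, score in report.items():
    print(f"{dimension}: {score:.0%}")
```

Scores like these turn "our e-mail campaigns bounce a lot" into a number per field and per dimension, which is what a business case needs.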

Like most things, too much Data Quality comes at a price, because Data Quality follows the law of diminishing returns. Once the most offending processes have been fixed, fixing the remaining processes may cost more and yield little value. This is where Data Governance comes in nicely. Without getting deep into the details, Data Governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise. So what to measure, and how much to measure, is dictated by the Data Governance board.

To summarize, Data Quality is about “fitness for use”, which needs to be measured across many “dimensions”. Data discovery, or profiling, needs to be applied to understand how bad the data symptoms are. Data governance, among many other things, defines the boundaries for the data quality initiative.