Sunday, August 30, 2009

What price SITP data quality?

More lessons from CRM: CRM systems, like SITP systems, are largely oriented toward making decisions (rather than supporting transactions), i.e. more shameless plagiarism (or learning from more mature disciplines, as I prefer to think of it).

See the source reference here: http://www.computerworld.com/s/article/9135688/What_Price_CRM_Data_Quality_?source=CTWNLE_nlt_entsoft_2009-07-23

In an SITP the data does not have to be perfect to be useful. Many things can be just an approximation or can be missing altogether. So how do you decide where to make your data quality investment?

The first step is to separate the data elements (objects, relationships and their properties) into three categories, the ones that:
(1) must be there and must be correct to prevent corruption in external systems or misrepresentation of the business
(2) should be correct for the SITP to work at all
(3) people have asked for to make decisions or record the state of the enterprise

Each data element in the three categories can then be given a data quality score based on questions such as:
- Ownership - Does it have an undisputed owner, is it updated by specific roles as part of a formal business process, or can nearly anyone update it in an ad-hoc fashion?
- Validity and auditability - Does it have validation on entry, and does an audit trail track changes?
- Completeness and correctness - How complete is the source data, and how much is missing, clearly incorrect, or duplicated?
Developing an understanding of this metadata is a first step that needs to be taken when looking at data preparation and loading.
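
As a rough illustration of how those questions could be turned into a score, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `ElementProfile` fields, the 0-2 ownership scale, the equal weighting of the three question groups and the sample figures do not come from any particular SITP product.

```python
from dataclasses import dataclass

# Hypothetical 0-2 scale for the ownership question (assumption, not a standard).
OWNERSHIP = {"undisputed_owner": 2, "formal_process_roles": 1, "ad_hoc": 0}


@dataclass
class ElementProfile:
    name: str
    ownership: str            # key into OWNERSHIP
    validated_on_entry: bool
    audit_trail: bool
    total_records: int
    missing: int              # the three defect counts are assumed not to overlap
    incorrect: int
    duplicate: int

    def completeness(self) -> float:
        """Fraction of records that are not missing, clearly incorrect or duplicated."""
        if self.total_records == 0:
            return 0.0
        bad = self.missing + self.incorrect + self.duplicate
        return max(0.0, 1 - bad / self.total_records)

    def score(self) -> float:
        """Unweighted average of the three question groups, on a 0..1 scale."""
        ownership = OWNERSHIP[self.ownership] / 2
        validity = (self.validated_on_entry + self.audit_trail) / 2
        return round((ownership + validity + self.completeness()) / 3, 2)


# Made-up sample: application cost data that anyone can edit, with an audit trail only.
app_cost = ElementProfile("application.annual_cost", "ad_hoc", False, True,
                          total_records=200, missing=80, incorrect=15, duplicate=5)
print(app_cost.name, app_cost.score())
```

A low score does not by itself mean the element should be fixed; it only tells you where the data is weak, which is the input to the cost decision discussed further down.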

A data element is in an SITP system because it was required to answer a specific business question and/or because the SITP system is the source of record for that data.

Analysis will discover which data elements are often missing or wrong. What is typically missing depends on the concepts being dealt with, e.g.
- Business goals and strategies - will often lack: weighting and explicit relationships
- Applications - will often lack: a range of cost information, related standards and business processes
- Standards - will often lack: lifecycles, basis of current preferences, current and planned usage

Historically, where there is no SITP solution, it is difficult to collect some of the data in the first place, there are many ways for the meaning of the data to be misinterpreted or misrepresented, and there is often no easy way of usefully applying it (i.e. it's recorded as an academic exercise and not tested through the fire of use).

In many cases, you can't afford to spend too much on data quality before the system is implemented. What you need to understand is the metadata. If the business question needing this data is materially affected by the quality of the data, you will need to carefully weigh the cost of remediating the data against the impact that the improvement in its quality will have on the decisions being made based on it.

You can set business rules that allow you to determine when it's worth chasing a data element's quality and when it isn't.
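
One possible shape for such a rule, sketched in Python. The function name, its parameters and the idea of putting a monetary "value at risk" on a decision are all assumptions made for illustration; the category numbers refer to the three categories listed earlier.

```python
def worth_chasing(category: int,
                  materially_affects_decision: bool,
                  remediation_cost: float,
                  decision_value_at_risk: float) -> bool:
    """Hypothetical rule for deciding whether to invest in a data element's quality.

    category: 1, 2 or 3, as in the three categories of data elements.
    decision_value_at_risk: rough estimate of what a bad decision based on
    this element could cost (an assumed, illustrative measure).
    """
    # Category 1 data must be correct to avoid corrupting external systems
    # or misrepresenting the business, so it is always remediated.
    if category == 1:
        return True
    # If the business question is not materially affected, leave the data alone.
    if not materially_affects_decision:
        return False
    # Otherwise remediate only when the expected benefit outweighs the cost.
    return decision_value_at_risk > remediation_cost


print(worth_chasing(3, True, remediation_cost=5_000, decision_value_at_risk=20_000))
print(worth_chasing(3, True, remediation_cost=5_000, decision_value_at_risk=1_000))
```

The point of writing the rule down, in whatever form, is that the decision to chase or not chase a given element becomes repeatable rather than argued afresh each time.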

It gets exponentially more expensive to improve data quality. If it costs $X to get solid data quality on 2/3rds of your records, it will probably cost $2X to get the data quality right on half of the remaining records, and $4X to get half of what is then left.
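
To make the arithmetic concrete, the short Python sketch below simply extends that rule of thumb: each round fixes half of the records still outstanding and costs twice as much as the round before. The function name and the choice of three rounds are illustrative assumptions, and costs are expressed as multiples of X.

```python
from fractions import Fraction


def remediation_rounds(rounds: int, base_cost: float = 1.0):
    """Return (round, coverage, round_cost, cumulative_cost) for each round.

    Round 1 covers 2/3 of the records at base_cost; every later round covers
    half of what is left and costs double the previous round.
    """
    coverage = Fraction(2, 3)
    cost = base_cost
    total = cost
    rows = [(1, coverage, cost, total)]
    for n in range(2, rounds + 1):
        coverage += (1 - coverage) / 2   # half of the remaining records
        cost *= 2                        # at twice the previous round's cost
        total += cost
        rows.append((n, coverage, cost, total))
    return rows


for n, coverage, cost, total in remediation_rounds(3):
    print(f"round {n}: coverage {coverage}, cost {cost:g}X, cumulative {total:g}X")
```

After three rounds you have spent 7X in total to reach 11/12 of your records, and the third round alone cost 4X to add just 1/12 of them.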

For most purposes it is hard to justify a lot of remediation on historical data, and one is better off investing the effort in improving the way data is captured on an ongoing basis. In many cases it is sufficient to start capturing data going forward on a JIT basis, or as data is used. Over a short space of time, i.e. a year or two, most of the data needed for SITP will accrete as a natural by-product of improved processes (of course, adapting the processes to capitalise on the SITP is a key step to enabling this).
