In today’s information-driven world, an effective data quality management
(DQM) strategy cannot be overlooked. DQM is a business discipline that combines
the right people, processes, and technologies, all with the common goal of
improving measurable data quality.
The subject is the single most important concept in the modern
data quality approach. It is the entity that is the target of the data quality
investigation at the most granular level. Before beginning any data quality
initiative, we must discover what the subject of the study is. Like most
concepts in our approach, the subject is reflected in the data but is not
attached to any technical object.
For example, Employee Status, Hours, and Earnings belong to the subject
"Employee". In a telecom data warehouse, the subject areas might be Subscriber,
Finance, and Marketing. Once identified, the subject becomes more than a
concept: it defines the granularity at which you measure data quality.
“We identified data quality issues with 20 percent of the
subscribers contained in our database” is a more useful statement than
“Thirty-eight percent of the rows in the SUBSCRIBER table have a field that
fails one of our data criteria.”
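To make the difference between the two statements concrete, here is a minimal Python sketch that computes both a row-level and a subject-level failure rate. It assumes a hypothetical list of `Row` records, each carrying a `subscriber_id` (the subject key) and a precomputed `passes_criteria` flag; these names and the sample data are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Row:
    subscriber_id: str     # the subject key (hypothetical column)
    passes_criteria: bool  # True if every field in the row passes the data criteria


def row_failure_rate(rows: list[Row]) -> float:
    """Table-centric view: share of rows with at least one failing field."""
    return sum(not r.passes_criteria for r in rows) / len(rows)


def subject_failure_rate(rows: list[Row]) -> float:
    """Subject-centric view: share of distinct subscribers with at least one failing row."""
    subjects = {r.subscriber_id for r in rows}
    failing = {r.subscriber_id for r in rows if not r.passes_criteria}
    return len(failing) / len(subjects)


# Hypothetical sample: subscriber S1 has two bad rows; everyone else is clean.
rows = [
    Row("S1", False), Row("S1", False), Row("S1", True),
    Row("S2", True), Row("S3", True), Row("S4", True),
    Row("S5", True), Row("S5", True),
]

print(f"{row_failure_rate(rows):.0%} of rows fail at least one criterion")
print(f"{subject_failure_rate(rows):.0%} of subscribers have data quality issues")
```

Here a single subscriber with two bad rows produces a 25% row-level failure rate, while the subject-level figure (20% of subscribers) tells the business how many real subscribers are affected.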
A business rule should not be defined by programming, that is, by writing
SQL statements that pull out bad data. Instead, express each business rule as a
simple sentence, agree on it, and only then program it. Programming is just one
property of a business rule. The rule itself should be independent of any
database, table, or field; those associations come later.
Each business rule must be designed and understood by the entire team.
Subject matter experts (SMEs) are best placed to build strong business rules,
because they are the most familiar with the data and know its history, lineage,
problems, and nature.
A Data Quality (DQ) Dimension is a recognized term
used by data management professionals to describe a feature of data that can be
measured or assessed against defined standards in order to determine the
quality of data.
Completeness – The percentage of data that is populated with a value. It’s
important that critical data (such as customer names, phone numbers, and email
addresses) be complete and accurate.
Uniqueness – Each entity is recorded only once: when measured against other
records or data sets, there is only one entry of its kind.
Timeliness – How much impact do date and time have on the data? This could
be previous sales, product launches, or any information that is relied on to be
accurate over a period of time.
Validity – Does the data conform to the respective standards set
for it?
Accuracy – How well does the data reflect the real-world person
or thing that is identified by it?
Consistency – How well does the data align with an expected pattern? Birth
dates are a common source of consistency issues: in the U.S. the standard
format is MM/DD/YYYY, whereas in Europe and other regions DD/MM/YYYY is
standard.
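Several of these dimensions reduce to simple percentages. The Python sketch below scores completeness, uniqueness, validity, and consistency over a small, made-up customer list; the field names, the simplified email pattern, and the choice of YYYY-MM-DD as the agreed date standard are all assumptions for illustration. Accuracy and timeliness are left out because they need an external reference (the real-world entity, or timestamps and a freshness requirement) that a standalone example cannot supply.

```python
import re

# Hypothetical customer records; the duplicate "C2" and the gaps are deliberate.
customers = [
    {"id": "C1", "name": "Ana", "email": "ana@example.com", "birth_date": "1990-04-12"},
    {"id": "C2", "name": None, "email": "bob@example", "birth_date": "12/04/1990"},
    {"id": "C2", "name": "Bob", "email": "bob@example.com", "birth_date": "1985-11-30"},
    {"id": "C4", "name": "Dee", "email": None, "birth_date": "1979-01-05"},
    {"id": "C5", "name": "Eve", "email": "eve@example.com", "birth_date": "07/19/2001"},
]

# Simplified email check (an assumption, far looser than real address rules).
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Assumed organizational standard for dates: YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_populated(value) -> bool:
    return value not in (None, "")


def completeness(records, field):
    """Percentage of records in which `field` holds a value."""
    return 100 * sum(is_populated(r.get(field)) for r in records) / len(records)


def uniqueness(records, key):
    """Distinct key values as a percentage of records (100 means no duplicates)."""
    return 100 * len({r[key] for r in records}) / len(records)


def conformance(records, field, pattern):
    """Percentage of populated values that fully match `pattern`."""
    values = [r[field] for r in records if is_populated(r.get(field))]
    return 100 * sum(bool(pattern.fullmatch(v)) for v in values) / len(values)


print(f"Completeness (name):      {completeness(customers, 'name'):.1f}%")
print(f"Completeness (email):     {completeness(customers, 'email'):.1f}%")
print(f"Uniqueness (id):          {uniqueness(customers, 'id'):.1f}%")
print(f"Validity (email format):  {conformance(customers, 'email', EMAIL):.1f}%")
print(f"Consistency (birth_date): {conformance(customers, 'birth_date', ISO_DATE):.1f}%")
```

A value such as 12/04/1990 cannot be interpreted without knowing whether it follows the U.S. or the European convention, which is why the sketch measures conformance to a single agreed format rather than trying to guess.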
A typical Data Quality Measurement approach might be:
1. Identify which data items need to be assessed for data quality; typically
these will be the subject areas critical to business operations and associated
management reporting.
2. Assess which data quality dimensions to use and their associated weighting.
3. For each data quality dimension, define values or ranges representing good
and bad quality data. Note that because a data set may support multiple
requirements, a number of different data quality assessments may need to be
performed.
4. Apply the assessment criteria to the data items.
5. Review the results and determine whether data quality is acceptable.
6. Where appropriate, take corrective action, e.g. clean the data and improve
data handling processes to prevent future recurrences.
7. Repeat the above on a periodic basis to monitor trends in data quality.
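Steps 2 through 5 can be sketched in Python as a weighted overall score plus a per-dimension threshold check. The dimension scores, the relative weights, and the minimum acceptable values below are hypothetical placeholders that a team would agree on in steps 2 and 3; this is one simple way to combine them, not a standard formula.

```python
# Hypothetical dimension scores (in percent) from applying the criteria (step 4).
scores = {"completeness": 80.0, "uniqueness": 80.0, "validity": 75.0, "consistency": 60.0}

# Step 2: hypothetical relative weights; only their proportions matter.
weights = {"completeness": 4, "uniqueness": 2, "validity": 2, "consistency": 2}

# Step 3: hypothetical minimum acceptable score for each dimension.
thresholds = {"completeness": 95.0, "uniqueness": 99.0, "validity": 70.0, "consistency": 50.0}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the dimension scores, normalized by the total weight."""
    total = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total


def failing_dimensions(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Dimensions whose score falls below the agreed minimum (step 5 review)."""
    return [dim for dim, minimum in thresholds.items() if scores[dim] < minimum]


overall = weighted_score(scores, weights)
failing = failing_dimensions(scores, thresholds)
acceptable = not failing  # acceptable only if every dimension meets its threshold

print(f"Overall weighted score: {overall:.1f}%")
print("Acceptable" if acceptable else f"Needs corrective action: {', '.join(failing)}")
```

Checking each dimension against its own threshold, rather than relying on the weighted score alone, keeps a strong dimension from masking a weak one; storing each run's results would support the trend monitoring described in step 7.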
References: https://tdwi.org, www.whitepapers.em360tech.com