Instead of defining the validations for each field (column) in a database, it is better to define global domains/standards, and to link a field (like age) OR entity (like address) to these global reference points. This allows for consistency as well as standardization across tables/databases/systems.
Valid Data Domain Value Rules & standards for Data Quality Assurance
Every column has to have the permissible values. Following are the typical definitions of permissible values:
Age can be 0 to 100, OR height can be from 1 ft to 8 ft, OR the premium can be from USD 100 to 10 Million.
Discrete Value List
The agent status can be 'Active', 'Inactive', 'Terminated', 'Suspended' OR the patient status is 'Registered', 'Diagnosed','Discharged'..
Skip over Rules
This covers the values, which a column cannot have. example- Job work acceptance date cannot be a holiday.
Text Column Rules
These generally come out, when we have multiple entries in a single column separated by the allowed separators, OR if it includes special characters.
Pick up any account code, credit card number, inventory code, customer ID you will find that they are combinations of embedded codes. For example a car registration number in a given country can have the state code, the issuing zone code, the serial alphabet and the running serial number, with all these individual components concatenated.
Null/optional value Assurance
Null concept is deceptive and is perhaps one of the biggest traps of Data Quality. Allow a field to be Null/optional , and it gives an escape route for the data providers to skip it. A field may not have a value due to many reasons and one should be providing clear distinction on the same. The different scenarios and their treatment are:
- Value does not exist: put Genuine Null.
- Value exists, but not known at the moment: write 'Unknown'
- Value is not applicable: write 'Not applicable'
- Value is not valid: Value exists, but is not as per the value rule defined for the column.
Domain Value and mapping assurance
'Domains' are the defined business objects, which have a standard set of characteristics. If a column is supposed to be belonging to a given domain it should be having at least the properties of that domain along with some additional properties, if specified. For example,
Date can be a domain, which has mm-dd-yyyy format, and has the valid value range of 1-1-1900 to 1-1-2050, and does not include the holidays. If a column 'order date' belongs to the date domain it has to meet all the conditions of the date domain along with some more specific conditions (i.e., cannot be more than the current date).
Typical Construct of a domain is:
- Data Type—Basic data types are integer, decimal, OR character. Most data bases support variants of these plus special data types for date and time.
- Length—This is the number of digits OR characters in the value. For example, a value of 5 digits OR 40 characters.
- Date Format—The format for date values such as dd/mm/yy OR yy/mm/dd.
- Range—The range specifies the lower and upper boundaries of the values the attribute may legally have.
- Constraints—Are special restrictions on allowable values. For example, the Beginning_Pay_Date for a new employee must always be the first work-day of the month of hire.
- Null support—Indicates whether the attribute can have null values
- Default value (if any)—The value an attribute instance will have, if a value is not entered.
Domains are useful in terms of defining the global sets, to which a field can belong to. You can define the domains having the value ranges, specific list of permissible items, OR the values, which need to be excluded. If managed well, domains can drive more than 50% of the column level data integrity and quality.
Apart from a column fulfilling the domain definition, there is next level of quality assurance, which deals with Domain- mapping. This means, that there is a mapping between two OR more domain given a formula. For example the city code (one set of domain) with city name (other set of domain).
Data Standards Controls for Data Quality Assurance
This is a big subject in it-self. An organization typically has large assortment of systems and it interacts with internal and external stakeholders. Here we are focusing on the Data Standards from business applications perspective. There are other data standards used for information exchange across multiple platforms. Data standards define the standard way in which a business object will be used in business application. The data standards for a business object include:
- The fields and their format: For example 'Address' field will always have Address line 1, line 2 of 60 character each, city of 30 characters, state of 30 characters, followed by country of 40 characters, and PIN code of 12 characters.
- The character pattern and coding definition (Customer codes, supplier code, transaction code, location names, registration numbers).
- The value properties. (Value range, enumerated values)
Not having data standards could result in (say) addresses stored in different formats and style. This can pose a problem for integration. There are many examples where core systems and distribution management, CRM systems have different data design for agents and even different coding system for the same set of agents.
Data standards are having some overlap with domain. An 'address' domain will have all the possible addresses as well as a Data Standard for how the address field should be formatted.
TIP- For Data Domain and Data Model standards, it is an accomplishment for an organizations to create it and implement it for new applications. For the existing data and applications, one needs to build a funded road-map for conversion to new standards. One is not advised to take a big-bang conversion job.
Title- Data Domain Value, Standards Quality Assurance
Small Description- One method for Data Quality Assurance is through defining Domain and Data Standards. These standards include Value range, character patters and data formats.
Keywords- Data, Data Standards, Data Quality Assurance, Data Domain