Data Warehouse Testing is Different  

DW Testing and Implementation

Like some other aspects, Data warehouse testing is fairly different from a transaction processing system. The amount of data and possible test scenarios can run into huge sets. The trick is to find a right balance.

Data Warehouse Testing is Different

All works in Data Warehouse population are mostly through batch runs. Therefore the testing is different from what is done in transaction systems.

Unlike a typical transaction system, data warehouse testing is different on the following counts:

User-Triggered vs. System triggered

Most of the production/Source system testing is the processing of individual transactions, which are driven by some input from the users (Application Form, Servicing Request.). There are very few test cycles, which cover the system-triggered scenarios (Like billing, Valuation.)

In data Warehouse, most of the testing is system triggered as per the scripts for ETL ('Extraction, Transformation and Loading'), the view refresh scripts etc.

Therefore typically Data-Warehouse testing is divided into two parts--> 'Back-end' testing where the source systems data is compared to the end-result data in Loaded area, and 'Front-end' testing where the user checks the data by comparing their MIS with the data displayed by the end-user tools like OLAP.

Batch vs. online gratification

This is something, which makes it a challenge to retain users interest.

A transaction system will provide instant OR at least overnight gratification to the users, when they enter a transaction, which either is processed online OR maximum via overnight batch. In the case of data- warehouse, most of the action is happening in the back-end and users have to trace the individual transactions to the MIS and views produced by the OLAP tools. This is the same challenge, when you ask users to test the month-end mammoth reports/financial statements churned out by the transaction systems.

Volume of Test Data

The test data in a transaction system is a very small sample of the overall production data. Typically to keep the matters simple, we include as many test cases as are needed to comprehensively include all possible test scenarios, in a limited set of test data..

Data Warehouse has typically large test data as one does try to fill-up maximum possible combination and permutations of dimensions and facts.

For example, if you are testing the location dimension, you would like the location-wise sales revenue report to have some revenue figures for most of the 100 cities and the 44 states. This would mean that you have to have thousands of sales transaction data at sales office level (assuming that sales office is lowest level of granularity for location dimension).

Possible scenarios/ Test Cases

If a transaction system has hundred (say) different scenarios, the valid and possible combination of those scenarios will not be unlimited. However, in case of Data Warehouse, the permutations and combinations one can possibly test is virtually unlimited due to the core objective of Data Warehouse is to allow all possible views of Data. In other words, 'You can never fully test a data Warehouse'

Therefore one has to be creative in designing the test scenarios to gain a high level of confidence.

Test Data Preparation

This is linked to the point of possible test scenarios and volume of data. Given that a data- warehouse needs lots of both, the effort required to prepare the same is much more.

Programming for testing challenge

In case of transaction systems, users/business analysts typically test the output of the system. However, in case of data warehouse, as most of the action is happening at the back-end, most of the 'Data Warehouse data Quality testing' and 'Extraction, Transformation and Loading' testing is done by running separate stand-alone scripts. These scripts compare pre-Transformation to post Transformation (say) comparison of aggregates and throw out the pilferages. Users roles come in play, when their help is needed to analyze the same (if designers OR business analysts are not able to figure it out).