This subject is covered in a fair degree of detail in Customer Data Quality. Some comments specific to customer segmentation follow:
Re-assess the class attributes in the customer dimension on a regular basis
Failing to do this is one of the key reasons for the breakdown of customer relationships. Class attributes need to be revised regularly, not only by reviewing the class assigned to each customer, but also by reviewing the logic that defines the classes themselves. For example, the 'tenure at current job' class may have the value 'long tenure' for anyone who has worked five years or more, but this logic may change in a booming economy, where even three years or more may be termed 'long tenure'.
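The tenure example above can be sketched in Python by keeping the class logic as a parameter rather than a hard-coded rule, so that a revised threshold can be re-applied to every customer. This is only an illustration: the `Customer` record, the function names and the 'short tenure' label are assumptions made for the sketch and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical minimal customer record used only for this sketch.
    customer_id: str
    years_at_current_job: float


def tenure_class(years, long_tenure_years=5.0):
    """Classify tenure; the threshold is the 'logic' that gets revised."""
    # 'short tenure' is an assumed label for the remaining customers.
    return "long tenure" if years >= long_tenure_years else "short tenure"


def reassess(customers, long_tenure_years):
    """Re-derive the class attribute for every customer under a given rule."""
    return {
        c.customer_id: tenure_class(c.years_at_current_job, long_tenure_years)
        for c in customers
    }


customers = [Customer("C1", 6.0), Customer("C2", 3.5), Customer("C3", 1.0)]

before = reassess(customers, long_tenure_years=5.0)  # original rule
after = reassess(customers, long_tenure_years=3.0)   # revised rule
changed = [cid for cid in before if before[cid] != after[cid]]
print(changed)
```

Comparing the two results shows which customers move to a different class when the rule changes, which is the part of the review that is easy to overlook.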
Missing or dirty data
The data about potential customers (such as marketing databases and lists) is generally dirty and incomplete, which creates a problem of accuracy. The point to note is that you are not looking for 100% accuracy. The solution is to undertake data cleansing and data augmentation.
Lack of Customer Data Management leading to incomplete customer data
Due to a lack of customer data integration, the true details of the customer relationship (such as value, profitability, usage patterns and tenure) are missed. The solution to this is Business Intelligence or Customer Data Integration.
Preparing the external data for analysis
External data is received in various forms, shapes and levels of detail. This is not like the standard ETL that you run on your production data. The challenge of merging marketing databases received from multiple agencies is huge, and a considerable amount of de-duplication and cleansing will be needed. One possible solution is to help the suppliers of these marketing databases do the cleansing at their end. Alternatively, you can deploy automated tools to find and correct errors in the data. These tools are well suited to cleansing and correction where you are not looking for 100% accurate results.
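As a rough illustration of the kind of automated de-duplication described above, the following Python sketch normalizes records from two agencies and merges those that share a match key. The field names (`name`, `email`, `phone`, `source`), the sample records and the matching rule (email first, otherwise name plus phone) are all assumptions chosen for the example; real lists usually need fuzzier matching, and the result is not expected to be 100% accurate.

```python
import re


def normalize(record):
    """Standardize the fields used for matching (assumed field names)."""
    name = re.sub(r"\s+", " ", record.get("name", "")).strip().lower()
    email = record.get("email", "").strip().lower()
    phone = re.sub(r"\D", "", record.get("phone", ""))  # keep digits only
    return {"name": name, "email": email, "phone": phone,
            "source": record.get("source", "")}


def match_key(rec):
    """Prefer email as the identifier; fall back to name plus phone."""
    if rec["email"]:
        return ("email", rec["email"])
    return ("name_phone", rec["name"], rec["phone"])


def deduplicate(records):
    """Merge records sharing a match key, filling gaps from later records."""
    merged = {}
    for raw in records:
        rec = normalize(raw)
        key = match_key(rec)
        if key not in merged:
            merged[key] = rec
            continue
        kept = merged[key]
        for field in ("name", "email", "phone"):
            if not kept[field] and rec[field]:
                kept[field] = rec[field]
    return list(merged.values())


# Made-up sample records standing in for lists from two agencies.
agency_a = [{"name": "Jane  Doe", "email": "Jane.Doe@Example.com",
             "phone": "", "source": "agency_a"}]
agency_b = [{"name": "jane doe", "email": "jane.doe@example.com ",
             "phone": "(555) 010-0199", "source": "agency_b"},
            {"name": "John Roe", "email": "",
             "phone": "555-0100", "source": "agency_b"}]

result = deduplicate(agency_a + agency_b)
print(len(result))
```

Here the two "Jane Doe" rows collapse into one record, and the phone number missing from the first agency's row is filled in from the second.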
Analyzing the external data through BI tools
It is relatively easy to do the data warehousing and the customer segmentation analysis for existing customers, as that data is in better health. Data acquired externally on potential customers, however, has to be managed so that it does not get mixed with the existing customer master data. The whole process of enrichment, augmentation and loading should be kept strictly isolated from the production master data.
For these externally acquired databases, should one load only aggregates or the detailed data? One view is that, given the quality of externally acquired data, you may find it better to load the aggregates rather than the detailed data, as you are more interested in high-level analysis. At the same time, if you want to venture into data mining, it may be worth loading the detailed data as well.
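To make the aggregate option concrete, here is a minimal Python sketch that rolls detailed prospect records up to counts per combination of attributes before loading. The dimensions (`region`, `age_band`), the `prospect_count` measure and the sample records are hypothetical; the appropriate grain depends on the analysis you intend to run.

```python
from collections import Counter


def aggregate_prospects(records, dimensions=("region", "age_band")):
    """Roll detailed prospect records up to one row per dimension combination."""
    counts = Counter()
    for rec in records:
        # Missing or empty values are grouped under 'unknown' rather than dropped.
        key = tuple(rec.get(d) or "unknown" for d in dimensions)
        counts[key] += 1
    return [
        dict(zip(dimensions, key), prospect_count=n)
        for key, n in sorted(counts.items())
    ]


# Made-up detailed records from an external list.
prospects = [
    {"region": "north", "age_band": "25-34"},
    {"region": "north", "age_band": "25-34"},
    {"region": "south", "age_band": None},
]

rows = aggregate_prospects(prospects)
for row in rows:
    print(row)
```

Loading only rows like these hides record-level errors behind totals, which suits high-level analysis; data mining would still need the detailed records.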