Data quality tools have been around for some time, but the requirements of master data management (MDM) and product information management (PIM) are forcing a new wave of innovation and technology breakthroughs. The need to share information across the enterprise and supply chains is driving data from legacy application silos to be increasingly exposed and shared. This reveals massive inconsistencies and incompatibilities - hence, the recent interest in MDM and related technologies that promise to unify and synchronize this disparate data and deliver a single version of the truth.
Before a single version of the truth can be maintained, it must first be created, and that is easier said than done. Loading an MDM system with inconsistent, as-is data risks a rude lesson in "garbage in, garbage out" and possibly the failure of the entire initiative. Disparate data must first be understood, then cleansed, enriched, standardized and restructured. Without this, records cannot be reliably loaded, compared and matched across the enterprise.
For customer data integration (CDI), this task is relatively straightforward. CDI is the customer-focused subset of MDM and has benefited from the first wave of data quality technology, which has proven to be very effective at solving name and address problems. These traditional tools use algorithms and heuristics to correct keyboard entry errors, phonetic misspellings, alternate name forms (Robert and Bob), invalid ZIP codes, household variations and similar challenges. Such technologies are syntax and pattern based. They are quite mature but still remain primarily focused on name and address issues, despite repeated attempts to broaden their scope.
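The kinds of syntax- and pattern-based heuristics these traditional tools apply can be sketched in a few lines. The nickname table and ZIP pattern below are illustrative assumptions, not the rules of any particular product; commercial tools ship far larger dictionaries plus phonetic algorithms such as Soundex or Metaphone.

```python
import re

# Hypothetical nickname dictionary; real data quality tools maintain
# thousands of alternate name forms.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def standardize_name(raw: str) -> str:
    """Normalize case and whitespace, then expand common nicknames."""
    name = re.sub(r"\s+", " ", raw.strip()).lower()
    parts = [NICKNAMES.get(p, p) for p in name.split(" ")]
    return " ".join(p.capitalize() for p in parts)

def is_valid_zip(zip_code: str) -> bool:
    """Check the syntactic pattern of a US ZIP or ZIP+4 code."""
    return re.fullmatch(r"\d{5}(-\d{4})?", zip_code.strip()) is not None

print(standardize_name("  BOB   smith "))  # Robert Smith
print(is_valid_zip("10013-1234"))          # True
print(is_valid_zip("ABC12"))               # False
```

Because names and addresses follow well-known patterns, simple rules like these go a long way, which is why the first wave of tools succeeded in the CDI space.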
For PIM, the challenges are much tougher. This product-focused branch of MDM has demonstrated that syntax-based data quality tools perform poorly when faced with the overwhelming complexity and variability of most product data. Match rates seldom exceed 50 percent. So, does product data need to be...
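The weakness of syntax-based matching on product data can be illustrated with a character-level similarity score. The two part descriptions below are invented examples of the same physical item written by different suppliers; the 0.8 threshold is an assumed, typical cutoff, not a figure from the article.

```python
from difflib import SequenceMatcher

# Two invented descriptions of the same hex cap screw, as two
# suppliers might enter them in their own catalogs.
a = "HEX CAP SCREW, 1/4-20 X 1.5 IN, GRADE 5, ZINC"
b = "SCR,CAP,HX HD,.25-20UNC,1-1/2,GR5,ZN PL"

# Character-level similarity, a stand-in for syntax-based matching.
similarity = SequenceMatcher(None, a, b).ratio()
print(round(similarity, 2))  # scores well below an assumed 0.8 threshold
```

Abbreviations, reordered attributes and unit variations defeat purely syntactic comparison, so identical products score as non-matches, which is consistent with the low match rates described above.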