Duplicate Identification
Name: Name of a concept, e.g. a data model concept. In contrast to terms, the name does not depend on a given context, e.g. a country-specific language. | Duplicate Identification (category: , sort rank: 110, product: Data Quality Assessment & Benchmarking) |
---|---|
Short description: Informal and short human-readable definition of a concept. | Duplicate matching and consolidation with detailed and feature-rich configurations. |
Description: Informal and comprehensive human-readable definition of a concept. | Duplicate matching compares all records of a given set of custom databases to each other, identifies similar records, and groups the "best matches" into matching groups. Producing a duplicate report takes three steps: (1) select the custom databases to be analyzed and a matching configuration that configures the matching algorithm, (2) start a matching job and wait for the result (i.e. record links, each with a similarity score), and (3) generate a duplicate report, optionally with cleansed golden records for each matching group.
The general duplicate matching process can be divided into three major steps: (1) Attribute Selection, (2) Harmonization, and (3) Search & Compare.
|
Release status: The release status in terms of development progress or maturity of a product feature or a business capability.<br/><code>EMPTY</code> (0): No feature considered yet, just rough idea for capability.<br/><code>IDEA</code> (1): Just an idea, not yet designed in detail.<br/><code>DESIGN</code> (2): Software design ready, development not yet started.<br/><code>DEVELOPMENT</code> (3): Software development in progress.<br/><code>ALPHA</code> (4): First functional release, in terms of a Minimum Viable Product (MVP).<br/><code>BETA</code> (5): Tested by selected users.<br/><code>RC</code> (6): Release candidate, fully tested, not yet used in production by many customers.<br/><code>LIVE</code> (7): Used in production by customers, fully monitored and supported.<br/><code>DEPRECATED</code> (-1): End of life planned, but still available.<br/><code>EOL</code> (-2): End of life, historic service, no longer available.<br/><code>BROKEN</code> (-3): Service was used in production but is currently not available. However, CDQ tries to repair or reactivate it. | LIVE
|
Use cases | Analyze and Prepare a Business Partner Storage for Get Clean, Cleanse and Enrich a Business Partner Storage, Duplicate avoidance in the business partner data maintenance workflow, Iterative Duplicate Check, Link and Consolidate Data from Multiple Systems, Simple Duplicate Check |
Apps |
The following apps provide this capability
|
APIs |
The following APIs provide this capability
|
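
As a rough illustration of the three matching steps named in the description (attribute selection, harmonization, search and compare), the following Python sketch compares a small set of records, keeps record links with a similarity score, and merges linked records into matching groups. Everything in it is an assumption made for illustration: the sample records, the names `MATCH_ATTRIBUTES`, `THRESHOLD`, `harmonize`, `similarity`, `find_links` and `group_links`, and the use of `difflib.SequenceMatcher` as the similarity measure. It is not the CDQ matching algorithm or its configuration format.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": "A1", "name": "ACME Corp.", "city": "Berlin", "phone": "030-1"},
    {"id": "B7", "name": "Acme Corp", "city": "berlin ", "phone": "030-2"},
    {"id": "C3", "name": "Globex AG", "city": "Zurich", "phone": "044-9"},
]

MATCH_ATTRIBUTES = ["name", "city"]  # step 1: attribute selection (assumed)
THRESHOLD = 0.9                      # assumed cut-off for keeping a record link


def harmonize(value: str) -> str:
    """Step 2: lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: dict, b: dict) -> float:
    """Step 3: mean string similarity over the selected attributes."""
    scores = [
        SequenceMatcher(None, harmonize(a[attr]), harmonize(b[attr])).ratio()
        for attr in MATCH_ATTRIBUTES
    ]
    return sum(scores) / len(scores)


def find_links(records):
    """Compare every pair of records and keep links at or above the threshold."""
    links = []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= THRESHOLD:
            links.append((a["id"], b["id"], score))
    return links


def group_links(links):
    """Merge linked record ids into matching groups with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right, _ in links:
        parent[find(left)] = find(right)

    groups = {}
    for record_id in parent:
        groups.setdefault(find(record_id), set()).add(record_id)
    return list(groups.values())


links = find_links(RECORDS)
groups = group_links(links)
```

With the sample data, `A1` and `B7` become identical after harmonization and end up in one matching group, while `C3` stays unlinked. The sketch compares all pairs, which grows quadratically with the number of records, and it does not attempt any candidate search to narrow down the pairs to compare.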
Features
Feature | Short description | Release status |
---|---|---|
Data Mirror Lookup | Look up business partner data in a data mirror, e.g. to identify potential duplicates when a new record is created. | LIVE
|
Duplicate Consolidation | Consolidate a group of identified duplicate records into one surviving record based on a Duplicate Consolidation Configuration | LIVE
|
Duplicate Detection | Identify potential duplicates in a given set of business partner data records. | LIVE
|
Duplicate Matching Configuration | Define how business partner records from a data set are compared by the duplicate analysis algorithm. | LIVE
|
Matching Cleaner Configuration | Cleaners transform or normalize data before it is effectively compared by the duplicate analysis algorithm. | LIVE
|
Matching Comparator Configuration | Comparators compare data from the same attribute but different records and produce a match score. | LIVE
|
Matching and Consolidation Reports | Duplicate and record linkage reporting | LIVE
|
Record Linkage | Identify identical records in two or more datasets | LIVE
|
Duplicate Consolidation Configuration | Defines how records in a duplicate matching group are consolidated into a "best guess" or "golden" record. | BETA
|
Match Candidate Review | Manual review of matches proposed by duplicate detection or record linkage; the review decisions are taken into account in subsequent matching runs. | BETA
|
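
To make the idea of consolidating a matching group into a "golden" record more concrete, here is a minimal Python sketch. It assumes a consolidation configuration that assigns one survivorship rule per attribute. The rules (`most_frequent`, `longest`), the `CONSOLIDATION_RULES` mapping, the `consolidate` function and the sample records are all hypothetical and only illustrate the principle; they do not reflect the actual Duplicate Consolidation Configuration.

```python
from collections import Counter

# Hypothetical matching group: records already identified as duplicates.
GROUP = [
    {"id": "A1", "name": "ACME Corp.", "city": "Berlin", "vat": ""},
    {"id": "B7", "name": "ACME Corporation", "city": "Berlin", "vat": "DE123"},
    {"id": "D4", "name": "ACME Corp.", "city": "", "vat": "DE123"},
]


def most_frequent(values):
    """Pick the most common non-empty value, or "" if all values are empty."""
    filled = [v for v in values if v]
    return Counter(filled).most_common(1)[0][0] if filled else ""


def longest(values):
    """Pick the longest value, assuming it carries the most detail."""
    return max(values, key=len, default="")


# Assumed consolidation configuration: one survivorship rule per attribute.
CONSOLIDATION_RULES = {
    "name": longest,
    "city": most_frequent,
    "vat": most_frequent,
}


def consolidate(group, rules):
    """Build one golden record from a matching group."""
    golden = {
        attr: rule([record[attr] for record in group])
        for attr, rule in rules.items()
    }
    # Keep the ids of the source records so the result stays traceable.
    golden["source_ids"] = [record["id"] for record in group]
    return golden


golden = consolidate(GROUP, CONSOLIDATION_RULES)
```

For the sample group, the golden record takes the longest name and the most frequent non-empty city and VAT value, and keeps the ids of all three source records. Choosing rules per attribute is a design choice of this sketch: different attributes usually call for different notions of the "best" value.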
Why is efficient matching of business partner data challenging?
Matching challenges | |
---|---|
Typical challenges when matching business partner names |
|
Typical challenges when matching business partner addresses |
|