Address Cleansing and Enrichment

Name: Address Cleansing and Enrichment (sort rank: 220, product: Data Cleansing and Enrichment)
Short description: Address standardization, enrichment, correction, and translation on a global scale.
Description: Address cleansing and enrichment covers standardization, enrichment, cleansing, translation, and geocoding of addresses. This functionality is key to cross-corporate data management because it helps to handle, for example, the different languages, abbreviations, and writing rules of addresses.

Address cleansing follows a chain of phases, each of which transforms a given address record in a specific way. The following list gives a brief overview of these curation phases:

  • Parsing: Identifies specific information in given address data (e.g. Post Code (CDQ.POOL), Premise (CDQ.POOL)) and puts this information in the appropriate attributes of the data model.
  • Cleansing: Compares given address data to reference data from e.g. Google Maps or OpenStreetMap and replaces 'wrong' data (e.g. due to typos or missing accents) with reference data.
  • Enrichment: Adds missing information (e.g. Locality (CDQ.POOL), Post Code (CDQ.POOL) or Administrative Area (CDQ.POOL)) to given address records. Geocoding is also performed in this phase.
  • Translation: Translates given address data to a given language. By default, all address data is translated to English.
  • Abbreviation: Adds abbreviations and codes (e.g. for Country (CDQ.POOL), Administrative Area (CDQ.POOL), and Thoroughfare (CDQ.POOL)) to given address data. The cleansing phase provides full names for all attributes, so after the abbreviation phase, all known full names and abbreviations are available.
  • Normalization: Certain standards are defined for address data, e.g. "only Latin characters and no accents in base language address data". Such rules are applied in this phase.
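
The phase chain above can be sketched as a sequence of functions that each take an address record and return a transformed copy. This is only an illustrative sketch, not the CDQ implementation: the `Address` record, the phase function names, the parsing pattern, and the small lookup tables are all hypothetical stand-ins for the real data model and reference data sources.

```python
import re
import unicodedata
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, Optional


@dataclass(frozen=True)
class Address:
    """Hypothetical, simplified address record (not the CDQ data model)."""
    raw: str = ""
    thoroughfare: Optional[str] = None
    post_code: Optional[str] = None
    locality: Optional[str] = None
    country: Optional[str] = None
    country_code: Optional[str] = None


Phase = Callable[[Address], Address]

# Toy lookup tables standing in for external reference data (assumptions).
REFERENCE_LOCALITY = {"9000": "St. Gallen"}
ENGLISH_COUNTRY = {"Schweiz": "Switzerland"}
COUNTRY_CODE = {"Switzerland": "CH"}


def parse(a: Address) -> Address:
    # Naive parser for "<thoroughfare>, <post code> <locality>".
    m = re.match(r"\s*([^,]+),\s*(\d{4,5})\s*(.*)$", a.raw)
    if m is None:
        return a
    street, code, city = (g.strip() for g in m.groups())
    return replace(a, thoroughfare=street, post_code=code, locality=city or None)


def cleanse(a: Address) -> Address:
    # Overwrite a given locality with the reference value for its post code.
    ref = REFERENCE_LOCALITY.get(a.post_code or "")
    return replace(a, locality=ref) if ref and a.locality else a


def enrich(a: Address) -> Address:
    # Fill in a missing locality from the reference data.
    ref = REFERENCE_LOCALITY.get(a.post_code or "")
    return replace(a, locality=ref) if ref and not a.locality else a


def translate(a: Address) -> Address:
    # Translate the country name to English where a translation is known.
    return replace(a, country=ENGLISH_COUNTRY.get(a.country or "", a.country))


def abbreviate(a: Address) -> Address:
    # Add the country code next to the full country name.
    return replace(a, country_code=COUNTRY_CODE.get(a.country or ""))


def strip_accents(text: Optional[str]) -> Optional[str]:
    # Decompose characters and drop the combining marks (e.g. "ü" becomes "u").
    if text is None:
        return None
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize(a: Address) -> Address:
    # Apply the "no accents" rule to the textual attributes.
    return replace(a, thoroughfare=strip_accents(a.thoroughfare),
                   locality=strip_accents(a.locality))


# Phases in the order given in the list above.
PHASES = (parse, cleanse, enrich, translate, abbreviate, normalize)


def curate(a: Address) -> Address:
    # Feed the output of each phase into the next one.
    return reduce(lambda acc, phase: phase(acc), PHASES, a)


result = curate(Address(raw="Zürcherstrasse 1, 9000 St Gallen", country="Schweiz"))
print(result)
```

Keeping each phase as a pure function on an immutable record makes it easy to inspect the intermediate result after any phase, which mirrors the idea that each phase transforms the record in one specific way.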
Release status: LIVE
The release status describes the development progress or maturity of a product feature or a business capability:
  • EMPTY (0): No feature considered yet, just a rough idea for the capability.
  • IDEA (1): Just an idea, not yet designed in detail.
  • DESIGN (2): Software design ready, development not yet started.
  • DEVELOPMENT (3): Software development in progress.
  • ALPHA (4): First functional release, in terms of a Minimum Viable Product (MVP).
  • BETA (5): Tested by selected users.
  • RC (6): Release candidate, fully tested, not yet used in production by many customers.
  • LIVE (7): Used in production by customers, fully monitored and supported.
  • DEPRECATED (-1): End of life planned, but still available.
  • EOL (-2): End of life, historic service, no longer available.
  • BROKEN (-3): Service was used in production but is currently not available. However, CDQ tries to repair or reactivate it.
Use cases: Address Cleansing, Analyze and Prepare a Business Partner Storage for Get Clean, Business Partner Data Maintenance Workflow, Business Partner Data Maintenance Workflow (Lean Update), Cleanse and Enrich a Business Partner Storage
Apps
The following apps provide this capability:
Batch Business Partner Curation
  • Batch Business Partner Curation Video
Batch Address Curation
  • Batch Address Curation Video
APIs
The following APIs provide this capability:

Features

Feature | Short description | Release status
Address Cleansing and Enrichment Reports | Address data curation produces a report containing all address cleansing activities. | LIVE
Address Component Extraction | Extracts the identifiable address components from a loosely structured address (e.g. address lines with no distinction between PO Box and street) or from a semantically incorrectly maintained address (e.g. a postcode and city maintained instead of a street) and returns them in a well-structured form. | LIVE
Address Translation and Transliteration | Transform your addresses into different languages, e.g. all English, auto-detected local language, or specifically selected target languages. | LIVE
Geo Coordinate Enrichment (Geocoding) | Geocodes an input address, i.e. identifies its geo-coordinates and returns them as latitude and longitude. | LIVE
Postal Code Enrichment and Harmonization | Enriches postal codes that are missing in the input data, including additional postal codes such as CEDEX in France where such a postal code exists. | LIVE
Reference Address Identification and Correction | Finds reference addresses for the given input data in connected data sources, then corrects wrong input data (e.g. a wrong city name for the given postal code) and enriches missing address components. | LIVE
Address Cleansing and Enrichment Configuration | Configure the address curation engine by defining the data sources to be used, selecting profiles, specifying standardization rules, and defining output languages and character sets. | BETA
Administrative Area Enrichment and Harmonization | Enrich and harmonize region information (region code and region name). Region data already present in an input address is standardized according to the ISO standard. | BETA
Detailed Address Cleansing and Enrichment Result | Access to raw results and comprehensive change documentation, including data provenance and result quality indicators. | BETA
Locality Enrichment and Harmonization | Enrich and standardize city and town names. | BETA
PO Box Enrichment and Harmonization | Identifies that certain given information represents PO Box data, parses and extracts this information, and standardizes it by extracting the actual PO Box number and harmonizing the term or abbreviation used in the corresponding country to indicate a PO Box number (e.g. Post office box 27899 and Postfach 27899). | BETA
Premise Enrichment and Harmonization | Enrich and harmonize premise information (buildings, suites, apartments, rooms, floor, etc.). | BETA
Public Address Data Sources | Official postal address data which can be enabled for the identification of reference addresses. | BETA
Restricted Address Data Sources | Official proprietary postal address data which can be enabled for the identification of reference addresses. | BETA
SAP Formatted Address | Returns the processed address in a standard SAP format referencing the ADRC table, which is used for vendor and customer master data. | BETA
Service Quality Indicators | CDQ cloud services create many results based on fuzzy matching, different rule sets, and machine learning techniques. The reliability of enrichments, and the confidence in their correctness, therefore depend heavily on machine decisions that are not always accurate. Different kinds of quality indicators support deciding on and judging a result's correctness. | BETA
Thoroughfare Enrichment and Harmonization | Identifies that certain given information represents a thoroughfare such as a street, parses and extracts this information, and then standardizes and harmonizes it into an abbreviated thoroughfare name, a long name, house number, street identifiers, and a provided direction. | BETA
Address Type Identification | Allows, for example, finding out whether a given address is a registered address. The feature identifies the type of an address (e.g. registered address) by searching in reference data and identifies the differences between the input address and the reference address. | ALPHA
Address Function Identification | Allows for identifying the operational use of an address, such as bill-to, sold-to, and delivery addresses. | IDEA
Business Point of Interest Identification | Extracts and harmonizes business points of interest such as free trade zones, seaports, and airports in input data, and enriches additional information such as airport codes. | IDEA
Company Postal Code Enrichment and Harmonization | Enrich postal codes of large mail recipients. | IDEA
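
As an illustration of the PO Box Enrichment and Harmonization feature in the feature list above, the following minimal sketch extracts a PO Box number from free text and rewrites it with a country-specific term. It is not the CDQ implementation: the function name `harmonize_po_box`, the recognized spelling variants, and the preferred terms per country code are assumptions chosen to match the examples "Post office box 27899" and "Postfach 27899".

```python
import re
from typing import Optional

# Hypothetical spelling variants; a real rule set would cover far more countries.
PO_BOX_PATTERN = re.compile(
    r"\b(?:post\s*office\s*box|p\.?\s*o\.?\s*box|postfach)\s*(?P<number>\d+)\b",
    re.IGNORECASE,
)

# Assumed preferred PO Box term per country code.
PREFERRED_TERM = {"US": "PO Box", "DE": "Postfach"}


def harmonize_po_box(text: str, country_code: str) -> Optional[str]:
    """Return a harmonized PO Box string, or None if no PO Box is found."""
    match = PO_BOX_PATTERN.search(text)
    if match is None:
        return None
    # Fall back to "PO Box" for countries without a configured term.
    term = PREFERRED_TERM.get(country_code, "PO Box")
    return f"{term} {match.group('number')}"


print(harmonize_po_box("Post office box 27899", "US"))
print(harmonize_po_box("postfach 27899", "DE"))
```

Separating the number extraction from the choice of term keeps the country-specific wording in one small table, so adding a country does not require touching the pattern.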