Natural Person Data Identification
Name | Natural Person Data Identification |
---|---|
Short description | Checks for personal data in a given business partner record |
Description | CDQ checks a given business partner data record for evidence of personal information, such as the name of a natural person, and provides the probability that the record contains such information. As indicators of personal information, the algorithm looks for specific identifiers such as a Business Registration Number (Germany) starting with "HRA", specific legal forms such as e.Kfm. (i.e. sole proprietors), and known natural person names. For the name matching in the last step, a neural network is used, which is trained on large lists of known company names and country-specific forenames as well as on feedback from the Data Sharing Community. |
Approach to identify personal information
The algorithm to identify personal data distinguishes between:
- Contact information provided in a business partner's name, such as "CDQ AG attn: Simon Schlosser"
- Registered individuals, i.e. natural persons that are registered in an official register, e.g. as sole proprietors (e.g. Simon Schlosser e.K.)
- Individuals, i.e. natural persons for whom there is no evidence of registration (e.g. a freelancer recorded as "Simon Schlosser")
Example | Type | Description |
---|---|---|
CDQ AG | Legal Entity | Typical example. There is no name information (person name) included and there is a legal form available. |
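CDQ AG z.Hd. Simon Schlosser | Legal Entity with contact information | The name contains a legal entity followed by contact information for a natural person, here introduced by the keyword z.Hd. |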
Simon Schlosser | Individual | Typical example. There is a person name, no legal form and no VAT ID. The record is not to be stored in the CDL database. |
Simon Schlosser e.K. | Registered individual | Individuals that are registered and have a legal form. There are different legal forms for natural persons in different countries. |
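
The four types in the table can be read as a small decision scheme. The following is a minimal sketch of that reading, not CDQ's implementation: the names `RecordType`, `Evidence` and `classify`, as well as the three boolean flags and the order in which they are evaluated, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RecordType(Enum):
    LEGAL_ENTITY = "Legal Entity"
    LEGAL_ENTITY_WITH_CONTACT = "Legal Entity with contact information"
    INDIVIDUAL = "Individual"
    REGISTERED_INDIVIDUAL = "Registered individual"


@dataclass
class Evidence:
    # Hypothetical flags; in practice they would be produced by the
    # identification strategies described in this document.
    has_person_name: bool
    has_contact_keyword: bool
    has_individual_legal_form: bool


def classify(evidence: Evidence) -> RecordType:
    """Map the collected evidence to one of the four record types."""
    # A legal form reserved for natural persons, e.g. "e.K."
    if evidence.has_individual_legal_form:
        return RecordType.REGISTERED_INDIVIDUAL
    # A person name introduced by a keyword such as "attn:" or "z.Hd."
    if evidence.has_person_name and evidence.has_contact_keyword:
        return RecordType.LEGAL_ENTITY_WITH_CONTACT
    # A person name without any sign of registration
    if evidence.has_person_name:
        return RecordType.INDIVIDUAL
    return RecordType.LEGAL_ENTITY
```

For example, `classify(Evidence(True, True, False))` corresponds to the "CDQ AG z.Hd. Simon Schlosser" row of the table.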
Identification strategies
In order to identify personal information, several strategies are applied to a given record. They are executed in the order listed here:
Name list check
CDQ manages lists of typical forenames for different countries and trains an artificial intelligence (i.e. a neural network) with this data to identify such terms in a given record. Feedback from the Data Sharing Community is also used as training input to improve matching results.
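
To illustrate the input side of this check, the sketch below performs a plain lookup of name tokens in a country-specific forename list. It is a deliberately simpler stand-in for the trained neural network and does not reproduce it. The `FORENAMES` table (with a single sample entry taken from the examples in this document) and the function `forename_hits` are hypothetical.

```python
# Hypothetical sample data: a real list would hold many forenames per country.
FORENAMES = {"DE": frozenset({"simon"})}


def forename_hits(name: str, country: str) -> list:
    """Return the tokens of `name` found in the forename list of `country`."""
    known = FORENAMES.get(country.upper(), frozenset())
    # Lowercase each token and strip common punctuation before the lookup.
    tokens = [token.strip(".,;:").lower() for token in name.split()]
    return [token for token in tokens if token in known]
```

A call such as `forename_hits("Simon Schlosser", "DE")` finds one known forename, whereas `forename_hits("CDQ AG", "DE")` finds none.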
Legal form check
If Legal Form (CDQ.POOL) information is provided in the given record, legal form enrichment tries to identify and enrich the related legal form. In some countries, certain legal forms indicate registered individuals or sole proprietors and thus provide evidence that the record contains the name of a natural person.
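
A minimal sketch of such a check, assuming the legal form is available as a plain string, could look as follows. The table `INDIVIDUAL_LEGAL_FORMS` and the function name are hypothetical, and the table lists only the two German forms mentioned in this document, not the actual reference data.

```python
# Assumed sample data: legal forms (lowercased) that indicate a natural person.
INDIVIDUAL_LEGAL_FORMS = {"DE": frozenset({"e.k.", "e.kfm."})}


def indicates_registered_individual(legal_form: str, country: str) -> bool:
    """Return True if the legal form is one reserved for natural persons."""
    forms = INDIVIDUAL_LEGAL_FORMS.get(country.upper(), frozenset())
    # Compare case-insensitively and ignore surrounding whitespace.
    return legal_form.strip().lower() in forms
```

With this data, `indicates_registered_individual("e.K.", "DE")` is true, while `indicates_registered_individual("AG", "DE")` is false.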
Identifier check
Derivation
Certain identifiers make it possible to derive whether a record represents a natural person. For example, in Portugal (PT), the first digit of the VAT number indicates the type of taxpayer: 1 to 3 denote natural persons, while 5 denotes companies.
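
The Portuguese rule can be sketched as a first-digit lookup. The function name `classify_pt_vat` and its return values are assumptions for illustration. The sketch only covers the digits named above and does not validate the length or check digit of the number.

```python
from typing import Optional


def classify_pt_vat(vat_number: str) -> Optional[str]:
    """Classify a Portuguese VAT number by its first digit.

    Returns "individual", "company", or None if the first digit is not
    covered by the rule described above.
    """
    # Drop a country prefix such as "PT" and any separators.
    digits = "".join(ch for ch in vat_number if ch.isdigit())
    if not digits:
        return None
    first = digits[0]
    if first in "123":
        return "individual"
    if first == "5":
        return "company"
    return None
```

For instance, `classify_pt_vat("PT123456789")` yields `"individual"` and `classify_pt_vat("501234567")` yields `"company"`.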
Personal identification numbers
Moreover, there are identifiers that are only assigned to natural persons, such as the Social Security Number in the United States (US).
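
Because such identifiers are tied to natural persons by definition, their mere presence is evidence. The sketch below assumes that each identifier of a record is a dictionary with a `type` key. That structure, the type key `US_SSN`, and the function name are all hypothetical.

```python
# Hypothetical type keys for identifiers that are only issued to natural persons.
PERSON_ONLY_IDENTIFIER_TYPES = frozenset({"US_SSN"})


def has_person_only_identifier(identifiers: list) -> bool:
    """Return True if any identifier is of a type reserved for natural persons."""
    return any(
        identifier.get("type") in PERSON_ONLY_IDENTIFIER_TYPES
        for identifier in identifiers
    )
```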
Contact information parsing
In order to identify contact information, the record is searched for typical keywords such as "attn:", "z.Hd.", "attention to", etc. This parsing is provided via the data quality rule Contact information misplaced in the course of Capability/Data Quality Checks.
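
A keyword search of this kind can be sketched with a regular expression. The keyword tuple contains only the three examples named above, and the function `split_contact` is a hypothetical helper, not the data quality rule itself.

```python
import re

# Only the keywords named in this document; a real list would be longer.
CONTACT_KEYWORDS = ("attn:", "z.Hd.", "attention to")

# Match any keyword case-insensitively, but not in the middle of a word.
_CONTACT_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(k) for k in CONTACT_KEYWORDS) + r")\s*",
    re.IGNORECASE,
)


def split_contact(name: str):
    """Split a name into (entity part, contact part or None)."""
    match = _CONTACT_PATTERN.search(name)
    if match is None:
        return name.strip(), None
    entity = name[: match.start()].strip()
    contact = name[match.end():].strip()
    return entity, contact or None
```

Applied to the example from the beginning of this page, `split_contact("CDQ AG attn: Simon Schlosser")` separates "CDQ AG" from "Simon Schlosser".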