Natural Person Data Identification

Name: Natural Person Data Identification
Short description: Checks for personal data in a given business partner record
Description: CDQ checks a given business partner record for evidence of personal information, such as the name of a natural person, and returns the probability that the record contains such information. As indicators of personal information, the algorithm looks for specific identifiers (for example, a Business Registration Number (Germany) starting with "HRA"), specific legal forms such as e.Kfm. (i.e. sole proprietors), and known natural person names. For the name matching in the last step, a neural network is used. It is trained on large lists of known company names and country-specific forenames, and on feedback from the Data Sharing Community.

Approach to identify personal information

The algorithm to identify personal data distinguishes between three cases:

  • Contact information, i.e. a person name provided as part of a business partner's name (e.g. "CDQ AG attn: Simon Schlosser")
  • Registered individuals, i.e. natural persons who are registered in an official register, for example as sole proprietors (e.g. "Simon Schlosser e.K.")
  • Individuals, i.e. natural persons for whom there is no evidence of an actual registration (e.g. a freelancer recorded as "Simon Schlosser")

Example | Type | Description
CDQ AG | Legal Entity | Typical example. There is no name information (person name) included and there is a legal form available.
CDQ AG z.Hd. Simon Schlosser | Legal Entity with contact information |
Simon Schlosser | Individual | Typical example. There is a person name, no legal form and no VAT ID. The record is not to be stored in the CDL database.
Simon Schlosser e.K. | Registered individual | Individuals that are registered and have a legal form. There are different legal forms for natural persons in different countries.
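
The distinction between the record types above can be sketched as a small decision function. This is a minimal illustration, not CDQ's implementation: the Evidence fields, the RecordType names and the order of the checks are assumptions derived only from the descriptions in the table, and the actual service returns a probability rather than a fixed class.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RecordType(Enum):
    LEGAL_ENTITY = "Legal Entity"
    LEGAL_ENTITY_WITH_CONTACT = "Legal Entity with contact information"
    INDIVIDUAL = "Individual"
    REGISTERED_INDIVIDUAL = "Registered individual"


@dataclass
class Evidence:
    """Hypothetical flags that the individual checks might produce."""
    has_person_name: bool
    has_contact_keyword: bool        # e.g. "attn:" or "z.Hd." found in the name
    has_company_legal_form: bool     # e.g. "AG"
    has_individual_legal_form: bool  # e.g. "e.K."
    has_vat_id: bool


def classify(e: Evidence) -> Optional[RecordType]:
    """Map evidence flags to a record type; None means 'undetermined'."""
    if e.has_person_name and e.has_individual_legal_form:
        return RecordType.REGISTERED_INDIVIDUAL
    if e.has_company_legal_form:
        if e.has_person_name and e.has_contact_keyword:
            return RecordType.LEGAL_ENTITY_WITH_CONTACT
        return RecordType.LEGAL_ENTITY
    # No legal form of either kind matched a person name at this point.
    if e.has_person_name and not e.has_vat_id:
        return RecordType.INDIVIDUAL
    return None
```

For example, a record named "Simon Schlosser" with no legal form and no VAT ID would be classified as RecordType.INDIVIDUAL under these assumptions.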

Identification strategies

To identify personal information, several strategies are applied to a given record. They are executed in the following order:

Name list check

CDQ manages lists of typical forenames for different countries and uses this data to train an artificial intelligence (i.e. a neural network) that identifies such terms in a given record. Feedback from the Data Sharing Community is also used as training input to improve matching results.
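
To illustrate the idea of matching against country-specific forename lists, here is a deliberately simplified sketch. It uses a plain dictionary lookup as a stand-in and is not the neural network described above; the FORENAMES data and the function name forename_hits are hypothetical.

```python
import re

# Hypothetical, tiny forename lists per country code; illustrative only.
FORENAMES = {
    "DE": {"simon", "anna", "hans"},
    "PT": {"joão", "maria"},
}


def forename_hits(name: str, country: str) -> list[str]:
    """Return the tokens of `name` that appear in the country's forename list."""
    # Split into runs of letters (Unicode-aware), ignoring digits and punctuation.
    tokens = re.findall(r"[^\W\d_]+", name.lower())
    known = FORENAMES.get(country, set())
    return [token for token in tokens if token in known]
```

A plain lookup like this cannot weigh ambiguous tokens (a forename that is also part of a company name), which is one reason a trained model is the more plausible choice for the real check.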

Legal form check

If Legal Form (CDQ.POOL) information is provided in the given record, legal form enrichment tries to identify the related legal form and enrich the record with it. In some countries, certain legal forms indicate registered individuals or sole proprietors and thus provide evidence that the record contains a natural person's name.
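
A minimal sketch of such a check, assuming the legal form appears as a suffix of the name: the mapping INDIVIDUAL_LEGAL_FORMS and the function name are hypothetical, and only the two German abbreviations mentioned on this page (e.K. and e.Kfm.) are included.

```python
# Assumed mapping: country code -> legal form abbreviations that indicate
# a registered individual. Only forms named on this page are listed.
INDIVIDUAL_LEGAL_FORMS = {
    "DE": ("e.K.", "e.Kfm."),
}


def indicates_registered_individual(name: str, country: str) -> bool:
    """True if `name` ends with a legal form reserved for natural persons."""
    normalized = name.casefold().rstrip()
    forms = INDIVIDUAL_LEGAL_FORMS.get(country, ())
    # Require a preceding space so the form is a separate trailing token.
    return any(normalized.endswith(" " + form.casefold()) for form in forms)
```

A real legal form enrichment would work on the structured legal form field and a maintained reference list rather than on a name suffix.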

Identifier check

Derivation

Certain identifiers make it possible to identify natural persons. For example, in PT (Portugal), the first digit of the VAT number indicates whether the record represents an individual: 1 to 3 denote natural persons, 5 denotes companies.
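
The Portuguese rule can be sketched as follows. The function name and the return labels are assumptions; the sketch only inspects the first digit as described above and does not validate the length or check digit of the number.

```python
def pt_vat_party_type(vat_number: str) -> str:
    """Derive the party type from the first digit of a Portuguese VAT number."""
    # Drop a country prefix such as "PT" and any separators.
    digits = "".join(ch for ch in vat_number if ch.isdigit())
    if not digits:
        return "unknown"
    first = digits[0]
    if first in "123":
        return "individual"
    if first == "5":
        return "company"
    # Other leading digits are not covered by the rule stated above.
    return "unknown"
```

For instance, pt_vat_party_type("PT 501 234 567") yields "company" because the first digit is 5.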

Personal identification numbers

Moreover, there are identifiers that are only assigned to natural persons, such as the Social Security Number in the US (United States of America).
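
Such a check amounts to testing whether any identifier on the record has a type reserved for natural persons. In the sketch below, the type key "US_SSN" and the dictionary shape of an identifier are hypothetical placeholders, not CDQ's data model.

```python
# Hypothetical type keys for identifiers issued only to natural persons.
PERSON_ONLY_IDENTIFIER_TYPES = {"US_SSN"}


def has_person_only_identifier(identifiers: list[dict]) -> bool:
    """True if at least one identifier has a natural-person-only type."""
    return any(
        identifier.get("type") in PERSON_ONLY_IDENTIFIER_TYPES
        for identifier in identifiers
    )
```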

Contact information parsing

To identify contact information, the name is searched for typical keywords such as attn:, z.Hd., or attention to. This parsing is provided by the data quality rule Contact information misplaced as part of Capability/Data Quality Checks.
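
A keyword search of this kind could look like the sketch below. It covers only the three keywords listed above; the pattern, the function name split_contact, and the decision to treat everything after the keyword as the contact are assumptions and do not describe the actual data quality rule.

```python
import re
from typing import Optional

# Only the keywords named on this page; matching is case-insensitive.
_CONTACT_KEYWORD = re.compile(r"\b(?:attn:|z\.\s?Hd\.|attention to)\s*", re.IGNORECASE)


def split_contact(name: str) -> tuple[str, Optional[str]]:
    """Split a name into (business partner name, contact) at the first keyword."""
    match = _CONTACT_KEYWORD.search(name)
    if match is None:
        return name.strip(), None
    partner = name[:match.start()].strip()
    contact = name[match.end():].strip() or None
    return partner, contact
```

Applied to "CDQ AG attn: Simon Schlosser", this returns the pair ("CDQ AG", "Simon Schlosser"); a name without any keyword is returned unchanged with None as the contact.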