GDPR compliance

CDQ supports customers in maintaining GDPR compliance by identifying business partner records that may contain personal data. This is crucial for the data sharing approach, since shared insights must remain privacy-compliant. Only non-personal, business-related data can be exchanged among participants.

Our algorithms analyze patterns such as national registration numbers, legal forms that indicate sole proprietorships, and typical name structures to estimate the probability that a record refers to a natural person. A neural network trained on large sets of company and forename data enhances this detection logic. By ensuring that personal data is excluded from shared datasets, CDQ guarantees that collaboration remains secure, ethical, and fully compliant with European data protection laws.

Approach to identify personal information

The algorithm to identify personal data distinguishes between:

  • Contact information provided in a business partner's name, such as "CDQ AG attn: Simon Schlosser"
  • Registered individuals, i.e. natural persons who are listed in an official register, for example as sole proprietors (e.g. "Simon Schlosser e.K.")
  • Individuals, i.e. natural persons for whom there is no evidence of registration (e.g. freelancers such as "Simon Schlosser")

Example | Type | Description
CDQ AG | Legal Entity | Typical example. No person name is included and a legal form is available.
CDQ AG z.Hd. Simon Schlosser | Legal Entity with contact information |
Simon Schlosser | Individual | Typical example. There is a person name, no legal form and no VAT ID. The record is not to be stored in the CDL database.
Simon Schlosser e.K. | Registered individual | Individuals that are registered and have a legal form. There are different legal forms for natural persons in different countries.
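
The distinction in the table above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading: the `Evidence` fields, the `classify` function and the order of the checks are assumptions made for this example and do not describe the actual CDQ implementation.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what the checks found for one record."""
    has_person_name: bool
    has_legal_form: bool
    legal_form_for_natural_persons: bool = False
    has_vat_id: bool = False
    has_contact_keyword: bool = False


def classify(e: Evidence) -> str:
    """Map the evidence to one of the record types from the table."""
    # A contact keyword next to a legal form points to a company with a contact person.
    if e.has_contact_keyword and e.has_legal_form:
        return "Legal Entity with contact information"
    # A person name plus a legal form reserved for natural persons (e.g. "e.K.").
    if e.has_person_name and e.legal_form_for_natural_persons:
        return "Registered individual"
    # A person name with neither a legal form nor a VAT ID.
    if e.has_person_name and not e.has_legal_form and not e.has_vat_id:
        return "Individual"
    # A legal form without any person name.
    if e.has_legal_form and not e.has_person_name:
        return "Legal Entity"
    # Anything else is left undecided in this sketch.
    return "Unknown"


print(classify(Evidence(has_person_name=True, has_legal_form=False)))
```

For "Simon Schlosser" (person name, no legal form, no VAT ID) the sketch yields the type "Individual".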

Identification strategies

To identify personal information, several strategies are applied to a given record. They are executed in the following order:

Name list check

CDQ manages lists of typical forenames for different countries and trains a neural network with this data to identify such terms in a given record. Feedback from the Data Sharing Community is also used as training input to improve matching results.
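
As a rough illustration of the list-based part of this check, the Python sketch below looks up the tokens of a name in a per-country forename list. It is a plain lookup and deliberately not the neural network described above. The `FORENAMES` sample data and the function name `forename_hits` are hypothetical.

```python
import re

# Hypothetical sample data; the real forename lists are maintained per country.
FORENAMES = {
    "DE": {"simon", "anna", "peter"},
}


def forename_hits(name: str, country: str) -> list:
    """Return the tokens of `name` that appear in the forename list of `country`."""
    # Split into lowercase alphabetic tokens (Unicode letters, no digits or underscores).
    tokens = re.findall(r"[^\W\d_]+", name.lower())
    known = FORENAMES.get(country, set())
    return [token for token in tokens if token in known]


print(forename_hits("CDQ AG attn: Simon Schlosser", "DE"))
print(forename_hits("CDQ AG", "DE"))
```

A non-empty result is only a hint that a person name may be present; a trained model can additionally weigh context, which a simple lookup cannot.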

Legal form check

If a record provides legal form information, our services try to identify and enrich the related legal form. In some countries, certain legal forms indicate registered individuals or sole proprietors and thus provide evidence that the record contains the name of a natural person.
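
A minimal Python sketch of this idea follows, assuming the legal form appears as a suffix of the name. The mapping `NATURAL_PERSON_LEGAL_FORMS` is hypothetical and contains only "e.K.", the example used in this article; the suffix matching is an assumption for illustration, not the actual enrichment logic.

```python
# Hypothetical mapping of country code to legal forms that are used by natural persons.
# Only "e.K." is taken from this article's examples.
NATURAL_PERSON_LEGAL_FORMS = {
    "DE": ("e.K.",),
}


def indicates_registered_individual(name: str, country: str) -> bool:
    """True if `name` ends with a legal form that is used by natural persons."""
    forms = NATURAL_PERSON_LEGAL_FORMS.get(country, ())
    normalized = name.strip().casefold()
    # Require a space before the legal form so that it is matched as a separate word.
    return any(normalized.endswith(" " + form.casefold()) for form in forms)


print(indicates_registered_individual("Simon Schlosser e.K.", "DE"))
print(indicates_registered_individual("CDQ AG", "DE"))
```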

No legal forms maintained.

Identifier check

Derivation

Certain identifiers make it possible to identify natural persons. For example, in PT (Portugal) the first digit of the VAT number indicates whether the record represents an individual: 1-3 denote natural persons, 5 denotes companies.
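
The Portuguese rule can be sketched in a few lines of Python. The function name `classify_pt_vat` and the handling of an optional "PT" prefix are assumptions for this example. The sketch only looks at the first digit and does not validate the length or any checksum of the number.

```python
def classify_pt_vat(vat_number: str) -> str:
    """Classify a Portuguese VAT number by its first digit."""
    digits = vat_number.strip().upper().replace(" ", "")
    # Assumption: an optional "PT" country prefix may be present and is ignored.
    if digits.startswith("PT"):
        digits = digits[2:]
    if not digits.isdigit():
        return "unknown"
    if digits[0] in "123":
        return "individual"
    if digits[0] == "5":
        return "company"
    # Other first digits are not covered by the rule stated above.
    return "unknown"


print(classify_pt_vat("123456789"))
print(classify_pt_vat("PT 501234567"))
```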

Identifier schema | Country | Identifier | Description

000000000057H0CS5Z46ACS04Y | | | The first digit indicates whether the record represents an individual, i.e. 1-3 are regular people, 5 are companies. For non-residents (only subject to final withholding at source), the ID starts with "45".

00000000007CGYKN2FKHXN1JB9 | | | For natural persons, this number consists of 12 digits.

0000000000BGWDBG0HD1V5FAWY | | | Specifies how to derive a sole proprietor from a DIC number. The DIC number may have different formats, and thus different patterns apply for identifying individuals:
  • 10-digit case: If the number is represented by C1C2C3C4C5C6C7C8C9C10, then
    • C1 and C2 represent the year of birth and must be in the range 00 to [current year] or 54 to 99
    • C3 and C4 represent the month of birth:
      • 01 to 12 for men
      • or 21 to 32 for men
      • or 51 to 62 for women
      • or 71 to 82 for women
    • C5 and C6 represent the day of birth and must be 01 to 31
  • 9-digit case: If the number is represented by C1C2C3C4C5C6C7C8C9, then
    • C1 and C2 represent the year of birth and must be in the range 00 to 53
    • C3 and C4 represent the month of birth:
      • 01 to 12 for men
      • 51 to 62 for women
    • C5 and C6 represent the day of birth and must be 01 to 31

0000000000DCRZ8Y1PB8PCXV09 | | | 10 characters: 5 letters + 4 digits + 1 letter. The 4th character indicates the holder of the card: "P" stands for individuals ("Proprietor").

0000000000Q4F2WYGAGTRQMW7P | | | For entities such as companies or Associations of Persons (AOP), the TIN is designated as the National Tax Number (NTN). For individuals, the TIN / NTN has the following format: AAAAA-AAAAAAA-N (13 characters in total), where A denotes an alphanumeric character and N a numeric digit.

0000000000V72GY0NA3SG7YVPP | | | Specifies how to derive a sole proprietor from an EU VAT number in the Czech Republic. The EU VAT number is identical to the DIC number (Czech Republic), and thus the identical patterns apply.
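
Two of the derivation rules from the table can be sketched in Python: the date-of-birth pattern in a DIC number and the "4th character is P" rule. The sketch rests on several assumptions that are not confirmed by the table. The month ranges are treated as inclusive (01-12, 21-32, 51-62, 71-82), a leading "CZ" prefix is stripped, no checksum is verified, and the function names as well as the `today` parameter are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def dic_looks_like_birth_number(dic: str, today: Optional[date] = None) -> bool:
    """Check whether a DIC number matches the date-of-birth pattern of a natural person."""
    digits = dic.strip().upper()
    # Assumption: the EU VAT variant carries a "CZ" prefix in front of the same digits.
    if digits.startswith("CZ"):
        digits = digits[2:]
    if not digits.isdigit() or len(digits) not in (9, 10):
        return False
    year, month, day = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    if not 1 <= day <= 31:
        return False
    if len(digits) == 10:
        current_yy = (today or date.today()).year % 100
        year_ok = year <= current_yy or 54 <= year <= 99
        month_ranges = ((1, 12), (21, 32), (51, 62), (71, 82))
    else:
        year_ok = year <= 53
        month_ranges = ((1, 12), (51, 62))
    month_ok = any(low <= month <= high for low, high in month_ranges)
    return year_ok and month_ok


# 5 letters + 4 digits + 1 letter, as described in the table.
CARD_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def card_number_marks_individual(number: str) -> bool:
    """True if the number has the 10-character layout and its 4th character is 'P'."""
    number = number.strip().upper()
    return CARD_PATTERN.fullmatch(number) is not None and number[3] == "P"


print(dic_looks_like_birth_number("CZ8001011234"))
print(card_number_marks_individual("ABCPE1234F"))
```

A match only indicates that the number is compatible with the pattern of a natural person; it does not prove that the number was actually issued to one.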

Personal identification numbers

Moreover, there are identifiers that are only assigned to natural persons such as the US_SEC_ID (US - Social security number) in the US (United States of America).

Personal Identification Number | Country | Name
BR_CPF (BR - Natural Persons Register) | BR (Brazil) | Cadastro de Pessoas Físicas (pt); Natural Persons Register (en)
ES_NIE (ES - Tax Identification Number) | ES (Spain) | Numero de Identificacion de Extranjero (es); Tax Identification Number (en)
FO_FIN (FO - Faroese ID Num.) | FO (Faroe Islands) | Faroese Identification Number (en)
FO_FPN (FO - Faroese P Number) | FO (Faroe Islands) | Faroese P Number (en)
GL_CPR (GL - CPR number) | GL (Greenland) | CPR number (en)
JE_SSN (JE - Social Security Number) | JE (Jersey) | Social Security Number (en)
KE_PIN (KE - Personal ID) | KE (Kenya) | Personal Identification Number (en)
KR_RES_ID (KR - Resident ID) | KR (South Korea) | Resident Registration Number (en)
SM_SSI (SM - Social Security Number) | SM (San Marino) | Social Security Number (en)
UK_IN_ID (UK - NI number) | GB (United Kingdom of Great Britain and Northern Ireland) | NI number (en)
US_SEC_ID (US - Social security number) | US (United States of America) | Social security number (en)
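
Because these identifier types are assigned only to natural persons, their mere presence on a record is a strong signal. The following Python sketch illustrates such a lookup. The type codes are taken from the table above, while the representation of a record's identifiers as `(type, value)` pairs and the function name are assumptions for this example.

```python
# Identifier type codes from the table above.
PERSONAL_ID_TYPES = frozenset({
    "BR_CPF", "ES_NIE", "FO_FIN", "FO_FPN", "GL_CPR", "JE_SSN",
    "KE_PIN", "KR_RES_ID", "SM_SSI", "UK_IN_ID", "US_SEC_ID",
})


def personal_identifier_types(identifiers) -> list:
    """Return the sorted personal identifier types found in (type, value) pairs."""
    return sorted({id_type for id_type, _ in identifiers if id_type in PERSONAL_ID_TYPES})


# Hypothetical record: one personal identifier and one unrelated identifier type.
record_identifiers = [("US_SEC_ID", "000-00-0000"), ("SOME_OTHER_ID", "12345")]
print(personal_identifier_types(record_identifiers))
```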

Contact information parsing

To identify contact information, the name is searched for typical keywords such as "attn:", "z.Hd." and "attention to". This parsing is provided via the data quality rule "Contact information misplaced".
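
A keyword search of this kind could look like the following Python sketch. It covers only the three keywords named above. The regular expression, the function name `split_contact` and the decision to split the name at the first keyword are assumptions for illustration and do not reproduce the actual data quality rule.

```python
import re
from typing import Optional, Tuple

# The three keywords mentioned above, matched case-insensitively as separate words.
CONTACT_KEYWORDS = re.compile(
    r"\s*(?:\battn\s*:|\bz\.\s?Hd\.|\battention\s+to\b)\s*",
    re.IGNORECASE,
)


def split_contact(name: str) -> Tuple[str, Optional[str]]:
    """Split a name into (business partner name, contact person or None)."""
    match = CONTACT_KEYWORDS.search(name)
    if match is None:
        return name.strip(), None
    company = name[:match.start()].strip()
    contact = name[match.end():].strip()
    return company, contact or None


print(split_contact("CDQ AG attn: Simon Schlosser"))
print(split_contact("CDQ AG z.Hd. Simon Schlosser"))
print(split_contact("CDQ AG"))
```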