Release note/20201228
Release date (date and time when a concept, e.g. a data quality rule, is released, i.e. when the status changes to <code>RELEASED</code>) | 28-12-2020 |
---|---|
Feature (affected features) | Identifier Checks |
Idea portal (ideas that have been introduced with the release) | no idea entry directly linked with this release note |
Extended tax and business identifier checks
The data quality checks have been extended by 49 additional data quality rules. This makes it possible to check a much broader range of countries that were not covered in earlier releases. The new rules mainly check identifier formats and identifier check digits. See more details in the full release note.
New data quality rules released
Rule | Description | Criticality | Country |
---|---|---|---|
Identifier checkdigit invalid (GST number (India)) | This rule checks the check digit of the Goods and Services Tax (GST) number in India. The check digit is calculated from the first fourteen characters of the GST number.
First, calculate a hash for each character of the GST number: a) Obtain the corresponding code [C] for each alphanumeric character of the GST number. The codes for the alphanumeric characters are as follows:
b) Multiply the code by the multiplier [M], which is 1 for characters in odd positions and 2 for characters in even positions of the GST number. c) The multiplication [C x M] yields the product [P]. d) Divide the product by 36 [P ÷ 36]. This division yields a quotient [Q] and a remainder [R]. e) Add the quotient and the remainder [Q + R] to get the hash of that character. Repeat these steps for each character of the GST number to obtain its hash. After that, the check digit is derived from the hashes of these 14 characters in the following way:
| ERROR | IN (Indien, India, Inde, Bhārat, 印度) |
Identifier checkdigit invalid (NIP number (Poland)) | This rule describes the algorithm for the checksum digit (the 10th and last digit) of the NIP number (Poland) using the following logic:
1) multiply the first 9 digits of the NIP by the corresponding weights: 6, 5, 7, 2, 3, 4, 5, 6, 7; 2) sum up the results of the multiplications from step 1; 3) calculate the sum modulo 11. The remainder must equal the checksum digit. | ERROR | PL (Polen, Poland, Pologne, Polska, 波兰) |
Identifier checkdigit invalid (NIT number (Colombia)) | This rule checks the checkdigit of NIT number (Colombia). | ERROR | CO (Kolumbien, Colombia, Colombia, Colombie, 哥伦比亚) |
Identifier checkdigit invalid (REGON (Poland)) | This rule describes the algorithm for the checksum digit (the last digit) of the REGON identifier. Logic for the 9-digit identifier:
1) multiply each of the first 8 digits by the corresponding digit weights: 8 9 2 3 4 5 6 7; 2) sum up the numbers obtained in step 1; 3) apply modulo 11. The remainder is equal to the checksum digit. For the 14-digit REGON identifier the logic is the same, but the weights for the first 13 digits are different: 2 4 8 5 0 9 7 3 6 1 2 4 8. | ERROR | PL (Polen, Poland, Pologne, Polska, 波兰) |
Identifier checkdigit invalid (RUC number (Paraguay)) | This rule checks the checkdigit of RUC number (Paraguay). | ERROR | PY (Paraguay, Paraguay, Paraguay, Paraguay, Paraguay, 巴拉圭) |
Identifier checkdigit invalid (RUC number (Peru)) | This rule checks the check digit of the RUC number in Peru.
[C1, C2, C3, C4, C5, C6, C7, C8, C9, C10] - Every single digit of the 10-digit input number has its corresponding weight: [5, 4, 3, 2, 7, 6, 5, 4, 3, 2] 1. Multiply every element of the input number by its corresponding weight: - C1 * 5 - C2 * 4 - C3 * 3 - C4 * 2 - C5 * 7 - C6 * 6 - C7 * 5 - C8 * 4 - C9 * 3 - C10 * 2 2. Sum up all results (sumproduct). 3. Take the remainder of the sum modulo 11. 4. Subtract the result from 11. 5. Check digit = the remainder of that difference modulo 10. | ERROR | PE (Perú, Peru, Peru, Perú, Pérou, Perú, 秘鲁) |
Identifier checkdigit invalid (RUT number (Chile)) | This rule checks the checkdigit of RUT number (Chile). | ERROR | CL (Chile, Chile, Chile, Chili, 智利) |
Identifier checkdigit invalid (RUT number (Uruguay)) | This rule checks the checkdigit of RUT number (Uruguay). | ERROR | UY (Uruguay, Uruguay, Uruguay, Uruguay, 乌拉圭) |
Identifier checkdigit invalid (Tax Identification Number (Serbia)) | The number has 9 digits (e.g. 123456788), of which the first 8 are the actual ID number and the last digit is a checksum digit, calculated according to ISO 7064, MOD 11-10. | ERROR | RS (Serbien, Serbia, Serbie, Srbija, 塞尔维亚) |
Identifier checkdigit invalid (Tax Registration Number (Belarus)) | This rule checks the check digit of the Tax Registration Number (Belarus).
The number consists of 9 digits (numeric for organisations, alphanumeric for individuals). [C1, C2, C3, C4, C5, C6, C7, C8, CheckDigit] - Every single digit of the 8-digit input number has its corresponding weight: weights = (29, 23, 19, 17, 13, 7, 5, 3) 1. Multiply every element of the input number by its corresponding weight: - C1 * 29 - C2 * 23 - C3 * 19 - C4 * 17 - C5 * 13 - C6 * 7 - C7 * 5 - C8 * 3 2. Sum up all results (sumproduct). 3. Check digit = the remainder of the sum modulo 11. | ERROR | BY (Bielaruś, Weißrussland, Byelorussian SSR, Bélarus, Belarus', 白俄罗斯) |
Identifier checkdigit invalid (Tax identification number (South Africa)) | This rule describes the algorithm for the checksum digit (the last digit) of the Tax identification number (South Africa). Logic for the 10-digit identifier:
The last character is a check digit, calculated by applying the following algorithm:
Digit 10: Check digit
| ERROR | ZA (Suid-Afrika, Südafrika, South Africa, Afrique du Sud, Ningizimu Afrika, Afrika-Borwa, Afrika-Borwa, Afrika-Dzonga, Afrika Tshipembe, Mzantsi Afrika, 南非, Ningizimu Afrika) |
Identifier checkdigit invalid (Tax number (Romania)) | This rule checks the checksum of the Tax number in Romania.
To calculate the checksum digit, every digit of the CNP is multiplied by the corresponding digit of the number 279146358279; the sum of all these products is then divided by 11. If the remainder is 10, the checksum digit is 1; otherwise it is the remainder itself. Any difference results in a rule violation. | ERROR | RO (Rumänien, Romania, Roumanie, România, 羅馬尼亞) |
Identifier checkdigit invalid (VAT registration number (Switzerland)) | The last digit is a MOD11 checksum digit built with the weighting pattern 5,4,3,2,7,6,5,4. The rule accepts any validly formatted CH VAT ID; in case of an invalid format (see Identifier format invalid (VAT registration number (Switzerland))) the rule fails directly. | ERROR | CH (Schweiz, Switzerland, Suisse, Svizzera, Svizra, 瑞士) |
Identifier format invalid (Business Identification Number (Kazakhstan)) | The BIN contains 12 figures that are divided into five blocks. It carries the information about the company ownership type, registration date, serial number, and other information.
- The first part consists of 4 digits and includes the year (last two digits) and the month of state registration or re-registration of the legal entity, branch or representative office. - The second part consists of one digit and denotes the type of the legal entity or individual entrepreneur: * 4 - for resident legal entities; * 5 - for non-resident legal entities; * 6 - for individual entrepreneurs (IP); - The third part consists of one digit, an additional attribute determined as follows: * 0 - head unit of a legal entity or individual entrepreneur; * 1 - branch of a legal entity or individual entrepreneur; * 2 - representative office of a legal entity or individual entrepreneur; * 3 - farm operating on the basis of joint entrepreneurship; - The fourth part consists of 5 digits and includes the serial number of registration in the system of a legal entity (branches and representative offices) or individual entrepreneur; - The last digit is a check digit. | ERROR | KZ (Kasachstan, Kazakhstan, Kazakhstan, Qazaqstan, Kazahstan, 哈萨克斯坦) |
Identifier format invalid (Business Registration Number (New Zealand)) | The New Zealand Business Number (NZBN) is a unique 13 digit identifier for all New Zealand businesses, including companies, sole traders, partnerships, registered charities, trusts and government agencies. The NZBN primarily identifies the business or entity. When used for New Zealand limited liability companies the first two digits are 94, identifying the company as a New Zealand entity, the next ten digits are the business entity id and the last digit is a system check. | ERROR | NZ (Neuseeland, New Zealand, Nouvelle-Zélande, Aotearoa, 新西兰) |
Identifier format invalid (CURP number (Mexico)) | The CURP number (Mexico) is an individual registration number. It is a unique alphanumeric 18-character string. This rule checks for the presence of exactly 18 characters corresponding to the format of the CURP number (Mexico), ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more characters, or unexpected characters, the rule is violated. | ERROR | MX (Mexiko, Mexico, México, Mexique, 墨西哥) |
Identifier format invalid (Company Registration Number (Hong Kong)) | The Company Registration Number (CRN) can be compared to a company’s Social Security Number. It is used as an official means of representing a company in legal documents and government records. The CRN is a 7-digit number. | ERROR | HK (Hong Kong, Hong Kong, Xianggang) |
Identifier format invalid (Company identification number (Switzerland)) | The Company identification number (Switzerland) consists of the prefix "CHE" followed by 9 digits and, optionally, "HR" or "RC". The structure of a UID number can be modelled as follows: CHE-999.999.999 HR. Depending on whether the identification number is additionally used as a VAT number, the suffix also allows the additional abbreviations MWST, TVA or IVA in any combination with HR and RC, e.g. HR/MWST, RC/TVA or RC/IVA. This rule checks for the presence of exactly 9 digits and the prefix, ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more digits, wrongly placed digits or letters, or the prefix is missing, the rule is violated. | ERROR | CH (Schweiz, Switzerland, Suisse, Svizzera, Svizra, 瑞士) |
Identifier format invalid (D-U-N-S) | The D-U-N-S consists of exactly 9 numeric digits. This rule checks for the presence of exactly 9 digits, ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more digits, or non-numeric characters, the rule is violated. | ERROR | WORLD (World) |
Identifier format invalid (Employer identification number (United States)) | The Employer identification number consists of exactly 9 digits. Prior to 2001, the first two digits of an EIN (the EIN Prefix) indicated the business was located in a particular geographic area. The prefixes as provided here [1] are allowed. | ERROR | US (Vereinigte Staaten, United States of America, États-Unis d'Amérique, 美国) |
Identifier format invalid (Enterprise number (Belgium)) | The Enterprise Number (or Association Number, National Number, Company Number or Unique Establishment Number) is a unique code given by the Belgian government. This rule checks for the presence of exactly 10 digits and the prefix, ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more digits, or unexpected non-digit characters, the rule is violated. | INFO | BE (Belgien, Belgium, Belgique, België, 比利时) |
Identifier format invalid (European value added tax identifier (Belgium)) | The European value added tax identifier in Belgium consists of exactly 10 numeric digits prefixed by "BE". The first digit following the prefix is always 0 or 1. This rule checks for the presence of exactly 10 digits, ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more digits, non-numeric characters, or a first digit other than 0 or 1, the rule is violated. | ERROR | BE (Belgien, Belgium, Belgique, België, 比利时) |
Identifier format invalid (European value added tax identifier (Cyprus)) | The European value added tax identifier in Cyprus consists of 9 characters (8 numeric digits + 1 letter) prefixed by "CY". This rule checks for the presence of 8 numeric digits followed by a letter, ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more digits, wrongly placed digits, or the last character is a numeric digit instead of a letter, the rule is violated. | ERROR | CY (Republik Zypern, Kýpros, Cyprus, Chypre, Kıbrıs, 賽普勒斯) |
Identifier format invalid (GST Number (New Zealand)) | Registering for GST is optional for businesses earning less than $60,000 annually. The format is fixed at 9 characters, all of which are digits. Format: 123-456-789. The hyphens are treated as optional by this rule. This format was introduced in 2008. Before that, the number was an eight-digit number. Existing old numbers remained unchanged, which is why the rule additionally allows numbers with only 8 digits. | ERROR | NZ (Neuseeland, New Zealand, Nouvelle-Zélande, Aotearoa, 新西兰) |
Identifier format invalid (ICO number (Czech Republic)) | The ICO number is the identification number (business register code) of legal entities as well as of natural persons doing business. The ICO is an 8-digit number without any letters or special characters. If the number has fewer than 8 digits, it should be padded with leading zeros. This rule checks for the presence of exactly 8 digits, ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more digits, or unexpected non-digit characters, the rule is violated. | ERROR | CZ (Česko, Tschechien, Czechia, Tchéquie, 捷克) |
Identifier format invalid (Individual Identification Number (Kazakhstan)) | Individual identification number (IIN) is a unique combination of 12 digits generated for an individual at the initial registration in the information-production system for the production of documents.
The IIN is generated automatically, following the principles of uniqueness and immutability. To preserve the integrity of the data in information banks of various levels that use the IIN in their data structure, it is not subject to any modification or regeneration after its initial formation. The IIN is located on the front of the identity card of a citizen of the Republic of Kazakhstan, below the date of birth, as a combination of 12 digits; in the passport of a citizen of the Republic of Kazakhstan the IIN is listed on page 2. In accordance with Article 562 of the Tax Code of the Republic of Kazakhstan, a foreigner must obtain an IIN in the following cases: - becoming a Kazakhstan tax resident (within 30 calendar days from the date of entry); - receiving income from sources in the Republic of Kazakhstan that is not subject to taxation at the source of payment; - opening a current account in Kazakhstan resident banks; - acquiring property in Kazakhstan that is subject to property tax, vehicle tax or land tax; - being appointed as first head of a legal entity of the Republic of Kazakhstan, or as head of a branch or representative office of a foreign legal entity. | ERROR | KZ (Kasachstan, Kazakhstan, Kazakhstan, Qazaqstan, Kazahstan, 哈萨克斯坦) |
Identifier format invalid (RFC number (Mexico)) | The Mexican tax identification number consists of 3 letters followed by a delimiting hyphen ("-"), a 6-digit number, another "-" and a 3-character string. The rule checks whether exactly 3 letters are followed by 6 numeric digits and 3 characters, ignoring the delimiting hyphens as well as whitespace and other delimiters such as dots.
RFC stands for Registro Federal de Contribuyentes, and the clave RFC (RFC number) is a Mexican tax identification number. It’s issued by the Mexican Tax Administration Service (Servicio de Administración Tributaria). The structure of the RFC varies depending on the type of taxpayer.
and 3 alphanumeric characters).
- If the company name is three words or greater, the three-letter string will comprise the first letter of each word. For example, if the company’s name is El Gato Azul Restaurante, S.A. de C.V., the RFC would start with GAR. - If the company name is two words, the three-letter string will comprise the first letter of the first word and the first two letters of the second word. For example, if the company’s name is Gato Azul, S.A. DE C.V., the RFC would start with GAZ. - If the company name is one word, the three-letter string will comprise the first three letters of the word. For example, if the company’s name is Gato, S.A. de C.V., the RFC would start with GAT.
RFC for individuals is structured as follows:
- First letter and first internal vowel of the paternal surname: if the paternal surname does not have a first internal vowel, the second letter of the paternal surname is used, and if the person does not have a second surname, the first two letters of the paternal surname are used. - First letter of the maternal surname: if the person does not have a second surname, the first letter of the given name is used. - First letter of the given name: if the person does not have a second surname, the second letter of the given name is used.
| ERROR | MX (Mexiko, Mexico, México, Mexique, 墨西哥) |
Identifier format invalid (RUC number (Ecuador)) | The RUC is a tax identification number for legal entities. It has 13 digits
where the third digit is a number denoting the type of entity. RUC is the tax identification number assigned to every physical person and company subject to tax liabilities. The RUC number has 13 digits; it has no letters nor special characters. For individuals:
For Foreign companies and Foreign non-resident individuals:
* 10th digit - check digit.
For public entities:
* 9th digit - check digit.
| ERROR | EC (Ecuador, Ecuador, Ecuador, Équateur, 厄瓜多尔) |
Identifier format invalid (RUC number (Paraguay)) | The RUC (Registro Único del Contribuyente) is the unique taxpayer registry that maintains the personal, non-transferable identification number for all physical persons (national or foreign) and legal entities (for-profit and non-profit) that carry out economic activities in the Paraguayan territory. The RUC number for legal entities consists of 8 digits starting after 80000000. Numbers for residents and foreigners have up to 9 digits. The last digit is a check digit that is calculated with a special formula. | ERROR | PY (Paraguay, Paraguay, Paraguay, Paraguay, Paraguay, 巴拉圭) |
Identifier format invalid (RUT number (Uruguay)) | The RUT number is a number provided by the Tax Administration and is used in the case of legal entities and certain physical persons carrying out business activities or who have been assigned a RUT number (example: commercial activity, services, net worth). This number consists of 12 digits: the first two indicate the registration number, the following six sequential digits indicate the number, and the last 4 are always "001x", where x is the verifying digit. Where: NNNNNNNNNNN: consists of eleven digits containing a unique number. D: consists of one digit containing the verification digit for the full number. | ERROR | UY (Uruguay, Uruguay, Uruguay, Uruguay, 乌拉圭) |
Identifier format invalid (Tax Identification Number (Indonesia)) | The Tax Identification Number (TIN) is known in Indonesia as Nomor Pokok Wajib Pajak (NPWP).
requirement. Since July 2022 the format has been 16 numeric digits. The old format consisted of 15 digits, of which:
| ERROR | ID (Indonesien, Indonesia, Indonésie, Indonesia, 印度尼西亚) |
Identifier format invalid (Unified Business Identifier Number (United States - Washington)) | A UBI number is a 9-digit number that registers you with several state agencies and allows you to do business in Washington State. This rule checks for the presence of exactly 9 digits and the prefix, ignoring any whitespace, hyphens or dots contained in the identifier value. If there are fewer or more digits, or unexpected non-digit characters, the rule is violated. | INFO | US (Vereinigte Staaten, United States of America, États-Unis d'Amérique, 美国) |
Identifier format invalid (VAT Number (Vietnam)) | Businesses in Vietnam that are required to collect tax are issued an identification number. VAT numbers can be verified with the Ministry of Finance. The format of the VAT number in Vietnam is 9999999 (exactly 7 digits). | ERROR | VN (Vietnam, Viet Nam, Viet Nam, Việt Nam, 越南) |
Identifier format invalid (VAT number (Norway)) | The VAT number is the standard Norwegian organisation number (Organisasjonsnummer) with 'MVA' as suffix. MVA stands for 'Merverdiavgift', the Norwegian word for value added tax.
The technical construction of the number specifies that: - The first digit must always be either 8 or 9. - The 9th digit is a modulo 11 check digit. - The last 3 letters are the suffix MVA. Format: 999999999MVA | ERROR | NO (Norwegen, Norway, Norvège, Norge, Noreg, Norge, 挪威) |
Identifier format invalid (VAT registration number (Israel)) | A 9-digit number. If the number has fewer than 9 digits, it should be left-padded with zeros. The leftmost digit is 5 for corporations. Other leftmost digits are used for individuals. The rightmost digit is a check digit (using the Luhn algorithm). | ERROR | IL (Isrā'īl, Israel, Israel, Israël, Yisra'el, 以色列) |
Identifier format invalid (VAT registration number (Switzerland)) | The VAT number is based on the Swiss UID. It starts with CHE, followed by 9 digits (whereby the last digit is a MOD11 checksum), and either ends with MWST, TVA or IVA depending on the part of Switzerland a business is registered in.
The extension of the UID will change as follows: German part: MWST (German abbreviation for “Mehrwertsteuer”) French part: TVA (French abbreviation for “taxe sur la valeur ajoutée”) Italian part: IVA (Italian abbreviation for “Imposta sul valore aggiunto”) For example, a Swiss VAT number could look like this: CHE-123.456.789 MWST. Combinations with the suffixes HR and RC (which indicate that the exact identical number is a company identification number and registered in the UID register) are allowed such as MWST/HR or TVA/RC in any possible combination. | ERROR | CH (Schweiz, Switzerland, Suisse, Svizzera, Svizra, 瑞士) |
Identifier missing (Business Registration Number (New Zealand)) | Each legal entity in New Zealand should have a New Zealand Business Number. | WARNING | NZ (Neuseeland, New Zealand, Nouvelle-Zélande, Aotearoa, 新西兰) |
Inconsistency between BIC FI and EU VAT ID FI (Finland) | The Business identity code in Finland and the European VAT number in Finland share identical values. The numeric digits of the EU VAT are identical to the business identity code. There is an inconsistency when the 8 digits after the FI prefix of the EU VAT differ from the 8 digits of the business identity code (usually formatted like this: "1234567-8"; the dash and the EU VAT prefix "FI" are ignored by this rule). | WARNING | FI (Finnland, Finland, Suomi, Finlande, Finland, 芬兰) |
Inconsistency between Business Number and GST number (Canada) | The Canadian business number consists of 9 digits. The GST number in Canada is composed as follows: XXXXXXXXX RT YYYY whereby
| ERROR | CA (Kanada, Canada, Canada, 加拿大) |
Inconsistency between CIF ES and EU VAT ID ES (Spain) | The CIF and the European VAT number for Spain share identical values. The EU VAT ID is composed of the prefix ES + the CIF. There is an inconsistency when:
| ERROR | ES (Espanya, Spanien, Spain, España, Espainia, Espagne, España, Espanha, 西班牙) |
Inconsistency between Enterprise number and European value added tax identifier (Belgium) | The EU VAT equals the Enterprise number prefixed by BE. However, whether the resulting VAT number actually exists has to be checked via VIES, since not all companies require a VAT number. This rule checks whether the Belgian Enterprise number is equal to the 10 digits of EU_VAT_ID_BE. Any difference results in a rule violation. | ERROR | BE (Belgien, Belgium, Belgique, België, 比利时) |
Inconsistency between Fiscal Registration Code and EU VAT ID RO (Romania) | The European value added tax identifier (Romania) consists of the prefix "RO" and 9 digits which are equal to the Fiscal Registration Code (Romania), plus a checksum digit. This rule checks whether the Fiscal Registration Code (Romania) is equal to the 9 digits in the EU VAT RO identifier.
Note that if the Fiscal Registration Code has fewer than 9 digits or the European value added tax identifier (Romania) has fewer than 10 digits, leading zeros must be assumed. Any difference results in a rule violation. | ERROR | RO (Rumänien, Romania, Roumanie, România, 羅馬尼亞) |
Inconsistency between NIP number and European value added tax identifier (Poland) | PL_NIP = EU_VAT_ID_PL number without the prefix PL. This rule checks whether the 10 digits following the "PL" in EU_VAT_ID_PL are equal to the NIP number. Any difference results in a rule violation. | ERROR | PL (Polen, Poland, Pologne, Polska, 波兰) |
Inconsistency between PAN and GST number (India) | The Permanent Account Number (PAN) in India is the Tax Identification number. It consists of 10 characters. The GST number in India is composed as follows: SSYYYYYYYYYYNZX whereby
| ERROR | IN (Indien, India, Inde, Bhārat, 印度) |
Inconsistency between RSIN and European value added tax identifier (The Netherlands) | The 9 digits following the "NL" prefix are the RSIN. This rule checks whether the 9 digits following the "NL" in EU_VAT_ID_NL are equal to the RSIN. Any difference results in a rule violation. | ERROR | NL (Königreich der Niederlande, Netherlands, Pays-Bas, Nederland, 荷蘭) |
Inconsistency between SIREN and EU VAT ID (France) | The SIREN number is the French Company Register ID. It consists of 9 digits. The VAT number in France is composed as follows: FR XX YYYYYYYYY whereby
| ERROR | FR (Frankreich, France, France, 法国) |
Invalid checksum of EU VAT ID SI (Slovenia) | This rule checks the checksum of the European VAT number in Slovenia using the following logic:
| ERROR | SI (Slowenien, Slovenia, Slovénie, Slovenija, 斯洛文尼亚) |
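
The weighted modulo 11 logic described above for the NIP and REGON rules (Poland) can be sketched as follows. This is a minimal illustration, not the implementation behind the released rules: the function names `strip_separators`, `weighted_mod11`, `nip_checkdigit_valid` and `regon_checkdigit_valid` are hypothetical, the weights are taken from the rule descriptions, and mapping a REGON remainder of 10 to the check digit 0 is an assumption that the descriptions do not state.

```python
import re

# Weights as listed in the rule descriptions above.
NIP_WEIGHTS = (6, 5, 7, 2, 3, 4, 5, 6, 7)
REGON9_WEIGHTS = (8, 9, 2, 3, 4, 5, 6, 7)
REGON14_WEIGHTS = (2, 4, 8, 5, 0, 9, 7, 3, 6, 1, 2, 4, 8)


def strip_separators(value: str) -> str:
    """Drop whitespace, hyphens and dots, which the format rules ignore."""
    return re.sub(r"[\s.\-]", "", value)


def weighted_mod11(digits: str, weights) -> int:
    """Sum digit * weight over the payload and return the remainder modulo 11."""
    return sum(int(d) * w for d, w in zip(digits, weights)) % 11


def nip_checkdigit_valid(value: str) -> bool:
    """Check the 10th digit of a NIP against the weighted sum of the first 9."""
    nip = strip_separators(value)
    if not re.fullmatch(r"\d{10}", nip):
        return False
    # A remainder of 10 can never equal a single digit, so such numbers fail.
    return weighted_mod11(nip[:9], NIP_WEIGHTS) == int(nip[9])


def regon_checkdigit_valid(value: str) -> bool:
    """Check the last digit of a 9-digit or 14-digit REGON."""
    regon = strip_separators(value)
    if re.fullmatch(r"\d{9}", regon):
        weights = REGON9_WEIGHTS
    elif re.fullmatch(r"\d{14}", regon):
        weights = REGON14_WEIGHTS
    else:
        return False
    # Assumption (not stated in the release note): remainder 10 maps to 0.
    expected = weighted_mod11(regon[:-1], weights) % 10
    return expected == int(regon[-1])
```

For example, `nip_checkdigit_valid("123-456-32-18")` accepts the synthetic number 1234563218, because the weighted sum of its first nine digits is 118 and 118 modulo 11 is 8. The other weighted checks in the table (for example Belarus or Switzerland) could reuse `weighted_mod11` with their own weights, although their final steps differ.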