Data quality rules

Documentation of 2,931 data quality rules with explanation and technical constraints to validate business partner data records.

Description

CDQ provides documented data quality rules to validate business partner records in a consistent, service-ready way. Instead of translating changing business requirements into repeated, manual IT implementation, CDQ maintains rule logic and the required reference knowledge (for example country-specific formats, legal forms, and business identifier metadata) so customers can apply checks reliably across systems and processes.

Rules can use different types of inputs depending on the validation need. Some checks are purely structural and can be executed on the record itself, others require managed reference data, and some depend on external validation services (for example for tax-related checks that go beyond format). The same rule set can be applied both in real-time workflows and in batch-based data quality assurance, exposed through a single execution interface and integrable via APIs.

Where a rule relies on external or community defined requirements, CDQ maintains supporting sources to ensure content correctness and traceability. Depending on the case, this can include an authority reference, another trustworthy publication, or a community-provided standard, along with the relevant metadata to document why the rule exists and what it is based on.

Types of Data Quality Rules

CDQ structures data quality rules in a way that reflects how they are applied in business processes and CDQ services. Instead of presenting rules only as technical validations, they are organised along two complementary concepts, rule categories and rule features.

Rule categories describe the nature of a detected issue from a business perspective. A category represents a specific type of data quality problem, for example missing information, invalid formats, inactivity, or misplaced content. Many individual rules can belong to the same category, often adapted to country specific standards, identifier schemes, or regulatory requirements. This allows consistent interpretation of results across different data domains while still supporting detailed technical validation logic.

Rule features group related categories into business capabilities. A feature reflects how rules are used within CDQ services, for example validating addresses, checking identifiers, or ensuring compliance relevant attributes. From a customer perspective, features provide a clear understanding of which aspect of business partner data quality is being evaluated and how rule results contribute to operational processes, monitoring, and analytics.

This layered structure enables a scalable rule landscape. New technical validations can be added without changing the business meaning of categories or features, and the same feature can be executed consistently in batch analyses, continuous monitoring, or real time workflows.
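The layering of rules, categories, and features can be illustrated with a minimal sketch. The `Rule` class, the rule names, and the `FEATURES` mapping below are hypothetical and only mirror the structure described above; they are not CDQ's actual data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str                      # hypothetical technical rule name
    category: str                  # business-level type of issue
    country: Optional[str] = None  # optional country scope


# Hypothetical mapping: a feature groups several rule categories.
FEATURES = {
    "Address checks": {"Missing locality", "Postal code schema"},
    "Name checks": {"Name missing", "Name pattern"},
}

# Made-up sample rules: two country variants share one category.
RULES = [
    Rule("MISSING_LOCALITY_DE", "Missing locality", "DE"),
    Rule("MISSING_LOCALITY_FR", "Missing locality", "FR"),
    Rule("NAME_MISSING", "Name missing"),
]


def rules_for_feature(feature, rules):
    """Return all rules whose category belongs to the given feature."""
    categories = FEATURES[feature]
    return [r for r in rules if r.category in categories]


def count_by_category(rules):
    """Count rules per category, independent of country variants."""
    counts = defaultdict(int)
    for rule in rules:
        counts[rule.category] += 1
    return dict(counts)
```

In this sketch, adding a third country variant of "Missing locality" changes neither the category nor the feature definition, which is the scalability property the layering aims for.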

Address checks

CDQ manages 1,245 data quality rules for this feature. See details...

CDQ address checks ensure that business partner address data is accurate, complete, and compliant with country-specific standards. These checks validate the presence and correctness of essential elements such as administrative areas, localities, and thoroughfares. Post codes are managed separately. The address checks also detect incomplete or outdated information, flag misplaced components like "care of" or contact details, and ensure postal codes and phone numbers follow correct formats. Additionally, the system identifies inactive addresses and ensures that all required fields, such as postal codes and administrative areas, are correctly populated. By applying these validations, the checks maintain high data quality and reduce errors.

This feature comprises rules from 13 rule categories:

  • Address incomplete (0 rules): Flags addresses that are missing essential components required for completeness and usability.
  • Country invalid (6 rules): Flags addresses with country values that are misplaced, incorrectly formatted, or don't match expected country codes.
  • Dummy postal code (251 rules): Detects postal codes that follow dummy or placeholder patterns indicating invalid data.
  • Missing locality (200 rules): Identifies addresses that lack locality information (city, town, village) where it's required.
  • PO Box misplaced (9 rules): Detects PO Box information placed in street address fields instead of designated PO Box fields.
  • Phone number format (0 rules): Validates that phone numbers follow correct formatting conventions and structural patterns.
  • Postal code invalid (0 rules): Identifies postal codes that don't exist in the postal code registry for the given country.
  • Postal code schema (2 rules): Validates that postal codes follow official formatting schemas with correct spacing and separators.
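As an illustration of the "Postal code schema" and "Dummy postal code" categories, the following sketch checks a value against a per-country pattern and against a simple placeholder pattern. The three schemas (five digits for DE, "1234 AB" for NL, "12-345" for PL) are a small illustrative subset, and the repeated-digit dummy pattern is an assumption made for this example, not CDQ's rule definition.

```python
import re

# Illustrative subset of national postal code schemas (assumption for this sketch).
POSTAL_CODE_SCHEMAS = {
    "DE": re.compile(r"\d{5}"),
    "NL": re.compile(r"\d{4} [A-Z]{2}"),
    "PL": re.compile(r"\d{2}-\d{3}"),
}

# Hypothetical dummy pattern: one digit repeated, e.g. "00000" or "99999".
DUMMY_PATTERN = re.compile(r"(\d)\1+")


def check_postal_code(country, value):
    """Return the names of the categories the value violates."""
    findings = []
    schema = POSTAL_CODE_SCHEMAS.get(country)
    if schema is not None and not schema.fullmatch(value):
        findings.append("Postal code schema")
    if DUMMY_PATTERN.fullmatch(value):
        findings.append("Dummy postal code")
    return findings
```

For example, `check_postal_code("NL", "1234AB")` reports a schema violation because the separating space is missing, while the value may still refer to a real postal code. This is why schema and existence checks are separate categories.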

Bank account checks

CDQ manages 140 data quality rules for this feature. See details...

Bank Account Checks enable the validation of bank and bank account master data with respect to different data quality criteria.

This feature comprises rules from 4 rule categories:

  • IBAN checkdigit (1 rule): Validates the check digit component of IBAN (International Bank Account Number) values.
  • IBAN format (1 rule): Validates that IBAN values conform to the correct structural format for international bank accounts.
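The two IBAN categories can be sketched with the publicly documented IBAN structure: two letters for the country, two check digits, then up to 30 alphanumeric characters, validated with the ISO 7064 mod 97-10 scheme. This is a generic illustration, not CDQ's implementation. The function names are hypothetical and country-specific length checks are omitted.

```python
import re

# Generic IBAN shape: country code, two check digits, up to 30 alphanumerics.
IBAN_SHAPE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{1,30}")


def normalize(iban):
    """Remove spaces and upper-case the value before checking it."""
    return iban.replace(" ", "").upper()


def iban_format_ok(iban):
    """Structural check only; does not verify the check digits."""
    return bool(IBAN_SHAPE.fullmatch(normalize(iban)))


def iban_checkdigit_ok(iban):
    """Mod 97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and expect remainder 1."""
    value = normalize(iban)
    if not IBAN_SHAPE.fullmatch(value):
        return False
    rearranged = value[4:] + value[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

A value such as `GB83 WEST 1234 5698 7654 32` passes the format check but fails the check digit check, so the two categories report different kinds of problems.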

Bank account data checks

CDQ manages 25 data quality rules for this feature. See details...

Bank Account Data Checks enable comprehensive validation of bank and bank account master data with respect to different data quality criteria. This includes validation of IBAN format and checkdigits, as well as domestic bank identifiers and account numbers. These checks ensure that bank account information meets required formatting standards and is internally consistent.

This feature comprises rules from 4 rule categories:

  • IBAN checkdigit (1 rule): Validates the check digit component of IBAN (International Bank Account Number) values.
  • IBAN format (1 rule): Validates that IBAN values conform to the correct structural format for international bank accounts.

Business and tax identifier qualifications

CDQ manages 78 data quality rules for this feature. See details...

Business and Tax Identifier Qualifications verify that business partner name and address information is genuinely associated with a given identifier by cross-referencing external data sources and registries. This includes EU VAT qualification through multiple channels (VIES, BZSt, AT.FON), worldwide tax qualification, and general identifier qualification. The service provides fine-grained qualification at the company name, address, post code, locality, and street level.

This feature comprises rules from 11 rule categories:

  • EU TAX Qualification (133 rules): Validates EU VAT IDs by checking if the associated company name and address match official registration data.
  • EU VAT Qualification (AT) (132 rules): Validates non-Austrian EU VAT IDs from the perspective of an Austrian company using Austrian tax authority services.
  • Identifier qualification (482 rules): Validates identifiers by verifying that associated name and address data match external authoritative sources.
  • Qualification (482 rules): Performs comprehensive validation of identifiers including name and address matching against official sources.

Business partner checks

CDQ manages 338 data quality rules for this feature. See details...

Business partner checks validate core attributes that establish a business partner record, including the presence of required business partner names and detection of inactive business partners. These checks also ensure that contact information and care-of details are correctly placed in their designated fields rather than in business partner name or address fields. By applying these validations, the system maintains data quality and ensures that business partner records contain accurate and properly structured information essential for operations.

This feature comprises rules from 4 rule categories:

  • Care of misplaced (4 rules): Detects care-of information placed in incorrect address fields instead of dedicated care-of fields.
  • Contact misplaced (2 rules): Identifies contact person information incorrectly placed in business partner name or address fields.
  • Name missing (2 rules): Identifies business partner records that lack a name, which is a fundamental required attribute.
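A "Care of misplaced" check can be approximated by searching non-care-of fields for typical care-of markers. The field names (`name`, `street`, `care_of`) and the marker list are assumptions made for this sketch and do not reflect the CDQ data model.

```python
import re

# Hypothetical markers; lookarounds avoid matching inside longer words.
CARE_OF = re.compile(r"(?<!\w)(c/o|care of)(?!\w)", re.IGNORECASE)


def care_of_misplaced(record, care_of_field="care_of"):
    """Return the fields (other than the dedicated one) that contain a care-of marker."""
    return [
        field
        for field, value in record.items()
        if field != care_of_field and isinstance(value, str) and CARE_OF.search(value)
    ]
```

A record such as `{"name": "Acme GmbH c/o John Doe", "street": "Main Street 1", "care_of": ""}` would be flagged on the `name` field.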

Business partner profile checks

CDQ manages 7 data quality rules for this feature. See details...

Business Partner Profile Checks provide comprehensive validation of business partner profile data. These checks verify the presence and quality of business partner names, detect name patterns that indicate dummy or test data, identify inactive business partners, and ensure that contact information and care-of details are correctly placed in their designated fields. By applying these validations, the system maintains high data quality across the entire business partner profile.

This feature comprises rules from 5 rule categories:

  • Care of misplaced (4 rules): Detects care-of information placed in incorrect address fields instead of dedicated care-of fields.
  • Contact misplaced (2 rules): Identifies contact person information incorrectly placed in business partner name or address fields.
  • Name missing (2 rules): Identifies business partner records that lack a name, which is a fundamental required attribute.
  • Name pattern (255 rules): Identifies business partner names that follow dummy, test, or suspicious patterns.

Compliance and risk checks

CDQ manages 2 data quality rules for this feature. See details...

Compliance and Risk Checks enable the identification of compliance or risk relevant issues in a business partner data record.

This feature comprises rules from 1 rule category:

Identifier checks

CDQ manages 775 data quality rules for this feature. See details...

CDQ data quality profiling services enable the validation of business and location identifiers (e.g. VAT numbers, tax identifiers, national identifiers, and other third-party and proprietary identifiers) with respect to different data quality criteria. In particular, this data capability inspects VAT IDs, tax IDs, national identifiers (e.g. company register IDs), and other IDs (such as DUNS or Legal Entity Identifier) with respect to existence, format, reference format, checksums/check digits (inner consistency), and consistency.

This feature comprises rules from 9 rule categories:

  • Identifier format (228 rules): Validates that business identifiers conform to their specified syntactic format and structure.
  • Identifier missing (135 rules): Flags business partner records that are missing mandatory or important identifiers.
  • Identifier qualification (482 rules): Validates identifiers by verifying that associated name and address data match external authoritative sources.
  • Identifier schema (97 rules): Validates that identifiers follow official formatting schemas including separators, spacing, and presentation.
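The difference between "Identifier missing" and "Identifier format" can be sketched for VAT numbers. The two patterns below (DE followed by nine digits, ATU followed by eight digits) follow the publicly documented EU VAT number structures. The table is a small illustrative subset and the function name is hypothetical.

```python
import re

# Illustrative subset of EU VAT number formats.
VAT_FORMATS = {
    "DE": re.compile(r"DE\d{9}"),
    "AT": re.compile(r"ATU\d{8}"),
}


def check_vat(country, value):
    """Return the violated category name, or None if no issue is found."""
    if value is None or not value.strip():
        return "Identifier missing"
    pattern = VAT_FORMATS.get(country)
    if pattern is not None and not pattern.fullmatch(value.strip().upper()):
        return "Identifier format"
    return None
```

A format check like this is purely structural. Whether the number is actually registered, and to whom, is the subject of the qualification rules.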

Identifier qualifications

CDQ manages 292 data quality rules for this feature. See details...

For a given identifier together with a company's name and address, the service checks whether the name and address are really associated with that identifier. This means that, in an externally managed business partner data source or in the CDQ database, the name and address belong to the entity that holds the identifier value. The service qualifies the identifier at a fine-grained level, considering not only the company name and the address as a whole but also the post code, locality (i.e. city, town etc.), and street. In doing so, CDQ enables a qualified validation of identifiers according to EU VAT compliance rules (in German: "UID Bestätigungsverfahren").
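The idea of qualifying an identifier per field can be sketched as a comparison between the record to be checked and the record returned by a reference source for the same identifier. The field names and the exact-match comparison are assumptions made for this illustration. A production service would likely use more tolerant matching.

```python
QUALIFIED_FIELDS = ("name", "street", "post_code", "locality")


def normalize(value):
    """Case-fold and collapse whitespace so trivial differences do not count."""
    return " ".join(value.casefold().split())


def qualify(record, reference):
    """Compare each field with the reference record found for the identifier.

    Returns a mapping of field name to True (confirmed) or False (not confirmed).
    """
    return {
        field: normalize(record.get(field, "")) == normalize(reference.get(field, ""))
        for field in QUALIFIED_FIELDS
    }
```

The per-field result makes it possible to report, for example, that name, post code, and locality are confirmed while the street is not.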

This feature comprises rules from 10 rule categories:

  • EU TAX Qualification (133 rules): Validates EU VAT IDs by checking if the associated company name and address match official registration data.
  • EU VAT Qualification (AT) (132 rules): Validates non-Austrian EU VAT IDs from the perspective of an Austrian company using Austrian tax authority services.
  • Qualification (482 rules): Performs comprehensive validation of identifiers including name and address matching against official sources.

Legal form checks

CDQ manages 0 data quality rules for this feature. See details...

Legal form checks validate that business partner legal form information is present and correctly specified. These checks ensure that legal forms exist and are valid for the given country or jurisdiction, and that they are not missing when required for a business partner record. By applying these validations, the system maintains data quality and ensures that business partner records contain accurate legal entity type information essential for compliance and business operations.

This feature comprises rules from 2 rule categories:

  • Legal form invalid (101 rules): Identifies legal forms that don't exist, are invalid for the country, or are incorrectly specified.
  • Missing legal form (1 rule): Flags business partner records that lack legal form information where it should be present.

Name checks

CDQ manages 1 data quality rule for this feature. See details...

Name checks validate that business partner name data is present and of sufficient quality. These checks ensure that required names are not missing and detect suspicious name patterns that may indicate dummy, test, or placeholder data entries. By applying these validations, the system helps maintain accurate and reliable business partner naming information.

This feature comprises rules from 2 rule categories:

  • Name missing (2 rules): Identifies business partner records that lack a name, which is a fundamental required attribute.
  • Name pattern (255 rules): Identifies business partner names that follow dummy, test, or suspicious patterns.
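The two name categories can be sketched as follows. The list of dummy patterns is a made-up example for illustration. The actual patterns used by CDQ rules are not given in this document.

```python
import re

# Hypothetical placeholder names: "test", "dummy", "unknown", "n/a", "xxx", "...", "---".
DUMMY_NAME = re.compile(r"test|dummy|unknown|n/?a|x{3,}|\.+|-+", re.IGNORECASE)


def check_name(name):
    """Return the violated category name, or None if the name looks plausible."""
    if name is None or not name.strip():
        return "Name missing"
    if DUMMY_NAME.fullmatch(name.strip()):
        return "Name pattern"
    return None
```

The sketch only flags names that consist entirely of a placeholder, so a legitimate name such as "Test Systems AG" is not reported.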

Lifecycle and release status

Data quality rules follow a defined lifecycle so they stay reliable, traceable, and aligned with changing external standards and community requirements. The lifecycle describes how rules move from an initial requirement to an active rule in CDQ services, and how they are maintained over time.

Rule release status

The release status describes the development progress or maturity of a data quality rule:

  • IDEA: Initial rule definition that documents a business requirement but is not yet active in services.
  • DRAFT: Rule concept is being prepared or refined but is not yet finalized for implementation or execution.
  • HYPERCARE: Rule is newly released and under increased observation to ensure stable behaviour and correct results.
  • RELEASED: Rule has passed verification and is actively executed in productive CDQ services.
  • DEACTIVATED: Rule is temporarily removed from the active rule set because it needs correction or clarification before re-release.
  • ARCHIVED: Rule is permanently retired and no longer maintained or intended for future activation.

Step 1: Rule ideation and agreement
Rule release status: IDEA
A new rule idea is proposed and discussed collaboratively, typically in dialogue between CDQ experts and the CDQ Data Sharing Community. The goal is to agree on the underlying business requirement, expected outcomes, and applicability. In this stage, the rule is not yet specified for implementation and is not active in services.

Step 2: Rule specification and documentation
Rule release status: DRAFT
The rule is specified in a form that is ready to be implemented. The author documents purpose, violation message, criticality (ERROR, WARNING, INFO), examples, validation source, affected data model concepts, data quality dimension, and country scope. Reviews can be done by the Data Sharing Community and/or designated approvers. In this stage, the rule is refined, but it is not yet executed in services.

Step 3: Implementation and verification
Rule release status: DRAFT
The rule is implemented for execution within CDQ services and aligned with existing rule features and categories. Automated and/or expert verification ensures the rule behaves as intended, including checks for correctness, stability, and impact. If issues are found during verification, the rule remains in DRAFT until they are resolved.

Step 4: Release candidate and strict monitoring
Rule release status: HYPERCARE
The rule is deployed as a release candidate and executed under increased observation for a short period (typically a few weeks). Monitoring focuses on unwanted behaviour, unexpected false positives, performance issues, and ambiguous outcomes. If hypercare findings require changes, the rule is moved back to DRAFT for refinement and re-implementation. If the rule is stable, it is promoted to RELEASED.

Step 5: Productive rule execution
Rule release status: RELEASED
The rule is considered stable and is actively executed as part of the productive rule set used by CDQ services in batch analyses and real-time validation. Released rules are maintained over time and may be updated based on new reference data, model changes, or community feedback.

Step 6: Deactivation and re-release cycle
Rule release status: DEACTIVATED
If a released rule needs correction or clarification, it is temporarily removed from the active rule set to prevent undesired impact. The product owner and subject matter experts analyse the issue and refine the rule definition and/or implementation, moving it back to DRAFT. After verification, the rule re-enters HYPERCARE and, if stable, is promoted again to RELEASED.

Step 7: Retirement and documentation
Rule release status: ARCHIVED
Rules that are no longer relevant, have been superseded, or must not be used anymore are permanently retired. Archived rules cannot be re-activated; they are kept only for documentation, traceability, and lineage purposes.
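
The status transitions described in the steps above can be summarised as a small state machine. The sketch below encodes only the transitions named in the text. The text does not say from which statuses a rule may be archived, so allowing ARCHIVED from any other status is an assumption of this sketch, and the `Status` and `can_move` names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    IDEA = "IDEA"
    DRAFT = "DRAFT"
    HYPERCARE = "HYPERCARE"
    RELEASED = "RELEASED"
    DEACTIVATED = "DEACTIVATED"
    ARCHIVED = "ARCHIVED"


# Transitions taken from steps 1 to 6 of the lifecycle description.
TRANSITIONS = {
    Status.IDEA: {Status.DRAFT},
    Status.DRAFT: {Status.HYPERCARE},
    Status.HYPERCARE: {Status.DRAFT, Status.RELEASED},
    Status.RELEASED: {Status.DEACTIVATED},
    Status.DEACTIVATED: {Status.DRAFT},
    Status.ARCHIVED: set(),
}


def can_move(current, target):
    """Return True if the lifecycle allows moving from current to target."""
    if current is Status.ARCHIVED:
        return False  # archived rules cannot be re-activated
    if target is Status.ARCHIVED:
        return True   # assumption: retirement is possible from any other status
    return target in TRANSITIONS[current]


def move(current, target):
    """Perform a transition or raise ValueError if it is not allowed."""
    if not can_move(current, target):
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target
```

With this table, a deactivated rule can only return to RELEASED by passing through DRAFT and HYPERCARE again, which matches the re-release cycle in step 6.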

Minor changes

Minor changes are updates that do not affect subsequent or dependent functionality. They can be applied without a dedicated change process and typically do not require taking the rule out of service.

Minor changes include:

  • Updates to documentation fields (rule description, violation message, examples) that correct wording, improve clarity, fix typos, or provide a more intuitive example, as long as the described rule logic does not change.
  • Any changes to fields other than criticality, primary validation source, managed concepts, and country scope.
  • Changes to rules that have criticality INFO, since they are non-blocking by design.

Special case: if a documentation update also implies a change in the intended logic, it is handled as a major change.

Major changes

Major changes are updates that can impact dependent functionality, service behavior, or interpretation of results. Major changes follow the same governance process as introducing a new rule to preserve traceability and ensure consistent execution across CDQ services.

Major changes include:

  • Changes to any of these fields: criticality, primary validation source, managed concepts, country scope.
  • Changes to the rule logic, including cases where:
    • An error or missing edge case is discovered in the existing logic.
    • The external world changes and the rule must adapt, for example updated identifier formats, new postal code systems, or revised reference standards.

Major changes can be triggered internally or via community requests. In both cases, the rule is revised and moved through the lifecycle again so the updated rule can be planned, released, and communicated consistently.
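
The distinction between minor and major changes can be expressed as a small decision function. The field identifiers are hypothetical spellings of the fields named above. The text does not state which bullet takes precedence when they conflict (for example a logic change to an INFO rule), so letting the INFO exemption win is an assumption of this sketch.

```python
# Fields whose modification counts as a major change (names are hypothetical spellings).
MAJOR_FIELDS = {
    "criticality",
    "primary_validation_source",
    "managed_concepts",
    "country_scope",
}


def classify_change(changed_fields, logic_changed=False, rule_criticality="ERROR"):
    """Classify a rule change as "minor" or "major"."""
    if rule_criticality == "INFO":
        return "minor"  # assumption: the INFO exemption is checked first
    if logic_changed:
        return "major"  # includes documentation updates that imply new logic
    if MAJOR_FIELDS & set(changed_fields):
        return "major"
    return "minor"
```

For example, `classify_change({"violation_message"})` yields a minor change, while `classify_change({"country_scope"})` yields a major one.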

Contribute!

We are continuously defining and implementing additional rules. Please get in touch with us if you observe that a data quality rule is missing! Also, if you are interested in the rules' management architecture and its implementation, we would be happy to provide you with additional information or showcases.

Some theoretical background

From a theoretical point of view, the data model concepts and the relations between them define our domain (in other words, the world as it is understood by CDQ). Within this world, everything would be possible if there were no rules. Business rules constrain this world by reducing the space of possible instantiations of the modeled domain. An example is a business rule that constrains the possible values of a country: the only allowed values are the countries defined in the ISO 3166-1 standard. These countries are documented as reference data in this portal. Without this rule, a country could have any other value, such as Romulus. To take up the wording from above: the documented countries are knowledge about the CDQ world (domain), and this knowledge is used to constrain the possible space of options for the name of a country.

From a theoretical point of view, the data model concepts and the relations between them define our domain, in other words the world as it is understood by CDQ. If we only had the model, this world would be very permissive, almost anything could be instantiated because the model alone does not fully specify what should be considered valid or meaningful.

This is where rules come in, they constrain the model by reducing the space of allowed instantiations. They introduce semantics, expectations, and boundaries that turn "possible" into "acceptable" for a specific use case, service, or standard. A simple example is the country concept. A rule can constrain the allowed values to the countries defined by ISO 3166-1. These countries are documented as reference data in this portal. Without such a rule, a country attribute could technically take any value, including Romulus. Put differently, the documented countries are part of the knowledge about the CDQ domain, and rules use that knowledge to restrict what counts as a valid country value.
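
The country example can be made concrete with a minimal sketch. The set below contains only four ISO 3166-1 alpha-2 codes as a stand-in for the full reference data, and the function name is hypothetical.

```python
# Tiny sample of ISO 3166-1 alpha-2 codes; the real reference data is much larger.
ISO_3166_1_ALPHA2_SAMPLE = frozenset({"CH", "DE", "FR", "US"})


def is_valid_country(code, allowed=ISO_3166_1_ALPHA2_SAMPLE):
    """The rule: a country value is valid only if it is in the reference data."""
    return code.strip().upper() in allowed
```

Without the rule, the model would accept any string as a country. With it, `is_valid_country("Romulus")` is rejected because the value is not part of the reference data.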