validate_duplicates()

Validate that there are no duplicate rows for a given set of key columns.

Usage

validate_duplicates(
    df,
    dim_comp,
    max_errors=1000,
)

Parameters

df: pd.DataFrame

The DataFrame to validate.

dim_comp: list[str]

Column names forming the uniqueness key (dimensions).

max_errors: int = 1000

Maximum number of duplicate key combinations to include in the error message. Defaults to 1000.

Raises

ValueError

Raised if duplicate rows are found. The message reports the count and the offending key combinations (up to max_errors).
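
Example

The sketch below illustrates the documented behaviour on a small DataFrame. The package that provides validate_duplicates() is not named on this page, so the function body here is a hypothetical stand-in built on pandas' DataFrame.duplicated, not the library's implementation. The message wording, the choice to count duplicate rows rather than duplicate keys, and the column names region, year, and value are all assumptions made for illustration.

```python
from __future__ import annotations

import pandas as pd


def validate_duplicates(
    df: pd.DataFrame,
    dim_comp: list[str],
    max_errors: int = 1000,
) -> None:
    """Hypothetical stand-in that mimics the documented behaviour."""
    # Flag every row whose key combination occurs more than once.
    dup_mask = df.duplicated(subset=dim_comp, keep=False)
    if not dup_mask.any():
        return
    # Distinct offending key combinations, capped at max_errors.
    dup_keys = df.loc[dup_mask, dim_comp].drop_duplicates().head(max_errors)
    # Assumption: the reported count is the number of rows sharing a key.
    raise ValueError(
        f"Found {int(dup_mask.sum())} duplicate rows for key {dim_comp}:\n"
        f"{dup_keys.to_string(index=False)}"
    )


df = pd.DataFrame(
    {
        "region": ["EU", "EU", "US", "US"],
        "year": [2020, 2020, 2020, 2021],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# ("EU", 2020) appears twice, so this key is not unique.
try:
    validate_duplicates(df, dim_comp=["region", "year"])
except ValueError as err:
    message = str(err)

# Adding "value" to the key makes every row unique, so nothing is raised.
validate_duplicates(df, dim_comp=["region", "year", "value"])
```

Because the function returns nothing on success, a caller would typically invoke it as a guard before a merge or aggregation step and let the ValueError propagate.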