validate_duplicates()
Validate that there are no duplicate rows for a given set of key columns.
Usage
validate_duplicates(
    df,
    dim_comp,
    max_errors=1000,
)
Parameters
df: pd.DataFrame
The DataFrame to validate.
dim_comp: list[str]
Column names forming the uniqueness key (dimensions).
max_errors: int = 1000
Maximum number of duplicate key combinations to include in the error message. Defaults to 1000.
Raises
ValueError
If duplicate rows are found. The message reports the count and the offending key combinations (at most max_errors of them).
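Example
The documented behavior can be illustrated with a minimal sketch. The function below, validate_duplicates_sketch, is a hypothetical stand-in written for this example only; it is not the library's implementation, and the wording of its error message, the choice of what "count" means, and the sample data are all assumptions. It relies only on standard pandas calls (DataFrame.duplicated and DataFrame.drop_duplicates).

```python
import pandas as pd


def validate_duplicates_sketch(
    df: pd.DataFrame, dim_comp: list[str], max_errors: int = 1000
) -> None:
    """Hypothetical stand-in for validate_duplicates(); not the library's code."""
    # Mark every row whose key combination occurs more than once.
    dup_mask = df.duplicated(subset=dim_comp, keep=False)
    if not dup_mask.any():
        return

    # Collapse the duplicated rows to one row per offending key combination.
    dup_keys = df.loc[dup_mask, dim_comp].drop_duplicates()
    # Cap how many key combinations end up in the message.
    shown = dup_keys.head(max_errors)
    raise ValueError(
        f"Found {int(dup_mask.sum())} duplicate rows across "
        f"{len(dup_keys)} key combinations for {dim_comp}:\n"
        f"{shown.to_string(index=False)}"
    )


# Assumed sample data: two rows share the key (region, month).
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south"],
        "month": ["2024-01", "2024-01", "2024-01"],
        "revenue": [100, 120, 90],
    }
)

# Unique on (region, month, revenue): returns without raising.
validate_duplicates_sketch(sales, ["region", "month", "revenue"])

# Not unique on (region, month): raises ValueError.
try:
    validate_duplicates_sketch(sales, ["region", "month"])
except ValueError as exc:
    error_message = str(exc)
```

In this sketch the check passes silently when the key is unique and raises only when at least one key combination repeats. Capping the reported combinations with max_errors keeps the message readable on large inputs. The real function's message format may differ.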