sanitize_variable()
Sanitize a raw string value into a valid SDMX code ID.
Usage
sanitize_variable(
    value,
    uppercase=True,
)

Applies the same sanitization used internally by create_schema_from_table when building codelist code IDs from DataFrame column values. Use this function during your data cleaning phase so that the values in your DataFrame match the code IDs generated in the schema.
The sanitization rules are:
- Characters other than letters, digits, and underscores (including dots) are replaced with _.
- Leading/trailing underscores are stripped.
- IDs starting with a digit are prefixed with _.
- The result is uppercased by default (controlled by uppercase).
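The rules above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the library's own code: the name sanitize_sketch is hypothetical, the rules are applied in the order they are listed, and "alphanumeric" is assumed to mean ASCII letters and digits. The sketch leaves letter case untouched when uppercase=False, so it may not reproduce the real function's output in that mode.

```python
import re


def sanitize_sketch(value: str, uppercase: bool = True) -> str:
    """Hypothetical stand-in that follows the documented rules in order."""
    # Replace anything that is not an ASCII letter, digit, or underscore
    # (assumption: the library may treat non-ASCII characters differently).
    result = re.sub(r"[^A-Za-z0-9_]", "_", value)
    # Strip leading and trailing underscores.
    result = result.strip("_")
    # Prefix IDs that start with a digit.
    if result and result[0].isdigit():
        result = "_" + result
    # Uppercase by default.
    if uppercase:
        result = result.upper()
    return result


# Applying the sketch to a few raw column values during data cleaning.
raw_values = ["per_allsp.adq_ep_preT_tot", "2020 total", "  rate (%)"]
cleaned = [sanitize_sketch(v) for v in raw_values]
print(cleaned)
# ['PER_ALLSP_ADQ_EP_PRET_TOT', '_2020_TOTAL', 'RATE']
```

In real code, call sanitize_variable itself rather than a stand-in like this, so that the cleaned values stay in step with whatever create_schema_from_table actually generates.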
Parameters
value: str
    The raw string value to sanitize (e.g. "per_allsp.adq_ep_preT_tot").
uppercase: bool = True
    If True (default), the result is uppercased, matching the default behaviour of create_schema_from_table. Set to False if you called create_schema_from_table with uppercase_code_ids=False.
Returns
str
    A sanitized SDMX-safe identifier string.
Examples
>>> sanitize_variable("per_allsp.adq_ep_preT_tot")
'PER_ALLSP_ADQ_EP_PRET_TOT'
>>> sanitize_variable("per_allsp.adq_ep_preT_tot", uppercase=False)
'per_allsp_adq_ep_pret_tot'