sanitize_variable()

Sanitize a raw string value into a valid SDMX code ID.

Usage

sanitize_variable(
    value,
    uppercase=True,
)

Applies the same sanitization used internally by create_schema_from_table when building codelist code IDs from DataFrame column values. Use this function during your data cleaning phase to ensure that the values in your DataFrame will match the code IDs generated in the schema.

The sanitization rules are:

- Any character that is not alphanumeric or an underscore (dots included) is replaced with _.
- Leading and trailing underscores are stripped.
- IDs starting with a digit are prefixed with _.
- The result is uppercased by default (controlled by uppercase).
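
To make the rules concrete, here is a minimal sketch of how they could be reproduced in plain Python. It is not the library's implementation: sanitize_sketch is a hypothetical name, and two details are assumptions because the rules do not state them, namely that "alphanumeric" means ASCII letters and digits, and that the rules are applied in the order listed.

```python
import re


def sanitize_sketch(value: str, uppercase: bool = True) -> str:
    """Illustrative stand-in for the documented rules (hypothetical helper)."""
    # Rule 1: replace anything that is not a letter, digit, or underscore.
    # Assumption: only ASCII letters and digits count as alphanumeric.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", value)
    # Rule 2: strip leading and trailing underscores.
    cleaned = cleaned.strip("_")
    # Rule 3: prefix with an underscore if the ID starts with a digit.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    # Rule 4: uppercase unless the caller opts out.
    return cleaned.upper() if uppercase else cleaned


print(sanitize_sketch("per_allsp.adq_ep_preT_tot"))  # PER_ALLSP_ADQ_EP_PRET_TOT
print(sanitize_sketch("2020 total"))                 # _2020_TOTAL
```

A sketch like this is only useful for reasoning about the rules. For real data cleaning, call sanitize_variable itself so the values are guaranteed to match what create_schema_from_table produces.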

Parameters

value: str
The raw string value to sanitize (e.g. "per_allsp.adq_ep_preT_tot").

uppercase: bool = True
If True (default), the result is uppercased, matching the default behaviour of create_schema_from_table. Set to False if you called create_schema_from_table with uppercase_code_ids=False.

Returns

str
A sanitized SDMX-safe identifier string.

Examples

>>> sanitize_variable("per_allsp.adq_ep_preT_tot")
'PER_ALLSP_ADQ_EP_PRET_TOT'
>>> sanitize_variable("per_allsp.adq_ep_preT_tot", uppercase=False)
'per_allsp_adq_ep_preT_tot'