datawrangler.decorate

datawrangler.decorate.list_generalizer(f)

A decorator that makes a function work on either a single object or a list of objects. When given a list, the decorated function calls the original function on each element.

Parameters

f: the function to decorate, of the form f(data, *args, **kwargs).

Returns

A decorated function that supports lists of data objects (rather than only non-list data objects).
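The behavior described above can be illustrated with a minimal sketch. This is not the datawrangler implementation; `list_generalizer_sketch` and `scale` are hypothetical names used only for illustration, and the sketch assumes that "list" means a built-in Python list.

```python
import functools


def list_generalizer_sketch(f):
    """Illustrative stand-in for the documented behavior (an assumption, not library code)."""

    @functools.wraps(f)
    def wrapped(data, *args, **kwargs):
        # A list is handled element by element; any other object is passed through as-is.
        if isinstance(data, list):
            return [f(d, *args, **kwargs) for d in data]
        return f(data, *args, **kwargs)

    return wrapped


@list_generalizer_sketch
def scale(x, factor=2):
    # Hypothetical function written for a single (non-list) object.
    return x * factor


single = scale(3)                   # called once on the object itself
many = scale([1, 2, 3], factor=10)  # called once per element
```

Extra positional and keyword arguments are forwarded unchanged to every call, so the wrapped function keeps its original signature.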

datawrangler.decorate.funnel(f)

A decorator that coerces any data passed into the function into a DataFrame (pandas or Polars) or a list of DataFrames

Parameters

f: a function of the form f(data, *args, **kwargs) that assumes data is either a DataFrame or a list of DataFrames.

Returns

A decorated function that supports any wrangle-able data format. The decorated function accepts an optional 'backend' keyword argument ('pandas' or 'polars') to specify the DataFrame backend.

Notes

The decorated function can be called with:
  • backend='pandas': convert inputs to pandas DataFrames (default)

  • backend='polars': convert inputs to Polars DataFrames for better performance
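The coerce-then-call pattern can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's code: `funnel_sketch` and `column_means` are hypothetical names, coercion is done with the plain `pd.DataFrame` constructor rather than datawrangler's own wrangling logic, and the `backend` keyword is not modeled.

```python
import functools

import pandas as pd


def funnel_sketch(f):
    """Illustrative stand-in: coerce the first argument to a DataFrame, then call f."""

    @functools.wraps(f)
    def wrapped(data, *args, **kwargs):
        # A DataFrame, or a list made up only of DataFrames, is passed through unchanged.
        is_frame_list = isinstance(data, list) and all(
            isinstance(d, pd.DataFrame) for d in data
        )
        if not (isinstance(data, pd.DataFrame) or is_frame_list):
            # Everything else (dicts, nested lists, arrays) goes through the constructor.
            data = pd.DataFrame(data)
        return f(data, *args, **kwargs)

    return wrapped


@funnel_sketch
def column_means(data):
    # Written as if data were always a single DataFrame.
    return data.mean(axis=0)


from_dict = column_means({"a": [1, 2, 3], "b": [4, 5, 6]})
from_rows = column_means([[1, 2], [3, 4]])
```

The decorated function never has to check its input type: by the time its body runs, the data have already been converted.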

datawrangler.decorate.interpolate(f)

A decorator that fills in missing data by imputing and/or interpolating missing values

Parameters

f: a function of the form f(data, *args, **kwargs) that assumes the data are formatted as either a DataFrame or a list of DataFrames, with no missing (numpy.nan) values.

Returns

A decorated function that supports any wrangle-able datatype. Pass in the following keyword arguments to fill in missing data:
  • backend: str, optional ('pandas' or 'polars'). Specifies the DataFrame backend. If not provided, the input's backend is preserved.

  • interp_kwargs: a dictionary containing interpolation/imputation parameters:

    • impute_kwargs: a dictionary specifying one or more scikit-learn imputation models (e.g., {'model': 'IterativeImputer'}). The 'model' can be specified as defined in the apply_sklearn_model function.

    • Any other keywords are passed to the DataFrame's interpolate method; e.g., method='linear' will apply linear interpolation to fill in missing values. For pandas DataFrames, supported arguments are documented at: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html

Notes

Backend-specific behavior:
  • pandas: full interpolation support with all pandas.DataFrame.interpolate() methods

  • Polars: limited interpolation support; data are automatically converted to pandas for interpolation, then back to Polars

If no interpolation arguments are specified, no interpolation is performed.
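To show what the forwarded interpolation keywords do, the sketch below calls pandas.DataFrame.interpolate directly with method='linear'. It uses only pandas and does not call datawrangler. The commented-out call at the end is a guess at how a decorated function might be invoked, based on the keyword names documented above; `my_func` is hypothetical.

```python
import pandas as pd

nan = float("nan")

# A small DataFrame with two missing values between known neighbors.
df = pd.DataFrame({"x": [1.0, nan, 3.0, nan, 5.0]})

# Linear interpolation fills each gap from the surrounding values.
filled = df.interpolate(method="linear")

# Assumed usage with a decorated function (hypothetical, not verified here):
# result = my_func(df, interp_kwargs={"method": "linear"})
```

Linear interpolation only fills gaps that have known values on both sides in the default forward direction, so leading missing values would remain unfilled unless further pandas arguments are supplied.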

datawrangler.decorate.apply_stacked(f)
Decorate a function to adjust how it handles data as follows:
  • Wrangle the data into DataFrames (the resulting DataFrames must all have the same number of columns). MultiIndex DataFrames are also supported (and can represent already-stacked datasets)

  • Vertically concatenate the wrangled data

  • Apply the function to the “stacked” dataset, treating the combined data as a “single” DataFrame

  • If the original dataset was provided in “unstacked” format, unstack the result into a list of DataFrames

  • Return the resulting (stacked or unstacked) DataFrame(s)

Parameters

f: a function of the form f(data, *args, **kwargs) that assumes data is a single DataFrame, and that returns a single DataFrame as output.

Returns

A decorated function that supports any wrangle-able data type, applies the original function to the full list of datasets simultaneously, and then returns the result(s) as a new DataFrame or list of DataFrames.
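The stack, apply, and unstack steps listed above can be sketched by hand with plain pandas. This is an illustration of the idea rather than the library's implementation; `center` is a hypothetical function, and using `pd.concat(..., keys=...)` to record which dataset each row came from is an assumption about one reasonable way to stack.

```python
import pandas as pd

frames = [
    pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]}),
    pd.DataFrame({"a": [3.0, 6.0], "b": [30.0, 60.0]}),
]


def center(df):
    # Subtract the column means of whatever DataFrame is passed in.
    return df - df.mean(axis=0)


# Stack: concatenate vertically, with an outer index level marking the source dataset.
keys = list(range(len(frames)))
stacked = pd.concat(frames, keys=keys)

# Apply once to the combined data, so the column means are shared across datasets.
result = center(stacked)

# Unstack: select each outer index value to recover one DataFrame per dataset.
unstacked = [result.xs(k, level=0) for k in keys]
```

Because `center` sees all four rows at once, every dataset is centered on the same pooled means (3.0 for column a, 30.0 for column b).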

datawrangler.decorate.apply_unstacked(f)
Decorate a function to adjust how it handles data as follows:
  • Wrangle the data into a list of DataFrames. MultiIndex DataFrames are also supported (and can represent stacked datasets)

  • Apply the function (individually) to each DataFrame in the resulting list

  • If the original dataset was provided in “stacked” format, stack the result into a MultiIndex DataFrame

  • Return the resulting (stacked or unstacked) DataFrame(s)

Parameters

f: a function of the form f(data, *args, **kwargs) that assumes data is a single DataFrame, and that returns a single DataFrame as output.

Returns

A decorated function that supports any wrangle-able data type, applies the original function to each dataset in the list separately, and then returns the result(s) as a new DataFrame or list of DataFrames.
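The per-dataset behavior can be sketched by hand with plain pandas. As before, this illustrates the idea and is not the library's implementation; `center` is a hypothetical function, and restacking with `pd.concat(..., keys=...)` is an assumption about one reasonable way to build the MultiIndex result.

```python
import pandas as pd

frames = [
    pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]}),
    pd.DataFrame({"a": [3.0, 6.0], "b": [30.0, 60.0]}),
]


def center(df):
    # Subtract the column means of whatever DataFrame is passed in.
    return df - df.mean(axis=0)


# Apply the function to each DataFrame on its own, so each uses its own column means.
results = [center(df) for df in frames]

# If the input had been stacked, the results would be stacked again into a MultiIndex DataFrame.
restacked = pd.concat(results, keys=list(range(len(results))))
```

Here each dataset is centered on its own means (1.5 and 4.5 for column a), which is the key difference from applying the function once to the pooled, stacked data.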