datawrangler.decorate

datawrangler.decorate.list_generalizer(f)

A decorator that makes a function work on either a single object or a list of objects. When given a list, it calls the function on each element separately.

Parameters

f – the function to decorate, of the form f(data, *args, **kwargs).

Returns

A decorated function that accepts either a single data object or a list of data objects.
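The pattern described above can be sketched in a few lines. This is a minimal illustration that assumes a plain isinstance(data, list) check; it is not necessarily how datawrangler implements list_generalizer, and scale is a hypothetical example function.

```python
import functools


def list_generalizer(f):
    """Minimal sketch: apply f to each element when data is a list."""
    @functools.wraps(f)
    def wrapped(data, *args, **kwargs):
        if isinstance(data, list):
            # Call f separately on each element, forwarding the extra arguments
            return [f(d, *args, **kwargs) for d in data]
        # Non-list input: call f once, unchanged
        return f(data, *args, **kwargs)
    return wrapped


@list_generalizer
def scale(x, factor=2):
    # Hypothetical example function that works on a single object
    return x * factor


print(scale(3))                     # 6
print(scale([1, 2, 3], factor=10))  # [10, 20, 30]
```

Because functools.wraps is used, the decorated function keeps its original name and docstring.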

datawrangler.decorate.funnel(f)

A decorator that coerces any data passed into the function into a pandas DataFrame or a list of DataFrames.

Parameters

f – a function of the form f(data, *args, **kwargs) that assumes data is either a DataFrame or a list of DataFrames.

Returns

A decorated function that supports any wrangle-able data format.
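The following sketch illustrates the funnel idea under simplifying assumptions. The to_dataframe helper is hypothetical and only handles DataFrames, lists, and inputs that the pandas DataFrame constructor accepts (such as NumPy arrays and dicts); datawrangler's own wrangling supports many more formats. funnel_sketch and n_rows are likewise illustrative names, not part of the library.

```python
import functools

import numpy as np
import pandas as pd


def to_dataframe(data):
    """Hypothetical stand-in for datawrangler's wrangling step."""
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, list):
        # A list is treated as a list of separate datasets
        return [to_dataframe(d) for d in data]
    # Arrays, dicts, etc. are handed to the DataFrame constructor
    return pd.DataFrame(data)


def funnel_sketch(f):
    @functools.wraps(f)
    def wrapped(data, *args, **kwargs):
        # Coerce first, so f only ever sees a DataFrame or a list of DataFrames
        return f(to_dataframe(data), *args, **kwargs)
    return wrapped


@funnel_sketch
def n_rows(data):
    if isinstance(data, list):
        return [df.shape[0] for df in data]
    return data.shape[0]


print(n_rows(np.zeros((4, 3))))                     # 4
print(n_rows([np.zeros((2, 2)), {'a': [1, 2, 3]}]))  # [2, 3]
```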

datawrangler.decorate.interpolate(f)

A decorator that fills in missing data by imputing and/or interpolating missing values.

Parameters

f – a function of the form f(data, *args, **kwargs) that assumes the data are formatted as either a DataFrame or a list of DataFrames, with no missing (numpy.nan) values.

Returns

A decorated function that supports any wrangle-able data type. Pass in the following keyword arguments to fill in missing data:
  • impute_kwargs: a dictionary describing the scikit-learn imputation model to apply (e.g., {'model': 'IterativeImputer'}). The 'model' can be specified as defined in the apply_sklearn_model function.

  • Any other keyword arguments are passed to pandas.DataFrame.interpolate; e.g., method='linear' will apply linear interpolation to fill in missing values. A full list of supported arguments may be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html If no other keyword arguments are specified, no interpolation is performed.
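A minimal sketch of the interpolation path follows. It assumes the fill-in options are routed through a hypothetical interp_kwargs dictionary so they do not collide with the decorated function's own keyword arguments; fill_missing, interpolate_sketch, and column_sums are also illustrative names. The scikit-learn imputation step (impute_kwargs) is deliberately omitted. Only the documented behavior is shown: remaining keywords are forwarded to pandas.DataFrame.interpolate, and nothing is interpolated when there are none.

```python
import functools

import numpy as np
import pandas as pd


def fill_missing(df, **kwargs):
    """Hypothetical helper mirroring the documented keyword behavior."""
    # impute_kwargs (scikit-learn imputation) is accepted but not handled in this sketch
    kwargs.pop('impute_kwargs', None)
    if kwargs:
        # Every remaining keyword goes straight to pandas.DataFrame.interpolate
        return df.interpolate(**kwargs)
    # No interpolation keywords: return the data unchanged
    return df


def interpolate_sketch(f):
    @functools.wraps(f)
    def wrapped(data, *args, interp_kwargs=None, **kwargs):
        # interp_kwargs is a hypothetical container that keeps fill-in options
        # separate from f's own keyword arguments
        data = fill_missing(data, **(interp_kwargs or {}))
        return f(data, *args, **kwargs)
    return wrapped


@interpolate_sketch
def column_sums(df):
    # pandas skips NaN when summing, so unfilled gaps change the result
    return df.sum()


df = pd.DataFrame({'x': [1.0, np.nan, 5.0], 'y': [0.0, np.nan, 4.0]})

print(column_sums(df))                                      # no interpolation
print(column_sums(df, interp_kwargs={'method': 'linear'}))  # gaps filled linearly
```

Without interpolation keywords the gaps stay in place, giving sums of 6.0 and 4.0. With method='linear' the missing entries become 3.0 and 2.0, giving sums of 9.0 and 6.0.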

datawrangler.decorate.apply_stacked(f)
Decorate a function to adjust how it handles data as follows:
  • Wrangle the data into DataFrames (the resulting DataFrames must all have the same number of columns). MultiIndex DataFrames are also supported (and can represent already-stacked datasets)

  • Vertically concatenate the wrangled data

  • Apply the function to the “stacked” dataset, treating the combined data as a “single” DataFrame

  • If the original dataset was provided in “unstacked” format, unstack the result into a list of DataFrames

  • Return the resulting (stacked or unstacked) DataFrame(s)

Parameters

f – a function of the form f(data, *args, **kwargs) that assumes data is a single DataFrame, and that returns a single DataFrame as output.

Returns

A decorated function that supports any wrangle-able data type, applies the original function to the full list of datasets simultaneously, and then returns the result(s) as a new DataFrame or list of DataFrames.
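The stack, apply, and unstack steps listed above can be sketched with pd.concat and DataFrame.xs. This simplified version assumes that unstacked input is a list of DataFrames and uses integer dataset keys for the outer index level. It is an illustration, not datawrangler's implementation, and apply_stacked_sketch and center are hypothetical names. Because center sees the stacked data, the mean it subtracts is pooled across both datasets.

```python
import functools

import pandas as pd


def apply_stacked_sketch(f):
    """Simplified illustration: apply f once to all datasets combined."""
    @functools.wraps(f)
    def wrapped(data, *args, **kwargs):
        unstacked = isinstance(data, list)
        if unstacked:
            # Stack: vertically concatenate, labeling rows by dataset (0, 1, ...)
            keys = list(range(len(data)))
            data = pd.concat(data, keys=keys)

        # Apply f to the combined dataset as if it were a single DataFrame
        result = f(data, *args, **kwargs)

        if unstacked:
            # Unstack: split the result back into one DataFrame per dataset
            return [result.xs(k, level=0) for k in keys]
        return result
    return wrapped


@apply_stacked_sketch
def center(df):
    # Subtract each column's mean, computed over every row it receives
    return df - df.mean()


a = pd.DataFrame({'v': [1.0, 2.0]})
b = pd.DataFrame({'v': [3.0, 4.0]})

print(center([a, b]))
```

Here the pooled mean is 2.5, so the two returned DataFrames hold [-1.5, -0.5] and [0.5, 1.5].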

datawrangler.decorate.apply_unstacked(f)
Decorate a function to adjust how it handles data as follows:
  • Wrangle the data into a list of DataFrames. MultiIndex DataFrames are also supported (and can represent stacked datasets)

  • Apply the function (individually) to each DataFrame in the resulting list

  • If the original dataset was provided in “stacked” format, stack the result into a MultiIndex DataFrame

  • Return the resulting (stacked or unstacked) DataFrame(s)

Parameters

f – a function of the form f(data, *args, **kwargs) that assumes data is a single DataFrame, and that returns a single DataFrame as output.

Returns

A decorated function that supports any wrangle-able data type, applies the original function to each dataset in the list separately, and then returns the result(s) as a new DataFrame or list of DataFrames.
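A corresponding sketch for the unstacked case follows. It assumes that the outer level of a MultiIndex identifies the dataset. apply_unstacked_sketch and center are hypothetical names, and this is not datawrangler's implementation. Each dataset is centered using only its own mean, so unlike with apply_stacked, every dataset ends up with values [-0.5, 0.5].

```python
import functools

import pandas as pd


def apply_unstacked_sketch(f):
    """Simplified illustration: apply f to each dataset separately."""
    @functools.wraps(f)
    def wrapped(data, *args, **kwargs):
        stacked = isinstance(data, pd.DataFrame) and isinstance(data.index, pd.MultiIndex)
        if stacked:
            # Split on the outer index level: one DataFrame per dataset
            keys = list(data.index.get_level_values(0).unique())
            pieces = [data.xs(k, level=0) for k in keys]
        elif isinstance(data, list):
            pieces = data
        else:
            pieces = [data]

        # Apply f independently to every dataset
        results = [f(piece, *args, **kwargs) for piece in pieces]

        if stacked:
            # Re-stack the per-dataset results under the original outer keys
            return pd.concat(results, keys=keys)
        return results if isinstance(data, list) else results[0]
    return wrapped


@apply_unstacked_sketch
def center(df):
    # Subtract each column's mean, computed within a single dataset
    return df - df.mean()


a = pd.DataFrame({'v': [1.0, 2.0]})
b = pd.DataFrame({'v': [3.0, 4.0]})
stacked_input = pd.concat([a, b], keys=[0, 1])

print(center(stacked_input))  # stacked in, stacked (MultiIndex) out
print(center([a, b]))         # list in, list out
```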