datawrangler.decorate
- datawrangler.decorate.list_generalizer(f)
A decorator that makes a function work for either a single object or a list of objects by calling the function on each element
- Returns
A decorated function that supports lists of data objects (rather than only non-list data objects)
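The idea can be illustrated with a minimal sketch. This is not the library's implementation; `list_generalizer_sketch` and `double` are hypothetical names used only to show the pattern of calling a function on each element of a list.

```python
from functools import wraps


def list_generalizer_sketch(f):
    """Hypothetical stand-in: call f on each element when given a list."""
    @wraps(f)
    def wrapped(data, *args, **kwargs):
        if isinstance(data, list):
            # Apply f to every element and collect the results in a list
            return [f(d, *args, **kwargs) for d in data]
        # Non-list input: call f once, unchanged
        return f(data, *args, **kwargs)
    return wrapped


@list_generalizer_sketch
def double(x):
    return 2 * x


single = double(3)          # one object in, one result out
many = double([1, 2, 3])    # list in, list of results out
```

The decorated function keeps its original behavior for single objects, so existing callers are unaffected.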
- datawrangler.decorate.funnel(f)
A decorator that coerces any data passed into the function into a pandas DataFrame or a list of DataFrames
- Parameters
f – a function of the form f(data, *args, **kwargs) that assumes data is either a DataFrame or a list of DataFrames
- Returns
A decorated function that supports any wrangle-able data format
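A simplified sketch of the coercion pattern follows. It is an assumption-laden stand-in, not the library's code: `funnel_sketch`, `to_frame`, and `n_rows` are hypothetical names, the coercion only covers inputs that the `pandas.DataFrame` constructor accepts, and a Python list is always treated as a list of datasets.

```python
from functools import wraps

import numpy as np
import pandas as pd


def funnel_sketch(f):
    """Hypothetical stand-in for funnel: coerce inputs to DataFrames."""
    def to_frame(x):
        # Only handles what the pandas.DataFrame constructor accepts
        return x if isinstance(x, pd.DataFrame) else pd.DataFrame(x)

    @wraps(f)
    def wrapped(data, *args, **kwargs):
        # Assumption: a Python list always means "a list of datasets"
        if isinstance(data, list):
            data = [to_frame(d) for d in data]
        else:
            data = to_frame(data)
        return f(data, *args, **kwargs)
    return wrapped


@funnel_sketch
def n_rows(data):
    # Inside f, data is a DataFrame or a list of DataFrames
    if isinstance(data, list):
        return [d.shape[0] for d in data]
    return data.shape[0]


single = n_rows(np.zeros((3, 2)))
many = n_rows([np.zeros((3, 2)), {"a": [1, 2], "b": [3, 4]}])
```

Because the coercion happens in the wrapper, the body of `n_rows` can rely on DataFrame attributes such as `shape` regardless of what the caller passed in.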
- datawrangler.decorate.interpolate(f)
A decorator that fills in missing data by imputing and/or interpolating missing values
- Parameters
f – a function of the form f(data, *args, **kwargs) that assumes the data are formatted as either a DataFrame or a list of DataFrames, with no missing (numpy.nan) values
- Returns
A decorated function that supports any wrangle-able datatype. Pass in the following keyword arguments to fill in missing data:
- impute_kwargs: a dictionary containing one or more scikit-learn imputation models (e.g., {'model': 'IterativeImputer'}). The 'model' can be specified as defined in the apply_sklearn_model function.
- any other keywords are passed to pandas.DataFrame.interpolate; e.g., method='linear' will apply linear interpolation to fill in missing values. A full list of supported arguments may be found here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html
If no other keyword arguments are specified, no interpolation is performed.
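The interpolation half of this behavior can be sketched with pandas alone. The decorator below is a simplified assumption, not the library's implementation: `interpolate_sketch` and `count_missing` are hypothetical names, every keyword argument is forwarded to `pandas.DataFrame.interpolate`, and the `impute_kwargs` path is left out.

```python
from functools import wraps

import numpy as np
import pandas as pd


def interpolate_sketch(f):
    """Hypothetical stand-in: interpolate missing values before calling f."""
    @wraps(f)
    def wrapped(data, *args, **interp_kwargs):
        # With no keyword arguments, no interpolation is performed
        if interp_kwargs:
            data = data.interpolate(**interp_kwargs)
        return f(data, *args)
    return wrapped


@interpolate_sketch
def count_missing(data):
    # Total number of NaN entries that f actually sees
    return int(data.isna().sum().sum())


df = pd.DataFrame({"x": [0.0, np.nan, 2.0, np.nan, 4.0]})

untouched = count_missing(df)                 # no keywords: data passed as-is
filled = count_missing(df, method="linear")   # interior gaps filled linearly
values = df.interpolate(method="linear")["x"].tolist()
```

Linear interpolation fills each interior gap with the midpoint of its neighbors here, so the function sees a complete column.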
- datawrangler.decorate.apply_stacked(f)
- Decorate a function to adjust how it handles data as follows:
1. Wrangle the data into DataFrames (the resulting DataFrames must all have the same number of columns). MultiIndex DataFrames are also supported (and can represent already-stacked datasets).
2. Vertically concatenate the wrangled data.
3. Apply the function to the "stacked" dataset, treating the combined data as a "single" DataFrame.
4. If the original dataset was provided in "unstacked" format, unstack the result into a list of DataFrames.
5. Return the resulting (stacked or unstacked) DataFrame(s).
- Parameters
f – a function of the form f(data, *args, **kwargs) that assumes data is a single DataFrame, and that returns a single DataFrame as output.
- Returns
A decorated function that supports any wrangle-able data type, applies the original function to the full list of datasets simultaneously, and then returns the result(s) as a new DataFrame or list of DataFrames.
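The stack, apply, unstack sequence can be sketched with plain pandas. This is a simplified assumption rather than the library's code: `apply_stacked_sketch` and `center` are hypothetical names, the wrangling step is skipped (inputs are already DataFrames), and the stacked form is built with `pandas.concat` and a MultiIndex.

```python
from functools import wraps

import pandas as pd


def apply_stacked_sketch(f):
    """Hypothetical stand-in: apply f once to vertically stacked data."""
    @wraps(f)
    def wrapped(data, *args, **kwargs):
        if isinstance(data, pd.DataFrame):
            # Already a single (possibly stacked) DataFrame
            return f(data, *args, **kwargs)
        # Stack the list; the outer index level records the source dataset
        keys = list(range(len(data)))
        stacked = pd.concat(data, keys=keys)
        result = f(stacked, *args, **kwargs)
        # Unstack the result back into one DataFrame per input
        return [result.xs(k, level=0) for k in keys]
    return wrapped


@apply_stacked_sketch
def center(data):
    # Subtract the column means computed over everything f receives
    return data - data.mean(axis=0)


a = pd.DataFrame({"x": [0.0, 2.0]})
b = pd.DataFrame({"x": [4.0, 6.0]})
centered = center([a, b])
```

Because `center` sees all four rows at once, the mean it subtracts is the grand mean across both datasets (3.0), not each dataset's own mean.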
- datawrangler.decorate.apply_unstacked(f)
- Decorate a function to adjust how it handles data as follows:
1. Wrangle the data into a list of DataFrames. MultiIndex DataFrames are also supported (and can represent stacked datasets).
2. Apply the function (individually) to each DataFrame in the resulting list.
3. If the original dataset was provided in "stacked" format, stack the result into a MultiIndex DataFrame.
4. Return the resulting (stacked or unstacked) DataFrame(s).
- Parameters
f – a function of the form f(data, *args, **kwargs) that assumes data is a single DataFrame, and that returns a single DataFrame as output.
- Returns
A decorated function that supports any wrangle-able data type, applies the original function to each dataset in the list separately, and then returns the result(s) as a new DataFrame or list of DataFrames.
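The per-dataset behavior can be sketched with plain pandas. This is a simplified assumption rather than the library's code: `apply_unstacked_sketch` and `center` are hypothetical names, the wrangling step is skipped, and a stacked dataset is assumed to be a two-level MultiIndex DataFrame whose outer level identifies the dataset.

```python
from functools import wraps

import pandas as pd


def apply_unstacked_sketch(f):
    """Hypothetical stand-in: apply f to each dataset individually."""
    @wraps(f)
    def wrapped(data, *args, **kwargs):
        if isinstance(data, list):
            return [f(d, *args, **kwargs) for d in data]
        if isinstance(data.index, pd.MultiIndex):
            # Split the stacked data on the outer index level
            keys = list(data.index.get_level_values(0).unique())
            parts = [f(data.xs(k, level=0), *args, **kwargs) for k in keys]
            # Stacked in, stacked out
            return pd.concat(parts, keys=keys)
        return f(data, *args, **kwargs)
    return wrapped


@apply_unstacked_sketch
def center(data):
    # Subtract the column means of whatever single DataFrame f receives
    return data - data.mean(axis=0)


a = pd.DataFrame({"x": [0.0, 2.0]})
b = pd.DataFrame({"x": [4.0, 6.0]})

separate = center([a, b])                         # list in, list out
restacked = center(pd.concat([a, b], keys=[0, 1]))  # MultiIndex in and out
```

Each dataset is centered on its own mean here, which is the contrast with a stacked application, where a single mean would be shared across all datasets.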