
Data Wrangler Decorators Part 2: Advanced Decorators

This tutorial covers advanced decorator functionality in data-wrangler, including interpolation, stacking/unstacking operations, and building complex data processing pipelines.

Advanced Decorators Overview

Beyond the basic ``@funnel`` decorator, data-wrangler provides specialized decorators for:

  • ``@interpolate``: Automatic handling of missing data

  • ``@apply_stacked``: Operations on stacked (melted) data

  • ``@apply_unstacked``: Operations on unstacked (pivoted) data

  • Custom decorator combinations: Chaining decorators for complex workflows

Used together, these decorators move data cleaning and reshaping out of the function body, so each function can focus on its own computation.
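Chaining relies on ordinary Python decorator stacking: the decorator written closest to the function is applied first, and the topmost wrapper is the first to see the call. The sketch below makes that order visible with a hypothetical ``tag`` decorator, which is an illustration only and not part of data-wrangler.

```python
import functools


def tag(label):
    """Hypothetical decorator factory (illustration only, not a data-wrangler API)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, *args, **kwargs):
            # Append this wrapper's label before handing the data onward
            return func(data + [label], *args, **kwargs)
        return wrapper
    return decorator


@tag("outer")  # applied last, so its wrapper runs first at call time
@tag("inner")  # applied first, so its wrapper runs second at call time
def show(data):
    return data


result = show([])
print(result)  # ['outer', 'inner']
```

The same reasoning applies to any stack of decorators: the order in which they are written determines which preprocessing step touches the data first.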

[ ]:
import datawrangler as dw
import pandas as pd
import numpy as np
from datawrangler.decorate import funnel, interpolate, apply_stacked, apply_unstacked
import matplotlib.pyplot as plt

The @interpolate Decorator

The ``@interpolate`` decorator fills in missing values by interpolation before the data reaches your function, so the function body can assume complete input. This is particularly useful for time series analysis and data cleaning pipelines.
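To illustrate the general pattern, here is a minimal sketch of a fill-then-call decorator built on plain pandas. The name ``fill_linear`` is hypothetical, and the code is a stand-in for the idea, not data-wrangler's actual implementation of ``@interpolate``.

```python
import functools

import numpy as np
import pandas as pd


def fill_linear(func):
    """Hypothetical stand-in: linearly interpolate numeric columns, then call func."""
    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        cleaned = data.copy()  # leave the caller's DataFrame untouched
        numeric = cleaned.select_dtypes(include="number").columns
        cleaned[numeric] = cleaned[numeric].interpolate(method="linear")
        return func(cleaned, *args, **kwargs)
    return wrapper


@fill_linear
def column_means(data):
    return data.mean(numeric_only=True)


df = pd.DataFrame({"value": [1.0, np.nan, 3.0], "label": ["a", "b", "c"]})
means = column_means(df)
print(means["value"])  # 2.0
```

Because the wrapper works on a copy, the original ``df`` still contains its missing value after the call. Only the function sees the filled data.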

[ ]:
# Create sample data with missing values
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=20, freq='D')
values = np.random.randn(20).cumsum()
# Introduce some missing values
values[5:8] = np.nan
values[15] = np.nan

# Create DataFrame with missing data
timeseries_data = pd.DataFrame({
    'date': dates,
    'value': values,
    'category': ['A'] * 10 + ['B'] * 10
})

print("Original data with missing values:")
print(timeseries_data)
print(f"\nMissing values: {timeseries_data['value'].isna().sum()}")

[ ]:
# Define a function that computes rolling statistics
@funnel
@interpolate(method='linear')
def compute_rolling_stats(data, window=5):
    """Compute rolling statistics on clean data"""
    if 'value' not in data.columns:
        return pd.DataFrame()

    result = pd.DataFrame({
        'rolling_mean': data['value'].rolling(window=window).mean(),
        'rolling_std': data['value'].rolling(window=window).std(),
        'rolling_min': data['value'].rolling(window=window).min(),
        'rolling_max': data['value'].rolling(window=window).max()
    })

    return result

# Apply to data with missing values - interpolation happens automatically
rolling_stats = compute_rolling_stats(timeseries_data)

print("Rolling statistics computed on interpolated data:")
print(rolling_stats.head(10))

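# Rolling windows leave the first (window - 1) rows as NaN by design,
# so check for missing values only after that warm-up period
warmup = 5 - 1  # default window=5
print(f"\nMissing values after warm-up rows: {rolling_stats.iloc[warmup:].isna().sum().sum()}")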
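When reading the missing-value count of a rolling result, bear in mind that pandas rolling windows emit NaN until a full window is available, regardless of whether the input was interpolated. The sketch below uses a small made-up series to show this, along with the ``min_periods`` argument of ``Series.rolling``, which relaxes that requirement.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # complete data, no gaps

# Default: a value is produced only once 5 observations are available
strict = s.rolling(window=5).mean()

# min_periods=1: a value is produced as soon as 1 observation is available
relaxed = s.rolling(window=5, min_periods=1).mean()

print(strict.isna().sum())   # 4
print(relaxed.isna().sum())  # 0
```

Whether to relax ``min_periods`` is a modeling choice: early values are then averaged over fewer observations than later ones.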