{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "" ] }, { "cell_type": "markdown", "source": "# Data Wrangler Decorators Part 2: Advanced Decorators\n\nThis tutorial covers advanced decorator functionality in data-wrangler, including interpolation, stacking/unstacking operations, and building complex data processing pipelines.\n\n## Advanced Decorators Overview\n\nBeyond the basic `@funnel` decorator, data-wrangler provides specialized decorators for:\n\n- **`@interpolate`**: Automatic handling of missing data\n- **`@apply_stacked`**: Operations on stacked (melted) data\n- **`@apply_unstacked`**: Operations on unstacked (pivoted) data \n- **Custom decorator combinations**: Chaining decorators for complex workflows\n\nThese decorators enable sophisticated data preprocessing pipelines with minimal code.", "metadata": {} }, { "cell_type": "code", "source": "import datawrangler as dw\nimport pandas as pd\nimport numpy as np\nfrom datawrangler.decorate import funnel, interpolate, apply_stacked, apply_unstacked\nimport matplotlib.pyplot as plt", "metadata": {}, "outputs": [], "execution_count": null }, { "cell_type": "markdown", "source": "## The @interpolate Decorator\n\nThe `@interpolate` decorator automatically handles missing data by applying interpolation methods before passing data to your function. This is particularly useful for time series analysis and data cleaning pipelines.", "metadata": {} }, { "cell_type": "code", "source": "# Create sample data with missing values\nnp.random.seed(42)\ndates = pd.date_range('2024-01-01', periods=20, freq='D')\nvalues = np.random.randn(20).cumsum()\n# Introduce some missing values\nvalues[5:8] = np.nan\nvalues[15] = np.nan\n\n# Create DataFrame with missing data\ntimeseries_data = pd.DataFrame({\n 'date': dates,\n 'value': values,\n 'category': ['A'] * 10 + ['B'] * 10\n})\n\nprint(\"Original data with missing values:\")\nprint(timeseries_data)\nprint(f\"\\nMissing values: {timeseries_data['value'].isna().sum()}\")", "metadata": {}, "outputs": [], "execution_count": null }, { "cell_type": "code", "source": "# Define a function that computes rolling statistics\n@funnel\n@interpolate(method='linear')\ndef compute_rolling_stats(data, window=5):\n \\\"\\\"\\\"Compute rolling statistics on clean data\\\"\\\"\\\"\n if 'value' not in data.columns:\n return pd.DataFrame()\n \n result = pd.DataFrame({\n 'rolling_mean': data['value'].rolling(window=window).mean(),\n 'rolling_std': data['value'].rolling(window=window).std(),\n 'rolling_min': data['value'].rolling(window=window).min(),\n 'rolling_max': data['value'].rolling(window=window).max()\n })\n \n return result\n\n# Apply to data with missing values - interpolation happens automatically\nrolling_stats = compute_rolling_stats(timeseries_data)\n\nprint(\"Rolling statistics computed on interpolated data:\")\nprint(rolling_stats.head(10))\n\n# Verify no missing values in the processed data\nprint(f\"\\nMissing values after interpolation: {rolling_stats.isna().sum().sum()}\")", "metadata": {}, "outputs": [], "execution_count": null } ], "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }