Data Wrangler Decorators Part 1: The @funnel Decorator

This tutorial introduces the @funnel decorator, which automatically converts function inputs to pandas DataFrames. A function written against DataFrames can then accept any data type that data-wrangler supports.

The @funnel Decorator

The @funnel decorator is the cornerstone of data-wrangler’s function integration system. It automatically wrangles function arguments into DataFrames, allowing your functions to work with:

  • Raw arrays, lists, and nested data structures

  • Text data (automatically embedded using NLP models)

  • Files and URLs

  • Mixed data types

  • Any other data type supported by data-wrangler
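
To make the idea concrete before using the library, here is a minimal sketch of a decorator that coerces its first argument into a DataFrame. It is not data-wrangler's implementation: the name coerce_first_arg_to_dataframe is hypothetical, and it relies only on pd.DataFrame, so it covers array-like inputs and none of the text, file, or URL handling listed above.

```python
import functools

import numpy as np
import pandas as pd


def coerce_first_arg_to_dataframe(func):
    """Hypothetical stand-in for the idea behind @funnel (not the library's code)."""

    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        # DataFrames pass through untouched; anything else is handed to pandas.
        if not isinstance(data, pd.DataFrame):
            data = pd.DataFrame(data)
        return func(data, *args, **kwargs)

    return wrapper


@coerce_first_arg_to_dataframe
def describe_shape(data):
    # Inside the function body, `data` is always a DataFrame.
    return data.shape


shape_from_list = describe_shape([[1, 2, 3], [4, 5, 6]])
shape_from_array = describe_shape(np.zeros((4, 2)))
print(shape_from_list, shape_from_array)
```

The function body only ever sees a DataFrame, so the conversion logic lives in one place instead of being repeated in every function.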

Let’s see how this works in practice.

[ ]:
import datawrangler as dw
import pandas as pd
import numpy as np
from datawrangler import funnel
import matplotlib.pyplot as plt

Basic Example: Numerical Analysis Function

Let’s start with a simple function that computes basic statistics. Without @funnel, this would only work with DataFrames:

[ ]:
# Define a function that works on DataFrames
@funnel
def compute_stats(data):
    """Compute basic statistics for numerical data"""
    return {
        'mean': data.mean().mean(),
        'std': data.std().mean(),
        'shape': data.shape,
        'columns': list(data.columns)
    }

# Test with different data types
print("=== Testing with different data types ===")

# 1. Raw numpy array
array_data = np.random.randn(10, 5)
print("\n1. NumPy Array:")
print(f"Input shape: {array_data.shape}")
stats = compute_stats(array_data)
print(f"Result: {stats}")

# 2. Python list
list_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("\n2. Python List:")
print(f"Input: {list_data}")
stats = compute_stats(list_data)
print(f"Result: {stats}")

# 3. Already a DataFrame
df_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print("\n3. Pandas DataFrame:")
print(f"Input shape: {df_data.shape}")
stats = compute_stats(df_data)
print(f"Result: {stats}")
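
It helps to see what the 'mean' and 'std' entries actually measure. The sketch below reproduces the two calculations on the same 3x3 list using plain pandas. The pd.DataFrame call is an assumption standing in for the conversion @funnel performs, so the column labels may differ from what data-wrangler produces.

```python
import pandas as pd

list_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Plain pandas conversion, used here as a stand-in for what @funnel does.
df = pd.DataFrame(list_data)

column_means = df.mean()  # one mean per column
column_stds = df.std()    # sample standard deviation (ddof=1) per column

# compute_stats averages these per-column values into a single number each.
mean_of_means = column_means.mean()
mean_of_stds = column_stds.mean()
print(mean_of_means, mean_of_stds)
```

Note that data.std().mean() is the average of the per-column standard deviations, not the standard deviation of all values pooled together.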

Text Processing with @funnel

One of the most powerful features is how @funnel handles text data automatically. Let’s create a function that analyzes the embeddings produced from text and see how it works with different text inputs:

[ ]:
@funnel
def analyze_text_dimensions(text_data, text_kwargs={'model': 'all-MiniLM-L6-v2'}):
    """Analyze the dimensionality and characteristics of text embeddings"""
    print(f"Received DataFrame with shape: {text_data.shape}")
    print(f"Data type: {type(text_data)}")
    print(f"Columns: {list(text_data.columns)}")

    # Basic statistics about the embeddings
    stats = {
        'embedding_dimensions': text_data.shape[1],
        'num_texts': text_data.shape[0],
        'mean_embedding_magnitude': np.sqrt((text_data ** 2).sum(axis=1)).mean(),
        'embedding_std': text_data.std().mean()
    }

    return stats

# Test with different text inputs
print("=== Testing text processing with @funnel ===")

# 1. Single text string
single_text = "This is a sample sentence for analysis."
print("\n1. Single text string:")
print(f"Input: '{single_text}'")
result = analyze_text_dimensions(single_text)
print(f"Result: {result}")

# 2. List of texts
text_list = [
    "Data science is fascinating.",
    "Machine learning transforms industries.",
    "Natural language processing enables AI communication.",
    "Data wrangling simplifies preprocessing."
]
print("\n2. List of texts:")
print(f"Input: {len(text_list)} texts")
result = analyze_text_dimensions(text_list)
print(f"Result: {result}")
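
The mean_embedding_magnitude entry is the average row-wise Euclidean (L2) norm: each row is one text's embedding vector. The sketch below checks that calculation on a toy two-row DataFrame. The values are made up for illustration and are not real model embeddings.

```python
import numpy as np
import pandas as pd

# Toy stand-in for an embedding matrix: 2 "texts", 2 dimensions each.
toy_embeddings = pd.DataFrame([[3.0, 4.0], [6.0, 8.0]])

# Same expression as in analyze_text_dimensions: one L2 norm per row.
row_norms = np.sqrt((toy_embeddings ** 2).sum(axis=1))
mean_magnitude = row_norms.mean()

# Equivalent calculation using NumPy's built-in norm.
mean_magnitude_numpy = np.linalg.norm(toy_embeddings.to_numpy(), axis=1).mean()
print(mean_magnitude, mean_magnitude_numpy)
```

Using axis=1 matters here: summing over columns gives one magnitude per text, whereas axis=0 would give one value per embedding dimension.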