Data Wrangler Decorators Part 1: The @funnel Decorator
This tutorial introduces the @funnel decorator, which automatically converts function inputs to pandas DataFrames. You write a function once against the DataFrame API, and it then works with any data type that data-wrangler supports.
The @funnel Decorator
The @funnel decorator is the cornerstone of data-wrangler’s function integration system. Before your function body runs, it wrangles the incoming data into a DataFrame, so a function written against the pandas API can accept:
Raw arrays, lists, and nested data structures
Text data (automatically embedded using NLP models)
Files and URLs
Mixed data types
Any other data type supported by data-wrangler
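To make the mechanism concrete, here is a minimal sketch of a decorator in the same spirit, assuming only NumPy and pandas. The names simple_funnel and n_columns are hypothetical and used for illustration only. This is not data-wrangler's implementation: the real decorator routes the data through data-wrangler's own wrangling logic (which also embeds text and reads files and URLs), whereas this sketch simply calls pd.DataFrame on the first argument.

```python
import functools

import numpy as np
import pandas as pd


def simple_funnel(f):
    """Hypothetical, simplified stand-in for @funnel.

    Converts only the first positional argument with pd.DataFrame.
    The real decorator delegates to data-wrangler's wrangling logic.
    """
    @functools.wraps(f)  # keep the wrapped function's name and docstring
    def wrapped(data, *args, **kwargs):
        # Leave DataFrames untouched; convert arrays, nested lists, etc.
        if not isinstance(data, pd.DataFrame):
            data = pd.DataFrame(data)
        return f(data, *args, **kwargs)
    return wrapped


@simple_funnel
def n_columns(df):
    # By the time the body runs, df is guaranteed to be a DataFrame
    return df.shape[1]


print(n_columns(np.zeros((4, 3))))          # 3
print(n_columns([[1, 2], [3, 4]]))          # 2
print(n_columns(pd.DataFrame({'a': [1]})))  # 1
```

The design choice is the same one @funnel makes: the conversion lives in one wrapper, so the decorated function never has to check what type it was handed.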
Let’s see how this works in practice.
[ ]:
import datawrangler as dw
import pandas as pd
import numpy as np
from datawrangler import funnel
import matplotlib.pyplot as plt
Basic Example: Numerical Analysis Function
Let’s start with a simple function that computes basic statistics. Without @funnel, it would only work with DataFrames, because its body relies on DataFrame-specific attributes such as .columns:
[ ]:
# Define a function that works on DataFrames
@funnel
def compute_stats(data):
    """Compute basic statistics for numerical data"""
    return {
        'mean': data.mean().mean(),
        'std': data.std().mean(),
        'shape': data.shape,
        'columns': list(data.columns)
    }

# Test with different data types
print("=== Testing with different data types ===")

# 1. Raw numpy array
array_data = np.random.randn(10, 5)
print("\n1. NumPy Array:")
print(f"Input shape: {array_data.shape}")
stats = compute_stats(array_data)
print(f"Result: {stats}")

# 2. Python list
list_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("\n2. Python List:")
print(f"Input: {list_data}")
stats = compute_stats(list_data)
print(f"Result: {stats}")

# 3. Already a DataFrame
df_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print("\n3. Pandas DataFrame:")
print(f"Input shape: {df_data.shape}")
stats = compute_stats(df_data)
print(f"Result: {stats}")
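To see what the decorator saves you, here is the same function body without @funnel, assuming only NumPy and pandas are installed (the name compute_stats_plain is ours, for illustration). Calling it on a raw array fails as soon as it reaches data.columns. Wrapping the input in pd.DataFrame by hand fixes that, which roughly approximates what @funnel does for numeric arrays and nested lists; data-wrangler's own conversion may label columns differently.

```python
import numpy as np
import pandas as pd


def compute_stats_plain(data):
    """Same body as compute_stats, but without @funnel."""
    return {
        'mean': data.mean().mean(),
        'std': data.std().mean(),
        'shape': data.shape,
        'columns': list(data.columns),
    }


array_data = np.random.randn(10, 5)

# A NumPy array has no .columns attribute, so the undecorated version fails
try:
    compute_stats_plain(array_data)
except AttributeError as err:
    print(f"Without @funnel: {err}")

# Converting by hand works: pandas assigns integer column labels 0, 1, 2
stats = compute_stats_plain(pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
print(stats)
```

For the 3x3 list, the column means are 4, 5 and 6, so the overall mean is 5.0, and each column has a sample standard deviation of 3.0.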
Text Processing with @funnel
One of the most powerful features is how @funnel handles text data automatically. Let’s create a function that summarizes the text embeddings @funnel produces and see how it handles different text inputs:
[ ]:
@funnel
def analyze_text_dimensions(text_data, text_kwargs={'model': 'all-MiniLM-L6-v2'}):
    """Analyze the dimensionality and characteristics of text embeddings"""
    print(f"Received DataFrame with shape: {text_data.shape}")
    print(f"Data type: {type(text_data)}")
    print(f"Columns: {list(text_data.columns)}")

    # Basic statistics about the embeddings
    stats = {
        'embedding_dimensions': text_data.shape[1],
        'num_texts': text_data.shape[0],
        'mean_embedding_magnitude': np.sqrt((text_data ** 2).sum(axis=1)).mean(),
        'embedding_std': text_data.std().mean()
    }
    return stats

# Test with different text inputs
print("=== Testing text processing with @funnel ===")

# 1. Single text string
single_text = "This is a sample sentence for analysis."
print("\n1. Single text string:")
print(f"Input: '{single_text}'")
result = analyze_text_dimensions(single_text)
print(f"Result: {result}")

# 2. List of texts
text_list = [
    "Data science is fascinating.",
    "Machine learning transforms industries.",
    "Natural language processing enables AI communication.",
    "Data wrangling simplifies preprocessing."
]
print("\n2. List of texts:")
print(f"Input: {len(text_list)} texts")
result = analyze_text_dimensions(text_list)
print(f"Result: {result}")
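The exact number of embedding dimensions depends on which text model data-wrangler uses, but analyze_text_dimensions assumes a fixed layout: one row per text and one column per embedding dimension. As a hedged illustration that runs without downloading a model, the sketch below uses a hypothetical toy_embed helper (a hashed bag-of-words, not data-wrangler's embedding method) to produce a DataFrame with that layout. It then checks that the mean_embedding_magnitude expression is the mean row-wise Euclidean (L2) norm.

```python
import zlib

import numpy as np
import pandas as pd


def toy_embed(texts, dim=16):
    """Hypothetical stand-in for a text embedding model.

    Hashes each lowercase word into one of `dim` buckets (a hashed
    bag-of-words). It only mimics the output layout of real embeddings:
    one row per text, one column per dimension.
    """
    if isinstance(texts, str):
        texts = [texts]  # a single string becomes a one-row frame
    rows = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for word in text.lower().split():
            # crc32 is deterministic across runs, unlike Python's hash()
            rows[i, zlib.crc32(word.encode()) % dim] += 1.0
    return pd.DataFrame(rows)


emb = toy_embed(["Data science is fascinating.",
                 "Data wrangling simplifies preprocessing."])
print(emb.shape)  # (2, 16)

# mean_embedding_magnitude is the mean of each row's L2 norm;
# np.linalg.norm along axis=1 computes the same quantity
manual = np.sqrt((emb ** 2).sum(axis=1)).mean()
via_norm = np.linalg.norm(emb.to_numpy(), axis=1).mean()
print(np.isclose(manual, via_norm))  # True
```

With a real sentence-embedding model the column count is set by the model instead of the dim parameter, but the per-row statistics are computed the same way.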