{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "" ] }, { "cell_type": "markdown", "source": "# Data Wrangler Decorators Part 1: The @funnel Decorator\n\nThis tutorial introduces the powerful `@funnel` decorator, which automatically converts function inputs to pandas DataFrames. This allows you to write functions that work seamlessly with any data type that data-wrangler supports.\n\n## The @funnel Decorator\n\nThe `@funnel` decorator is the cornerstone of data-wrangler's function integration system. It automatically wrangles function arguments into DataFrames, allowing your functions to work with:\n\n- Raw arrays, lists, and nested data structures\n- Text data (automatically embedded using NLP models)\n- Files and URLs\n- Mixed data types\n- Any other data type supported by data-wrangler\n\nLet's see how this works in practice.", "metadata": {} }, { "cell_type": "code", "source": "import datawrangler as dw\nimport pandas as pd\nimport numpy as np\nfrom datawrangler import funnel\nimport matplotlib.pyplot as plt", "metadata": {}, "outputs": [], "execution_count": null }, { "cell_type": "markdown", "source": "## Basic Example: Numerical Analysis Function\n\nLet's start with a simple function that computes basic statistics. Without @funnel, this would only work with DataFrames:", "metadata": {} }, { "cell_type": "code", "source": "# Define a function that works on DataFrames\n@funnel\ndef compute_stats(data):\n \\\"\\\"\\\"Compute basic statistics for numerical data\\\"\\\"\\\"\n return {\n 'mean': data.mean().mean(),\n 'std': data.std().mean(), \n 'shape': data.shape,\n 'columns': list(data.columns)\n }\n\n# Test with different data types\nprint(\"=== Testing with different data types ===\")\n\n# 1. Raw numpy array\narray_data = np.random.randn(10, 5)\nprint(\"\\\\n1. NumPy Array:\")\nprint(f\"Input shape: {array_data.shape}\")\nstats = compute_stats(array_data)\nprint(f\"Result: {stats}\")\n\n# 2. Python list\nlist_data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]\nprint(\"\\\\n2. Python List:\")\nprint(f\"Input: {list_data}\")\nstats = compute_stats(list_data)\nprint(f\"Result: {stats}\")\n\n# 3. Already a DataFrame\ndf_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})\nprint(\"\\\\n3. Pandas DataFrame:\")\nprint(f\"Input shape: {df_data.shape}\")\nstats = compute_stats(df_data)\nprint(f\"Result: {stats}\")", "metadata": {}, "outputs": [], "execution_count": null }, { "cell_type": "markdown", "source": "## Text Processing with @funnel\n\nOne of the most powerful features is how @funnel handles text data automatically. Let's create a function that analyzes text sentiment and see how it works with different text inputs:", "metadata": {} }, { "cell_type": "code", "source": "@funnel\ndef analyze_text_dimensions(text_data, text_kwargs={'model': 'all-MiniLM-L6-v2'}):\n \\\"\\\"\\\"Analyze the dimensionality and characteristics of text embeddings\\\"\\\"\\\"\n print(f\"Received DataFrame with shape: {text_data.shape}\")\n print(f\"Data type: {type(text_data)}\")\n print(f\"Columns: {list(text_data.columns)}\")\n \n # Basic statistics about the embeddings\n stats = {\n 'embedding_dimensions': text_data.shape[1],\n 'num_texts': text_data.shape[0],\n 'mean_embedding_magnitude': np.sqrt((text_data ** 2).sum(axis=1)).mean(),\n 'embedding_std': text_data.std().mean()\n }\n \n return stats\n\n# Test with different text inputs\nprint(\"=== Testing text processing with @funnel ===\")\n\n# 1. Single text string\nsingle_text = \"This is a sample sentence for analysis.\"\nprint(\"\\\\n1. Single text string:\")\nprint(f\"Input: '{single_text}'\")\nresult = analyze_text_dimensions(single_text)\nprint(f\"Result: {result}\")\n\n# 2. List of texts\ntext_list = [\n \"Data science is fascinating.\",\n \"Machine learning transforms industries.\", \n \"Natural language processing enables AI communication.\",\n \"Data wrangling simplifies preprocessing.\"\n]\nprint(\"\\\\n2. List of texts:\")\nprint(f\"Input: {len(text_list)} texts\")\nresult = analyze_text_dimensions(text_list)\nprint(f\"Result: {result}\")", "metadata": {}, "outputs": [], "execution_count": null } ], "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }