
Data Wrangler Utilities

This tutorial covers the utility functions in datawrangler.util that help with data type detection, validation, and manipulation. These utilities are the building blocks that power data-wrangler’s automatic data type detection.

Overview

The datawrangler.util module provides several helper functions. This tutorial uses four of them:

  • ``dataframe_like()``: Check if an object behaves like a DataFrame

  • ``array_like()``: Detect array-like objects

  • ``depth()``: Determine nesting depth of data structures

  • ``btwn()``: Check if values fall within a range

These utilities are particularly useful when building custom data processing pipelines or extending data-wrangler’s functionality.
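To make the idea of a range check concrete, here is a minimal standalone sketch. It is not data-wrangler's implementation, and the signature of the real ``btwn()`` is not shown in this tutorial: the name `in_range_sketch`, its argument order, the inclusive bounds, and the handling of reversed bounds are all assumptions made for illustration.

```python
def in_range_sketch(values, low, high):
    """Hypothetical stand-in for a range check (not the library's btwn()).

    Returns True only if every value lies in the closed interval [low, high].
    """
    # Assumption: tolerate bounds given in reverse order by swapping them.
    if low > high:
        low, high = high, low
    # Treat a bare number as a one-element collection.
    if isinstance(values, (int, float)):
        values = [values]
    return all(low <= v <= high for v in values)


print(in_range_sketch([0.1, 0.5, 0.9], 0, 1))  # True: all values inside [0, 1]
print(in_range_sketch(1.5, 0, 1))              # False: 1.5 is above the upper bound
```

A check like this is typically used to validate inputs such as proportions or probabilities before further processing.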

[ ]:
import datawrangler as dw
from datawrangler.util import dataframe_like, array_like, depth, btwn
import pandas as pd
import numpy as np

Data Type Detection

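Knowing how data-wrangler classifies its inputs makes it easier to predict how a pipeline will treat a given object. The cell below runs dataframe_like(), array_like(), and depth() on seven objects, ranging from a pandas DataFrame to a bare integer, and prints the results side by side: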

[ ]:
# Test different data types with detection utilities
test_objects = [
    pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),  # True DataFrame
    {'A': [1, 2, 3], 'B': [4, 5, 6]},  # Dict (not DataFrame-like)
    np.array([[1, 2, 3], [4, 5, 6]]),  # NumPy array
    [[1, 2, 3], [4, 5, 6]],  # Nested list
    [1, 2, 3, 4, 5],  # Simple list
    "Hello World",  # String
    42  # Number
]

object_names = [
    "pandas DataFrame",
    "Dictionary",
    "NumPy Array",
    "Nested List",
    "Simple List",
    "String",
    "Number"
]

print("=== Data Type Detection Results ===")
print(f"{'Object Type':<20} {'DataFrame-like':<15} {'Array-like':<12} {'Depth':<8}")
print("-" * 60)

for obj, name in zip(test_objects, object_names):
    is_df_like = dataframe_like(obj)
    is_array_like = array_like(obj)
    obj_depth = depth(obj)

    print(f"{name:<20} {str(is_df_like):<15} {str(is_array_like):<12} {obj_depth:<8}")
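
To build intuition for what checks like these involve, the sketch below shows one plausible way to do duck-typed DataFrame detection and nesting-depth measurement using only the standard library. It is an illustration under stated assumptions, not data-wrangler's code: the names `looks_dataframe_like`, `nesting_depth`, and `FakeFrame`, the list of required attributes, and the depth convention (0 for scalars and strings) are all hypothetical, and the library's own functions may follow different rules.

```python
def looks_dataframe_like(obj, required=("index", "columns", "shape", "iloc")):
    """Hypothetical duck-typing check: does obj expose DataFrame-style attributes?

    The attribute list is an assumption chosen for illustration.
    """
    return all(hasattr(obj, name) for name in required)


def nesting_depth(obj):
    """Hypothetical depth measure for lists and tuples.

    Assumed convention: scalars and strings have depth 0; a list or tuple
    has depth 1 plus the depth of its most deeply nested element.
    """
    if isinstance(obj, (list, tuple)):
        return 1 + max((nesting_depth(item) for item in obj), default=0)
    return 0


class FakeFrame:
    """Toy object exposing the attributes the sketch looks for."""
    index = columns = shape = iloc = None


print(looks_dataframe_like(FakeFrame()))      # True: all four attributes exist
print(looks_dataframe_like({"A": [1, 2, 3]})) # False: a dict has no .index or .columns
print(nesting_depth([[1, 2, 3], [4, 5, 6]]))  # 2
print(nesting_depth([1, 2, 3, 4, 5]))         # 1
print(nesting_depth("Hello World"))           # 0
```

Checking for attributes rather than for a specific class is what lets a detector accept DataFrame look-alikes from other libraries, at the cost of occasionally accepting an object that merely shares the attribute names.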