Data Wrangler Utilities
This tutorial covers the utility functions in datawrangler.util that help with data type detection, validation, and manipulation. These utilities are the building blocks that power data-wrangler’s automatic data type detection.
Overview
The datawrangler.util module provides essential helper functions:
``dataframe_like()``: Check if an object behaves like a DataFrame
``array_like()``: Detect array-like objects
``depth()``: Determine nesting depth of data structures
``btwn()``: Check if values fall within a range
These utilities are particularly useful when building custom data processing pipelines or extending data-wrangler’s functionality.
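To make the last two ideas concrete, here is a minimal pure-Python sketch of a nesting-depth calculation and a closed-interval range check. The names `nesting_depth` and `all_between` are hypothetical and chosen for illustration; this is not datawrangler's implementation, and the library's `depth()` and `btwn()` may treat edge cases (strings, empty containers, dictionaries) differently.

```python
from numbers import Number


def nesting_depth(obj):
    """Return how many levels of list/tuple nesting wrap the innermost values."""
    # Assumption: scalars, strings, and empty containers count as depth 0.
    if not isinstance(obj, (list, tuple)) or len(obj) == 0:
        return 0
    return 1 + max(nesting_depth(item) for item in obj)


def all_between(values, low, high):
    """Return True if every value lies in the closed interval [low, high]."""
    if high < low:  # accept the bounds in either order
        low, high = high, low
    if isinstance(values, Number):  # allow a single scalar as input
        values = [values]
    return all(low <= v <= high for v in values)


print(nesting_depth(42))                      # 0
print(nesting_depth([1, 2, 3]))               # 1
print(nesting_depth([[1, 2, 3], [4, 5, 6]]))  # 2
print(all_between([0.2, 0.5, 0.9], 0, 1))     # True
print(all_between([0.2, 1.5], 0, 1))          # False
```

Taking the maximum over the children means a ragged structure reports the depth of its most deeply nested branch, which is one reasonable convention among several.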
[ ]:
import datawrangler as dw
from datawrangler.util import dataframe_like, array_like, depth, btwn
import pandas as pd
import numpy as np
Data Type Detection
Understanding how data-wrangler detects different data types is crucial for building robust data processing pipelines. Let’s explore the detection utilities:
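One common way to decide whether an object "behaves like" a DataFrame is duck typing: rather than testing its class, check that it exposes the attributes a DataFrame would. The sketch below illustrates that idea in plain Python. The function `looks_like_dataframe`, the attribute list, and the `TinyFrame` stand-in class are all hypothetical; the checks that `dataframe_like()` actually performs may differ.

```python
# Hypothetical attribute set; datawrangler's own checklist may differ.
REQUIRED_ATTRIBUTES = ("values", "index", "columns", "shape", "loc", "iloc")


def looks_like_dataframe(obj, required=REQUIRED_ATTRIBUTES):
    """Return True if obj exposes every attribute name in `required`."""
    return all(hasattr(obj, name) for name in required)


class TinyFrame:
    """A stand-in object that mimics part of a DataFrame's interface."""

    def __init__(self, rows, columns):
        self.values = rows
        self.columns = columns
        self.index = list(range(len(rows)))
        self.shape = (len(rows), len(columns))
        self.loc = None   # placeholders: only the attribute's presence matters
        self.iloc = None


frame = TinyFrame([[1, 4], [2, 5], [3, 6]], ["A", "B"])
print(looks_like_dataframe(frame))                    # True
print(looks_like_dataframe({"A": [1, 2, 3]}))         # False
print(looks_like_dataframe([[1, 2, 3], [4, 5, 6]]))   # False
```

A check of this kind accepts any object with the right interface, including DataFrame implementations from other libraries, at the cost of occasionally accepting an object that has the attributes but not the expected behavior.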
[ ]:
# Test different data types with detection utilities
test_objects = [
    pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),  # True DataFrame
    {'A': [1, 2, 3], 'B': [4, 5, 6]},                # Dict (not DataFrame-like)
    np.array([[1, 2, 3], [4, 5, 6]]),                # NumPy array
    [[1, 2, 3], [4, 5, 6]],                          # Nested list
    [1, 2, 3, 4, 5],                                 # Simple list
    "Hello World",                                   # String
    42                                               # Number
]

# Human-readable labels, in the same order as test_objects
object_names = [
    "pandas DataFrame",
    "Dictionary",
    "NumPy Array",
    "Nested List",
    "Simple List",
    "String",
    "Number"
]

# Print a fixed-width table with one row per test object
print("=== Data Type Detection Results ===")
print(f"{'Object Type':<20} {'DataFrame-like':<15} {'Array-like':<12} {'Depth':<8}")
print("-" * 60)

for obj, name in zip(test_objects, object_names):
    is_df_like = dataframe_like(obj)
    is_array_like = array_like(obj)
    obj_depth = depth(obj)
    print(f"{name:<20} {str(is_df_like):<15} {str(is_array_like):<12} {obj_depth:<8}")