Data Wrangler I/O Operations

This tutorial covers the I/O capabilities of data-wrangler, including loading and saving data from various sources and formats.

Overview

The datawrangler.io module loads and saves data across a range of sources and formats:

  • Local files: CSV, JSON, text files, images, and more

  • URLs: Load data directly from web sources

  • Multiple formats: Automatic format detection based on file extensions

  • Mixed sources: Handle lists of files/URLs with different formats
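
To make the distinction between local files and URLs in the list above concrete, here is a minimal sketch that uses only the standard library. The helper `classify_source` is hypothetical and is not part of data-wrangler; it only illustrates how a source string might be labeled and how its extension could be read, under the assumption that URLs use the http or https scheme.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source):
    """Hypothetical helper: label a source as 'url' or 'file' and return its extension."""
    text = str(source)
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        # For URLs, read the extension from the path component only,
        # so a query string such as ?download=1 is ignored.
        return "url", Path(parsed.path).suffix.lower()
    return "file", Path(text).suffix.lower()


# A mixed list of sources (illustrative names only)
sources = [
    "data/products.csv",
    "https://example.com/files/users.JSON?download=1",
    Path("notes/sample_text.txt"),
]
for src in sources:
    kind, ext = classify_source(src)
    print(f"{str(src)!r}: {kind}, extension {ext!r}")
```

Lowercasing the extension means that `users.JSON` and `users.json` are treated the same way; whether data-wrangler does this internally is not stated here.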

Let’s explore these capabilities with practical examples.

[ ]:
import datawrangler as dw
from datawrangler.io import load, save
import pandas as pd
import numpy as np
import os
from pathlib import Path

Loading Different File Formats

Data-wrangler detects each file's format from its extension and loads it accordingly. The first step is to create sample CSV, text, and JSON files to work with.
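
As a rough illustration of what extension-based detection involves, the sketch below maps file extensions to reader functions using only the standard library. It is not data-wrangler's implementation: the `READERS` table and the `load_by_extension` function are hypothetical names chosen for this example, and the formats the library actually supports may differ.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_csv_rows(path):
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def read_json(path):
    with open(path) as f:
        return json.load(f)


def read_text(path):
    return Path(path).read_text()


# Hypothetical extension-to-reader table (an assumption for this sketch)
READERS = {".csv": read_csv_rows, ".json": read_json, ".txt": read_text}


def load_by_extension(path):
    """Pick a reader from the file extension and return the loaded data."""
    ext = Path(path).suffix.lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"No reader registered for extension {ext!r}") from None
    return reader(path)


# Small demonstration with a throwaway JSON file
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.json"
    demo.write_text(json.dumps({"users": ["Alice", "Bob"]}))
    loaded = load_by_extension(demo)

print(loaded)
```

A lookup table keeps the dispatch logic in one place, so supporting another format only means registering one more reader.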

[ ]:
# Create sample data files for demonstration
import tempfile

# Create a temporary directory for our examples
temp_dir = tempfile.mkdtemp()
print(f"Working in temporary directory: {temp_dir}")

# Create sample CSV file
csv_data = pd.DataFrame({
    'product': ['laptop', 'mouse', 'keyboard', 'monitor'],
    'price': [999.99, 25.50, 75.00, 300.00],
    'category': ['electronics', 'accessories', 'accessories', 'electronics']
})

csv_file = os.path.join(temp_dir, 'products.csv')
csv_data.to_csv(csv_file, index=False)

# Create sample text file
text_content = """Data science is transforming industries worldwide.
Machine learning enables computers to learn from data.
Natural language processing helps computers understand human language.
Data visualization makes complex data insights accessible."""

text_file = os.path.join(temp_dir, 'sample_text.txt')
with open(text_file, 'w') as f:
    f.write(text_content)

# Create sample JSON file
json_data = {
    'users': [
        {'name': 'Alice', 'age': 30, 'city': 'New York'},
        {'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
        {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
    ]
}

json_file = os.path.join(temp_dir, 'users.json')
import json
with open(json_file, 'w') as f:
    json.dump(json_data, f)

print("Created files:")
print(f"- CSV: {csv_file}")
print(f"- Text: {text_file}")
print(f"- JSON: {json_file}")
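
One detail about the setup above: a directory created with `tempfile.mkdtemp()` is not deleted automatically, so the caller is responsible for removing it once the sample files are no longer needed. The sketch below shows two standard-library ways to handle cleanup. It uses its own throwaway directory (`scratch_dir`, a name chosen for this example) so that the tutorial's `temp_dir` is left untouched.

```python
import os
import shutil
import tempfile

# Option 1: remove a mkdtemp() directory explicitly when finished.
scratch_dir = tempfile.mkdtemp()
with open(os.path.join(scratch_dir, "example.txt"), "w") as f:
    f.write("temporary content")
shutil.rmtree(scratch_dir)  # deletes the directory and everything in it
removed_explicitly = not os.path.exists(scratch_dir)

# Option 2: let a context manager delete the directory on exit.
with tempfile.TemporaryDirectory() as managed_dir:
    existed_inside = os.path.isdir(managed_dir)
removed_on_exit = not os.path.exists(managed_dir)

print(removed_explicitly, existed_inside, removed_on_exit)
```

The explicit form suits a notebook, where the directory has to survive across several cells; the context manager is the safer choice when all the work fits inside one block.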