Data Wrangler I/O Operations
This tutorial covers the I/O capabilities of data-wrangler, including loading and saving data from various sources and formats.
Overview
The datawrangler.io module handles loading and saving for:
Local files: CSV, JSON, text files, images, and more
URLs: Load data directly from web sources
Multiple formats: Automatic format detection based on file extensions
Mixed sources: Handle lists of files/URLs with different formats
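To make the idea of extension-based detection concrete, here is a minimal standard-library sketch. It is not data-wrangler's implementation: the `FORMAT_BY_EXTENSION` table and the `describe_source` helper are hypothetical names invented for illustration, and the library's real detection rules may differ.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical extension-to-format table, for illustration only;
# data-wrangler's actual detection logic may differ.
FORMAT_BY_EXTENSION = {
    ".csv": "csv",
    ".json": "json",
    ".txt": "text",
    ".png": "image",
    ".jpg": "image",
}


def describe_source(source: str) -> Tuple[str, str]:
    """Return (kind, format), where kind is 'url' or 'file'."""
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, inspect only the path so query strings are ignored.
    path = parsed.path if is_url else source
    suffix = PurePosixPath(path).suffix.lower()
    kind = "url" if is_url else "file"
    return kind, FORMAT_BY_EXTENSION.get(suffix, "unknown")


# A mixed list of local files and URLs with different formats
sources = ["products.csv", "https://example.com/data/users.json", "notes.TXT"]
for src in sources:
    print(src, "->", describe_source(src))
```

Lowercasing the suffix keeps the lookup case-insensitive, and unrecognized extensions fall back to "unknown" instead of raising an error.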
Let’s explore these capabilities with practical examples.
[ ]:
import datawrangler as dw
from datawrangler.io import load, save
import pandas as pd
import numpy as np
import os
from pathlib import Path
Loading Different File Formats
Data-wrangler automatically detects file formats and loads them appropriately. Let’s demonstrate with different file types:
[ ]:
# Create sample data files for demonstration
import tempfile
# Create a temporary directory for our examples
temp_dir = tempfile.mkdtemp()
print(f"Working in temporary directory: {temp_dir}")
# Create sample CSV file
csv_data = pd.DataFrame({
    'product': ['laptop', 'mouse', 'keyboard', 'monitor'],
    'price': [999.99, 25.50, 75.00, 300.00],
    'category': ['electronics', 'accessories', 'accessories', 'electronics']
})
csv_file = os.path.join(temp_dir, 'products.csv')
csv_data.to_csv(csv_file, index=False)
# Create sample text file
text_content = """Data science is transforming industries worldwide.
Machine learning enables computers to learn from data.
Natural language processing helps computers understand human language.
Data visualization makes complex data insights accessible."""
text_file = os.path.join(temp_dir, 'sample_text.txt')
with open(text_file, 'w') as f:
    f.write(text_content)
# Create sample JSON file
json_data = {
    'users': [
        {'name': 'Alice', 'age': 30, 'city': 'New York'},
        {'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
        {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
    ]
}
json_file = os.path.join(temp_dir, 'users.json')
import json
with open(json_file, 'w') as f:
    json.dump(json_data, f)
print("Created files:")
print(f"- CSV: {csv_file}")
print(f"- Text: {text_file}")
print(f"- JSON: {json_file}")
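One detail about the setup above: `tempfile.mkdtemp()` leaves cleanup to the caller, so the directory stays on disk until it is removed explicitly (for example with `shutil.rmtree(temp_dir)`). As a sketch of an alternative, assuming the files are only needed inside one block, `tempfile.TemporaryDirectory` removes everything on exit. The names `scratch_dir` and `scratch_file` are illustrative and not part of the tutorial.

```python
import os
import tempfile

# TemporaryDirectory deletes the directory and its contents when the
# with-block exits, unlike mkdtemp(), which leaves cleanup to the caller.
with tempfile.TemporaryDirectory() as scratch_dir:
    scratch_file = os.path.join(scratch_dir, "sample_text.txt")
    with open(scratch_file, "w", encoding="utf-8") as f:
        f.write("Data science is transforming industries worldwide.")
    # The file is available for loading while the block is active.
    existed_inside = os.path.exists(scratch_file)

# After the block, the whole directory is gone.
removed_after = not os.path.exists(scratch_dir)
print(existed_inside, removed_after)
```

For a notebook whose later cells reuse the files, keeping `mkdtemp()` and cleaning up in a final cell is the simpler choice.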