datawrangler.io

datawrangler.io.load(x, dtype=None, **kwargs)

Load local or remote files in a wide range of formats.

Parameters

param x:

a string containing a URL or file path

param dtype:

Optional argument for specifying how the data should be loaded; can be one of:

- ‘pickle’: use the dill library to load in pickled objects and functions
- ‘numpy’: treat the dataset as a .npy or .npz file
- None (default): attempt to determine the filetype automatically based on the URL or file extension. The following filetypes are supported:
  - txt files: treated as plain text
  - any filetype supported by the Pandas library: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
  - any image filetype supported by PIL; for a full list see: https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html
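
The selection logic described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the library's actual implementation: the guess_loader function, the returned labels, and the extension tables are all hypothetical, and the real lists of supported extensions may differ.

```python
import os
from urllib.parse import urlparse

# Hypothetical extension tables (assumptions for illustration only)
TEXT_EXTENSIONS = {".txt"}
TABLE_EXTENSIONS = {".csv", ".json", ".xlsx", ".parquet"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def guess_loader(x, dtype=None):
    """Return a label naming the kind of loader that would handle x."""
    if dtype in ("pickle", "numpy"):
        return dtype  # an explicit dtype takes priority over the extension
    # Look only at the path component so a query string cannot hide the extension
    path = urlparse(x).path
    ext = os.path.splitext(path)[1].lower()
    if ext in TEXT_EXTENSIONS:
        return "text"
    if ext in TABLE_EXTENSIONS:
        return "pandas"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"cannot determine filetype for: {x}")
```

Under these assumptions, guess_loader("notes.txt") yields "text", while passing dtype="pickle" bypasses the extension check entirely.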

param kwargs:

any additional keyword arguments are passed to the function selected to load the dataset. For example, when loading a csv file (a Pandas-compatible format), passing index_col=0 tells Pandas to use the first column (column 0) as the index of the resulting DataFrame.
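
The forwarding pattern can be illustrated with a small sketch. Here the standard library's csv module stands in for Pandas so the example has no third-party dependencies; read_rows is a hypothetical helper and is not part of datawrangler.

```python
import csv
import io


def read_rows(text, **kwargs):
    """Hypothetical loader: every extra keyword is forwarded to csv.reader."""
    return list(csv.reader(io.StringIO(text), **kwargs))


# delimiter=";" is not interpreted by read_rows itself; it reaches csv.reader
rows = read_rows("a;b\n1;2\n", delimiter=";")
```

By the same pattern, a call such as datawrangler.io.load('data.csv', index_col=0) (with 'data.csv' as a placeholder path) would hand index_col=0 on to Pandas, as described above.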

Returns

return: the retrieved data. Remote files are cached (saved) to local disk, so loading the same address again later is faster.
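
One way such caching could work is sketched below. This is an assumption-laden illustration, not the library's code: the function names, the use of SHA-256, and the cache layout are all hypothetical, and the fetch argument stands in for whatever performs the actual download.

```python
import hashlib
import os
import tempfile


def cached_path(address, cache_dir):
    """Map an address to a stable local filename (hypothetical scheme)."""
    digest = hashlib.sha256(address.encode("utf-8")).hexdigest()
    ext = os.path.splitext(address)[1]  # keep the extension for later type detection
    return os.path.join(cache_dir, digest + ext)


def load_with_cache(address, fetch, cache_dir):
    """Return the bytes for address, calling fetch only on a cache miss."""
    path = cached_path(address, cache_dir)
    if os.path.exists(path):
        with open(path, "rb") as fd:
            return fd.read()
    data = fetch(address)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fd:
        fd.write(data)
    return data


calls = []


def fake_fetch(address):
    """Stand-in for a real download; records each call."""
    calls.append(address)
    return b"payload"


cache_dir = tempfile.mkdtemp()
first = load_with_cache("https://example.com/a.bin", fake_fetch, cache_dir)
second = load_with_cache("https://example.com/a.bin", fake_fetch, cache_dir)
```

The second call reads the saved copy instead of invoking fake_fetch again, which is the behavior the description above attributes to repeated loads of the same address.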

datawrangler.io.save(x, obj, dtype=None, **kwargs)

Save data to disk.

Parameters

param x:

the file’s original path or URL (used to create a hash to define a new filename)

param obj:

the data to store to disk

param dtype:

optional argument specifying how to store the data; can be one of:

- ‘pickle’: use the dill library to pickle the object
- ‘numpy’: save the objects as a compressed (.npz-formatted) numpy file
- None (default): determine the filetype automatically; if obj is passed in as bytes, write obj directly to disk. If obj is a string, treat obj as text.

param kwargs:

any additional keyword arguments are passed to dill.dump (if dtype == ‘pickle’) or numpy.savez (if dtype == ‘numpy’). For any other datatype, additional keyword arguments are ignored.

Returns

return: None
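
The default behavior (dtype=None), where bytes are written directly and strings are treated as text, can be sketched as follows. The write_object helper is hypothetical and not part of datawrangler; the UTF-8 encoding and the TypeError for other types are assumptions made for this illustration.

```python
import os
import tempfile


def write_object(path, data):
    """Write bytes verbatim and strings as text; reject anything else."""
    if isinstance(data, bytes):
        mode, encoding = "wb", None  # binary mode takes no encoding
    elif isinstance(data, str):
        mode, encoding = "w", "utf-8"  # assumed encoding for text
    else:
        raise TypeError(f"unsupported type: {type(data).__name__}")
    with open(path, mode, encoding=encoding) as fd:
        fd.write(data)


out_dir = tempfile.mkdtemp()
bytes_path = os.path.join(out_dir, "raw.bin")
text_path = os.path.join(out_dir, "note.txt")
write_object(bytes_path, b"\x00\x01")
write_object(text_path, "hello")
```

Objects that are neither bytes nor strings would need an explicit dtype such as ‘pickle’ or ‘numpy’, as listed in the parameters above.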