datawrangler.io

datawrangler.io.load(x, dtype=None, **kwargs)

Load local or remote files in a wide range of formats.

Parameters

param x:

a string containing a URL or file path

param dtype:

Optional argument for specifying how the data should be loaded; can be one of:

- ‘pickle’: use the dill library to load in pickled objects and functions
- ‘numpy’: treat the dataset as a .npy or .npz file
- None (default): attempt to determine the filetype automatically based on the URL or file extension. The following filetypes are supported:

param kwargs:

any additional keyword arguments are passed to whatever function is selected to load in the dataset. For example, when loading in a csv file (a Pandas-compatible format), passing the keyword argument index_col=0 will tell Pandas to interpret the first (0) column as the resulting DataFrame’s index when loading the file’s contents into a DataFrame.
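The pass-through behavior described above can be pictured with a small standard-library sketch. The names `LOADERS`, `load_csv`, `load_json`, and `load_text` are hypothetical and are not part of datawrangler; the sketch only illustrates that extra keyword arguments reach the selected loader unchanged.

```python
import csv
import io
import json


def load_csv(text, **kwargs):
    # Extra keyword arguments go straight to csv.reader.
    return list(csv.reader(io.StringIO(text), **kwargs))


def load_json(text, **kwargs):
    # Extra keyword arguments go straight to json.loads.
    return json.loads(text, **kwargs)


# Hypothetical dispatch table; datawrangler's real dispatch may differ.
LOADERS = {"csv": load_csv, "json": load_json}


def load_text(text, kind, **kwargs):
    """Pick a loader by kind and forward any extra keyword arguments to it."""
    return LOADERS[kind](text, **kwargs)


rows = load_text("a;b\n1;2\n", "csv", delimiter=";")
data = load_text('{"x": 1.5}', "json", parse_float=str)
```

Here `delimiter` and `parse_float` play the role that `index_col=0` plays in the Pandas case: the caller supplies them, and only the selected loader interprets them.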

Returns

return:

the retrieved data. Remote files will be cached (saved) locally to disk for faster loading if/when the same address is used to load the file again at a later time.
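As a rough illustration of extension-based type detection and address-keyed caching, the sketch below derives a file extension and a stable local cache filename from a path or URL. It is an assumption-laden sketch, not datawrangler's implementation: the helper names `guess_extension` and `cache_path`, the choice of BLAKE2b, and the digest size are all hypothetical.

```python
import hashlib
import os
from urllib.parse import urlparse


def guess_extension(address):
    """Return the lowercase extension (no dot) of a POSIX-style path or URL."""
    # urlparse drops any query string, so "file.csv?raw=1" still yields "csv".
    path = urlparse(address).path
    return os.path.splitext(path)[1].lstrip(".").lower()


def cache_path(address, cache_dir):
    """Map an address to a stable local filename, keeping its extension."""
    # The same address always hashes to the same digest, so a later load of
    # that address can find the previously saved copy.
    digest = hashlib.blake2b(address.encode("utf-8"), digest_size=10).hexdigest()
    ext = guess_extension(address)
    name = f"{digest}.{ext}" if ext else digest
    return os.path.join(cache_dir, name)


url = "https://example.com/data/scores.csv?raw=1"
ext = guess_extension(url)
first = cache_path(url, "cache")
second = cache_path(url, "cache")
```

Because the filename depends only on the address, repeated requests for the same URL resolve to the same local file, which is what makes the faster second load possible.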

datawrangler.io.save(x, obj, dtype=None, **kwargs)

Save data to disk.

Parameters

param x:

the file’s original path or URL (used to create a hash to define a new filename)

param obj:

the data to store to disk

param dtype:

optional argument specifying how to store the data; can be one of:

- ‘pickle’: use the dill library to pickle the object
- ‘numpy’: save the object as a compressed (.npz-formatted) numpy file
- None (default): determine the filetype automatically; if obj is passed in as bytes, write obj directly to disk. If obj is a string, treat obj as text.

param kwargs:

any additional keyword arguments are passed to dill.dump (if dtype == ‘pickle’) or numpy.savez (if dtype == ‘numpy’). For any other datatype, additional keyword arguments are ignored.

Returns

return:

None
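The storage rules above can be sketched with the standard library alone. This is a hypothetical stand-in, not datawrangler.io.save: `save_sketch` takes the destination filename directly instead of hashing an original path or URL, uses `pickle.dump` in place of `dill.dump`, omits the ‘numpy’ branch, and assumes that the bytes-versus-text check applies to the data being stored.

```python
import os
import pickle
import tempfile


def save_sketch(fname, obj, dtype=None, **kwargs):
    """Write obj to fname; a hypothetical stand-in, not datawrangler.io.save."""
    if dtype == "pickle":
        # pickle.dump stands in for dill.dump; kwargs are forwarded to it.
        with open(fname, "wb") as fd:
            pickle.dump(obj, fd, **kwargs)
    elif isinstance(obj, bytes):
        # Bytes are written to disk unchanged; kwargs are ignored.
        with open(fname, "wb") as fd:
            fd.write(obj)
    elif isinstance(obj, str):
        # Strings are written as text; kwargs are ignored.
        with open(fname, "w", encoding="utf-8") as fd:
            fd.write(obj)
    else:
        raise TypeError("this sketch handles only bytes, str, or dtype='pickle'")


with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.bin")
    text_path = os.path.join(tmp, "note.txt")
    pkl_path = os.path.join(tmp, "obj.pkl")

    save_sketch(raw_path, b"\x00\x01\x02")
    save_sketch(text_path, "hello")
    save_sketch(pkl_path, {"a": [1, 2, 3]}, dtype="pickle", protocol=4)

    # Read everything back to confirm each branch round-trips.
    with open(raw_path, "rb") as fd:
        raw_back = fd.read()
    with open(text_path, encoding="utf-8") as fd:
        text_back = fd.read()
    with open(pkl_path, "rb") as fd:
        obj_back = pickle.load(fd)
```

Like the function documented above, the sketch returns None; its effect is the file written to disk.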