datawrangler.io

datawrangler.io.load(x, dtype=None, **kwargs)

Load local or remote files in a wide range of formats.

Parameters
  • x – a string containing a URL or file path

  • dtype – optional argument specifying how the data should be loaded; can be one of:

    - ‘pickle’: use the dill library to load pickled objects and functions
    - ‘numpy’: treat the dataset as a .npy or .npz file
    - None (default): attempt to determine the filetype automatically based on the URL or file extension. The following filetypes are supported:

  • kwargs – any additional keyword arguments are passed to whichever function is selected to load the dataset. For example, when loading a CSV file (a Pandas-compatible format), passing the keyword argument index_col=0 tells Pandas to use the first (0) column as the index of the resulting DataFrame.
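The extension-based selection and keyword forwarding described in the parameters above can be illustrated with a minimal standard-library sketch. This is not datawrangler's implementation: `load_sketch` and the `LOADERS` table are hypothetical names, and only JSON and CSV files are handled, purely for illustration.

```python
import csv
import json
import os
import tempfile


def _load_json(path, **kwargs):
    # Extra keyword arguments go straight to json.load.
    with open(path, encoding="utf-8") as f:
        return json.load(f, **kwargs)


def _load_csv(path, **kwargs):
    # Extra keyword arguments go straight to csv.reader.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, **kwargs))


# Hypothetical table mapping a type key to a loader function
# (an assumption for this sketch, not datawrangler's supported list).
LOADERS = {"json": _load_json, "csv": _load_csv}


def load_sketch(path, dtype=None, **kwargs):
    """Choose a loader from dtype, or from the file extension when dtype is None."""
    key = dtype if dtype is not None else os.path.splitext(path)[1].lstrip(".").lower()
    if key not in LOADERS:
        raise ValueError(f"unsupported file type: {key!r}")
    return LOADERS[key](path, **kwargs)


# Usage: the delimiter keyword reaches csv.reader unchanged.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "scores.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        f.write("name;score\nada;3\n")
    rows = load_sketch(csv_path, delimiter=";")
```

The point of the sketch is the pattern: the caller never names the loader, and any keyword the caller supplies is handed to whichever loader was chosen.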

Returns

The retrieved data. Remote files are cached (saved) locally to disk, so loading the same address again later is faster.
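The caching behaviour can be pictured with a short sketch, under stated assumptions. The names `cache_path` and `cached_fetch`, the use of SHA-256, and the cache directory layout are all hypothetical; the source only says that remote files are saved locally and reused when the same address is loaded again.

```python
import hashlib
import os
import tempfile


def cache_path(address, cache_dir):
    """Map an address to a stable local filename via a hash of the address."""
    digest = hashlib.sha256(address.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest)


def cached_fetch(address, fetch, cache_dir):
    """Return the bytes for `address`, calling `fetch` only on a cache miss."""
    path = cache_path(address, cache_dir)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(address)
    with open(path, "wb") as f:
        f.write(data)
    return data


# Usage with a stand-in downloader, so the example needs no network access.
calls = []


def fake_download(address):
    calls.append(address)
    return b"a,b\n1,2\n"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("https://example.com/data.csv", fake_download, tmp)
    second = cached_fetch("https://example.com/data.csv", fake_download, tmp)
```

In this sketch the second call reads the cached file instead of invoking the downloader again.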

datawrangler.io.save(x, obj, dtype=None, **kwargs)

Save data to disk.

Parameters
  • x – the file’s original path or URL (used to create a hash to define a new filename)

  • obj – the data to store to disk

  • dtype – optional argument specifying how to store the data; can be one of:

    - ‘pickle’: use the dill library to pickle the object
    - ‘numpy’: save the objects as a compressed (.npz-formatted) numpy file
    - None (default): determine the filetype automatically; if x is passed in as bytes, write x directly to disk. If x is a string, treat x as text.

  • kwargs – any additional keyword arguments are passed to dill.dump (if dtype == ‘pickle’) or numpy.savez (if dtype == ‘numpy’). For any other datatype, additional keyword arguments are ignored.

Returns

None
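The dtype-dependent behaviour of save can be sketched with the standard library alone. This is an illustration, not the library's code: `save_sketch` is a hypothetical name, `pickle` stands in for dill, the ‘numpy’ branch is omitted, and the sketch assumes the bytes-versus-text check applies to the object being stored. Unlike the documented function, it writes to the given path directly instead of deriving a hashed filename.

```python
import os
import pickle
import tempfile


def save_sketch(path, obj, dtype=None, **kwargs):
    """Write obj to path; kwargs are used only when dtype == 'pickle'."""
    if dtype == "pickle":
        with open(path, "wb") as f:
            pickle.dump(obj, f, **kwargs)  # stand-in for dill.dump
    elif isinstance(obj, bytes):
        with open(path, "wb") as f:
            f.write(obj)  # bytes are written as-is; kwargs are ignored
    elif isinstance(obj, str):
        with open(path, "w", encoding="utf-8") as f:
            f.write(obj)  # strings are written as text; kwargs are ignored
    else:
        raise TypeError("without a dtype, obj must be bytes or str in this sketch")


# Usage: one file per branch, then read each back.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.bin")
    txt_path = os.path.join(tmp, "note.txt")
    pkl_path = os.path.join(tmp, "obj.pkl")

    result = save_sketch(raw_path, b"\x00\x01")
    save_sketch(txt_path, "hello")
    save_sketch(pkl_path, {"a": 1}, dtype="pickle", protocol=4)

    with open(raw_path, "rb") as f:
        raw_back = f.read()
    with open(txt_path, encoding="utf-8") as f:
        txt_back = f.read()
    with open(pkl_path, "rb") as f:
        pkl_back = pickle.load(f)
```

As in the documented function, the sketch returns None and forwards keyword arguments only on the pickle branch.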