Utils for streaming large files (S3, HDFS, gzip, bz2...)

What?

smart_open is a Python library for efficient streaming of (very large) files from/to S3. It is well tested (using moto), well documented and has a dead simple API:

FIXME EXAMPLES

Why?

Amazon’s standard Python library, boto, contains all the necessary building blocks for streaming, but has a really clumsy interface. There are nasty hidden gotchas when you want to stream large files from/to S3 (as opposed to simple in-memory read/write with key.set_contents_from_string() and key.get_contents_as_string()).

smart_open shields you from that, offering a cleaner API. The result is less code for you to write and fewer bugs to make.
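To make the in-memory versus streaming distinction concrete, here is a minimal sketch that uses only the standard library. It is not smart_open's or boto's code: io.BytesIO stands in for a remote S3 object, and read_all and iter_lines are hypothetical helper names chosen for illustration.

```python
import io


def read_all(fileobj):
    """In-memory approach: load the whole object at once."""
    return fileobj.read()


def iter_lines(fileobj, chunk_size=16):
    """Streaming approach: read fixed-size chunks, yield complete lines."""
    buffered = b""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buffered += chunk
        lines = buffered.split(b"\n")
        buffered = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            yield line + b"\n"
    if buffered:
        yield buffered  # final line without a trailing newline


# io.BytesIO is an assumption standing in for a large remote object.
payload = b"first line\nsecond line\nthird"
whole = read_all(io.BytesIO(payload))
streamed = list(iter_lines(io.BytesIO(payload)))
```

With read_all, memory use grows with the size of the object. With iter_lines, only one chunk plus any partial line is held at a time, which is why streaming matters for very large files.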

Installation

The module has no dependencies beyond Python 2.6 or 2.7 (2.6 <= Python < 3.0) and boto:

pip install smart_open

Or, if you prefer to install from the source tar.gz:

python setup.py test # run unit tests
python setup.py install

To run the unit tests (optional), you’ll also need to install mock and moto.

Todo

  • improve smart_open support for HDFS (streaming from/to Hadoop File System)

  • migrate smart_open streaming of gzip/bz2 files from gensim

  • better document support for the default file:// scheme

  • add py3k support

Documentation

FIXME TODO help()

Comments, bug reports

smart_open lives on GitHub. You can file issues or pull requests there.
