Utils for streaming large files (S3, HDFS, gzip, bz2...)
Project description
What?
smart_open is a Python library for efficient streaming of (very large) files from/to S3. It is well tested (using moto), well documented and has a dead simple API:
FIXME EXAMPLES
Why?
Amazon’s standard Python library, boto, contains all the necessary building blocks for streaming, but has a really clumsy interface. There are nasty hidden gotchas when you want to stream large files from/to S3 (as opposed to simple in-memory read/write with key.set_contents_from_string() and key.get_contents_as_string()).
smart_open shields you from that, offering a cleaner API. The result is less code for you to write and fewer bugs to make.
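To make the streaming versus in-memory distinction concrete, here is a minimal sketch that uses only the Python standard library. It is not smart_open's or boto's API: the helper name stream_copy, the chunk size and the io.BytesIO stand-ins for a remote object are all assumptions made for illustration.

```python
import io


def stream_copy(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time; return the number of bytes copied.

    Hypothetical helper: at most chunk_size bytes are read per call, so the
    whole payload is never requested from the source in a single read.
    """
    total = 0
    # iter() with a sentinel keeps calling src.read() until it returns b"" (EOF).
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for a large remote object and its destination.
source = io.BytesIO(b"x" * 200000)
target = io.BytesIO()
copied = stream_copy(source, target, chunk_size=4096)
```

An in-memory approach would instead call source.read() once and hold the entire payload in a single string, which is what stops scaling for very large files.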
Installation
The module requires Python 2.6 or 2.7 (2.6 <= Python < 3.0) and boto, and has no other dependencies:
pip install smart_open
Or, if you prefer to install from the source tar.gz:
python setup.py test  # run unit tests
python setup.py install
To run the unit tests (optional), you’ll also need to install mock and moto.
Todo
improve smart_open support for HDFS (streaming from/to Hadoop File System)
migrate smart_open streaming of gzip/bz2 files from gensim
better document support for the default file:// scheme
add py3k support
Documentation
FIXME TODO help()
Project details
Download files
Source Distribution
File details
Details for the file smart_open-0.1.1.tar.gz.
File metadata
- Download URL: smart_open-0.1.1.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6992b41b8d8e5108944e5a8452f3dbc2a2660b574d79f0963acc91edf359e63e |
| MD5 | 4d908a9e2fb02f488b53c4006c0b8d18 |
| BLAKE2b-256 | affb490c361d1b5244fb78f3d2bcebe965b66c6cf706b38176e84fc9e5a87370 |
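If you download the source tarball by hand, you can check it against the SHA256 digest listed above. The following is a minimal sketch using only the standard library's hashlib; the helper names sha256_of_file and matches_published_digest are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest published above for smart_open-0.1.1.tar.gz.
EXPECTED_SHA256 = "6992b41b8d8e5108944e5a8452f3dbc2a2660b574d79f0963acc91edf359e63e"


def sha256_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fin:
        # Read until fin.read() returns b"" (end of file).
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Hypothetical helper: True if the file's SHA256 equals the expected digest."""
    return sha256_of_file(path) == expected
```

For example, matches_published_digest("smart_open-0.1.1.tar.gz") should return True for an intact download, assuming the file was saved under that name in the current directory.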
Comments, bug reports
smart_open lives on GitHub. You can file issues or pull requests there.