Python function to construct a ZIP archive with stream processing - without having to store the entire ZIP in memory or disk

stream-zip

Python function to construct a ZIP archive on the fly - without having to store the entire ZIP in memory or disk. This is useful in memory-constrained environments, or when you would like to start returning compressed data before you've even retrieved all the uncompressed data. Generating ZIPs on-demand in a web server is a typical use case for stream-zip.

Offers similar functionality to zipfly, but with a different API, and does not use Python's zipfile module under the hood.

To unZIP files on the fly try stream-unzip.

Installation

pip install stream-zip

Usage

from datetime import datetime
from stream_zip import ZIP_64, ZIP_32, NO_COMPRESSION_64, NO_COMPRESSION_32, stream_zip

def unzipped_files():
    modified_at = datetime.now()
    perms = 0o600

    def file_1_data():
        yield b'Some bytes 1'

    def file_2_data():
        yield b'Some bytes 2'

    def file_3_data():
        yield b'Some bytes 3'

    def file_4_data():
        yield b'Some bytes 4'

    # ZIP_64 mode
    yield 'my-file-1.txt', modified_at, perms, ZIP_64, file_1_data()

    # ZIP_32 mode
    yield 'my-file-2.txt', modified_at, perms, ZIP_32, file_2_data()

    # No compression for ZIP_64 files
    yield 'my-file-3.txt', modified_at, perms, NO_COMPRESSION_64, file_3_data()

    # No compression for ZIP_32 files
    yield 'my-file-4.txt', modified_at, perms, NO_COMPRESSION_32, file_4_data()

for zipped_chunk in stream_zip(unzipped_files()):
    print(zipped_chunk)
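
The data for a member file does not have to be a single literal bytes object: any iterable that yields bytes can be used, as the generator functions above show. A minimal sketch, assuming the data lives in a local file: the helper below (read_in_chunks is a hypothetical name, not part of stream-zip, and the default chunk size is an arbitrary choice) yields a file's contents in pieces so that it never has to be held in memory all at once.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=65536):
    # Hypothetical helper, not part of stream-zip: yields the contents of
    # a file as bytes objects of at most chunk_size bytes each
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Demonstration with a small temporary file and a deliberately tiny chunk size
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'Some bytes 1')
    tmp_path = tmp.name

chunks = list(read_in_chunks(tmp_path, chunk_size=5))
os.remove(tmp_path)
print(chunks)
```

A generator like this could take the place of file_1_data() in the example above, for instance by yielding 'my-file-1.txt', modified_at, perms, ZIP_64, read_in_chunks(path).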

Limitations

It's not possible to completely stream-write ZIP files. Small bits of metadata for each member file, such as its name, must be placed at the end of the ZIP. In order to do this, stream-zip buffers this metadata in memory until it can be output.
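
The layout described above can be observed with Python's standard zipfile module. This is only an illustration of the ZIP format, not of how stream-zip works internally: it builds a one-file archive in memory and locates the signature that starts each member file and the signature that starts each entry of the metadata section at the end (the central directory).

```python
import io
import zipfile

# Build a tiny ZIP in memory with the standard library (not stream-zip)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_STORED) as zf:
    zf.writestr('my-file-1.txt', b'Some bytes 1')
data = buffer.getvalue()

# Signatures defined by the ZIP format
local_header_at = data.find(b'PK\x03\x04')       # start of a member file
central_directory_at = data.find(b'PK\x01\x02')  # start of a central directory entry
contents_at = data.find(b'Some bytes 1')

# The name is stored twice: once before the data, and once at the end
name_count = data.count(b'my-file-1.txt')

print(local_header_at, contents_at, central_directory_at, name_count)
```

The central directory entry comes after the file's contents, which is why the metadata for every member file has to be kept until all the data has been output.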

Uncompressed storage is supported via the NO_COMPRESSION_* constants, as in the example above. However, in these modes the entire contents of each member file are buffered in memory, so they should not be used for large files. This is because the size and CRC32 of uncompressed data must appear before the data itself in the ZIP file.
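
To see why the buffering is needed, consider how a size and CRC32 are computed over a stream of chunks. The sketch below uses zlib.crc32 from the standard library. It is an illustration only, not stream-zip's implementation, and size_and_crc32 is a hypothetical name.

```python
import zlib

def size_and_crc32(chunks):
    # Both values depend on every byte, so neither is final until the
    # iterable of chunks has been fully consumed
    size = 0
    crc = 0
    for chunk in chunks:
        size += len(chunk)
        crc = zlib.crc32(chunk, crc)  # continue the running checksum
    return size, crc

size, crc = size_and_crc32([b'Some ', b'bytes ', b'3'])
print(size, crc == zlib.crc32(b'Some bytes 3'))
```

Since both values are only known after the last chunk, and both must be written before the first chunk, the chunks have to be held somewhere in the meantime.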

It doesn't seem possible to automatically choose ZIP_64 based on file sizes if streaming, since the specification of ZIP_32 vs ZIP_64 must be before the compressed data of each file in the final stream, and so before the sizes are known. Hence the onus is on client code to choose. ZIP_32 has greater support but is limited to 4GiB (gibibyte), while ZIP_64 has less support, but has a much greater limit of 16EiB (exbibyte). These limits apply to the compressed size of each member file, the uncompressed size of each member file, and to the size of the entire archive.
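
When client code does know an upper bound on a member file's size in advance, the limits above reduce to simple arithmetic. A rough sketch, assuming such a bound is available: fits_in_zip_32 is a hypothetical helper, not part of stream-zip, and it only considers a single size, while the limits also apply to compressed sizes and to the archive as a whole.

```python
ZIP_32_LIMIT = 2 ** 32 - 1  # bytes, just under 4 GiB
ZIP_64_LIMIT = 2 ** 64 - 1  # bytes, just under 16 EiB

GiB = 2 ** 30
EiB = 2 ** 60

def fits_in_zip_32(expected_size):
    # Hypothetical helper: True if a size known in advance is within
    # the ZIP_32 limit for a single size field
    return 0 <= expected_size <= ZIP_32_LIMIT

print(fits_in_zip_32(3 * GiB))  # within the limit
print(fits_in_zip_32(4 * GiB))  # one byte over the limit
```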

Exception hierarchy

  • ZipError

    Base class for all explicitly-thrown exceptions

    • ZipValueError (also inherits from the ValueError built-in)

      Base class for errors relating to invalid arguments

      • ZipOverflowError (also inherits from the OverflowError built-in)

        The size or positions of data in the ZIP are too large to store in the requested mode

        • UncompressedSizeOverflowError

          The uncompressed size of a member file is too large. The maximum uncompressed size for ZIP_32 mode is 2^32 - 1 bytes, and for ZIP_64 mode is 2^64 - 1 bytes.

        • CompressedSizeOverflowError

          The compressed size of a member file is too large. The maximum compressed size for ZIP_32 mode is 2^32 - 1 bytes, and for ZIP_64 mode is 2^64 - 1 bytes.

        • CentralDirectorySizeOverflowError

          The central directory, a section at the end of the ZIP that lists all the member files, is too large. The maximum size for ZIP_32 mode is 2^32 - 1 bytes, and for ZIP_64 mode is 2^64 - 1 bytes.

          If any _64 mode files are in the ZIP, the central directory is in ZIP_64 mode, and ZIP_32 mode otherwise.

        • CentralDirectoryNumberOfEntriesOverflowError

          Too many entries in the central directory, a section at the end of the ZIP that lists all the member files. The limit for ZIP_32 mode is 2^16 - 1 entries, and for ZIP_64 mode is 2^64 - 1 entries.

          If any _64 mode files are in the ZIP, the central directory is in ZIP_64 mode, and ZIP_32 mode otherwise.

        • OffsetOverflowError

          The offset of data in the ZIP is too high, i.e. the ZIP is too large. The limit for ZIP_32 mode is 2^32 - 1 bytes, and for ZIP_64 mode is 2^64 - 1 bytes.

          This can be raised when stream-zip adds member files, or when it adds the central directory at the end of the ZIP file. If any _64 mode files are in the ZIP, the central directory is in ZIP_64 mode, and ZIP_32 mode otherwise.

          It is possible for the ZIP file to be larger than the maximum allowed offset without this exception being thrown, since an offset records where a piece of data starts rather than where it ends. For example, in ZIP_32 mode the archive can be larger than 2^32 - 1 bytes.

        • NameLengthOverflowError

          The length of a file name is too long. The limit is 2^16 - 1 bytes, and applies to file names after UTF-8 encoding.
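
Because of the multiple inheritance in this hierarchy, client code can catch these errors either through the stream-zip base classes or through the matching Python built-ins. The sketch below uses local stand-in classes that mirror the hierarchy as listed above. They are not imported from stream-zip, and check_name is a hypothetical function written only to have something that raises.

```python
# Local stand-ins mirroring the hierarchy listed above. These are NOT
# imported from stream-zip; they only illustrate how except clauses behave
class ZipError(Exception):
    pass

class ZipValueError(ZipError, ValueError):
    pass

class ZipOverflowError(ZipValueError, OverflowError):
    pass

class NameLengthOverflowError(ZipOverflowError):
    pass

def check_name(name):
    # Hypothetical check: the limit applies to the UTF-8 encoded length,
    # not to the number of characters
    encoded = name.encode('utf-8')
    if len(encoded) > 2 ** 16 - 1:
        raise NameLengthOverflowError()
    return encoded

# 40000 characters, but 80000 bytes once encoded as UTF-8
long_name = 'é' * 40000

caught_by = []
for handler in (ZipError, ValueError, OverflowError):
    try:
        check_name(long_name)
    except handler:
        caught_by.append(handler.__name__)

print(caught_by)
```

The example name has fewer than 2^16 - 1 characters but still exceeds the limit, because each character takes two bytes in UTF-8.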
