python-xz

Pure Python implementation of the XZ file format with random access support

GitHub build status Release on PyPI Code coverage MIT License


📖 Documentation   |   📃 Changelog


An XZ file can be composed of several streams and blocks. This layout allows random access when reading, but Python's builtin lzma module does not take advantage of it: to read from a given position, it decompresses every preceding block and discards the result.

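|                 | lzma          | lzmaffi            | python-xz     |
|-----------------|---------------|--------------------|---------------|
| module type     | builtin       | cffi (C extension) | pure Python   |
| 📄 read         |               |                    |               |
| random access   | ❌ no [1]     | ✔️ yes [2]         | ✔️ yes [2]    |
| several blocks  | ✔️ yes        | ✔️✔️ yes [3]       | ✔️✔️ yes [3]  |
| several streams | ✔️ yes        | ✔️ yes             | ✔️✔️ yes [4]  |
| stream padding  | ❌ no         | ✔️ yes             | ✔️ yes        |
| 📝 write        |               |                    |               |
| w mode          | ✔️ yes        | ✔️ yes             | ⏳ planned    |
| x mode          | ✔️ yes        | ❌ no              | ⏳ planned    |
| a mode          | ✔️ new stream | ✔️ new stream      | ⏳ planned    |
| r+w mode        | ❌ no         | ❌ no              | ⏳ planned    |
| several blocks  | ❌ no         | ❌ no              | ⏳ planned    |
| several streams | ❌ no [5]     | ❌ no [5]          | ⏳ planned    |
| stream padding  | ❌ no [6]     | ✔️ yes             | ⏳ planned    |
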
  1. Reading from a position will read the file from the very beginning
  2. Reading from a position will read the file from the beginning of the block
  3. Block positions available with the block_boundaries attribute
  4. Stream positions available with the stream_boundaries attribute
  5. Possible by manually closing and re-opening in append mode
  6. Related issue
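
Note 5 can be illustrated with the standard library alone. The following is a minimal sketch, not part of python-xz: it writes a first stream, closes the file, then re-opens it in append mode so that lzma starts a second stream. The helper name `write_streams` and the temporary file name are made up for the illustration.

```python
import lzma
import os
import tempfile


def write_streams(path, chunks):
    """Write each chunk as its own XZ stream (hypothetical helper).

    The first chunk is written in "wb" mode; every following chunk
    re-opens the file in "ab" mode, which appends a new stream.
    """
    for index, chunk in enumerate(chunks):
        mode = "wb" if index == 0 else "ab"
        with lzma.open(path, mode) as fout:
            fout.write(chunk)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "two-streams.xz")
    write_streams(path, [b"first stream\n", b"second stream\n"])

    # lzma reads consecutive streams back as one continuous payload.
    with lzma.open(path, "rb") as fin:
        data = fin.read()

print(data)
```

The resulting file contains two streams, which is the layout the stream_boundaries attribute is meant to expose when reading.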

Usage

Read mode

The API is similar to lzma: you can use either xz.open or xz.XZFile.

>>> import xz
>>> with xz.open('example.xz') as fin:
...     fin.read(18)
...     fin.stream_boundaries  # 2 streams
...     fin.block_boundaries   # 4 blocks in first stream, 2 blocks in second stream
...     fin.seek(1000)
...     fin.read(31)
...
b'Hello, world! \xf0\x9f\x91\x8b'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
b'\xe2\x9c\xa8 Random access is fast! \xf0\x9f\x9a\x80'

Opening in text mode works as well, but note that seek arguments and boundaries are still expressed in bytes (just like with lzma.open).

>>> with xz.open('example.xz', 'rt') as fin:
...     fin.read(15)
...     fin.stream_boundaries
...     fin.block_boundaries
...     fin.seek(1000)
...     fin.read(26)
...
'Hello, world! 👋'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
'✨ Random access is fast! 🚀'
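
The difference between the read sizes in the two examples (18 and 31 in binary mode, 15 and 26 in text mode) comes from UTF-8 encoding. The short check below uses only the byte strings shown in the binary example; it does not call python-xz.

```python
# Byte strings copied from the binary-mode example above.
raw_greeting = b"Hello, world! \xf0\x9f\x91\x8b"
raw_message = b"\xe2\x9c\xa8 Random access is fast! \xf0\x9f\x9a\x80"

counts = []
for raw in (raw_greeting, raw_message):
    text = raw.decode("utf-8")
    # len(raw) counts bytes, len(text) counts characters.
    counts.append((len(raw), len(text)))
    print(len(raw), "bytes ->", len(text), "characters")
```

Each emoji takes 3 or 4 bytes but counts as a single character, so a byte offset such as 1000 cannot be derived by counting characters.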

Write mode

This mode is not available yet.


FAQ

How does random access work?

XZ files are made of a number of streams, and each stream is composed of a number of blocks. This can be seen with xz --list:

$ xz --list file.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1      13     16.8 MiB    297.9 MiB  0.056  CRC64   file.xz

To read data from the middle of the 10th block, we decompress the 10th block from its start until we reach the middle (dropping that decompressed data), then return the decompressed data from that point.
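
The lookup step can be sketched with a binary search over the block start offsets. This is an illustration of the idea only, not python-xz's actual implementation; `locate` is a hypothetical helper, and the boundaries are the values from the Usage example.

```python
import bisect


def locate(block_boundaries, position):
    """Return (block index, bytes to decompress and drop in that block).

    Hypothetical helper: block_boundaries is a sorted list of the
    uncompressed offsets at which each block starts. Positions past
    the end of the file are not checked here.
    """
    index = bisect.bisect_right(block_boundaries, position) - 1
    if index < 0:
        raise ValueError("position is before the first block")
    return index, position - block_boundaries[index]


boundaries = [0, 500, 1000, 1500, 2000, 3000]  # from the Usage example
print(locate(boundaries, 1000))  # (2, 0): block starts exactly there
print(locate(boundaries, 1250))  # (2, 250): 250 bytes are dropped
```

The second value of the tuple is the amount of data that is decompressed for nothing, which is bounded by the block size.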

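Choosing the right block size is a trade-off between seek time during random access and compression ratio. Smaller blocks mean less data to decompress and discard on each seek, but they give the compressor less context and therefore a worse ratio.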
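
The effect on compression ratio can be observed with a rough experiment using the builtin lzma module. As a stand-in for blocks, the sketch compresses each chunk as an independent stream, since lzma cannot write several blocks. The sample data is artificial (one pseudo-random 4 KiB pattern repeated 64 times), so the sizes it prints only indicate the trend, not what real files will show.

```python
import lzma
import random

# Deterministic sample: all the redundancy lies between 4 KiB chunks,
# none inside a single chunk.
rng = random.Random(0)
pattern = bytes(rng.getrandbits(8) for _ in range(4096))
data = pattern * 64


def compressed_size(payload, chunk_size):
    """Total size when every chunk_size slice is compressed on its own."""
    return sum(
        len(lzma.compress(payload[start:start + chunk_size]))
        for start in range(0, len(payload), chunk_size)
    )


# Chunk size in bytes -> total compressed size in bytes.
sizes = {
    chunk_size: compressed_size(data, chunk_size)
    for chunk_size in (len(data), 64 * 1024, 4096)
}
for chunk_size, size in sizes.items():
    print(chunk_size, size)
```

On this sample the total grows as the chunks shrink, because each chunk is compressed without any knowledge of the others.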

How can I create XZ files optimized for random-access?

XZ Utils can create XZ files with several blocks:

$ xz -T0 file                          # threading mode
$ xz --block-size 16M file             # same size for all blocks
$ xz --block-list 16M,32M,8M,42M file  # specific size for each block

PIXZ creates files with several blocks by default:

$ pixz file
