Remote Archiver: safely collect output files into archives on network filesystem

ReAr

A replacement for open() for scenarios in which multiple processes generate many (log) files on a network filesystem. ReAr redirects the writes into Zip files to reduce stress on the filesystem and to keep things organized. Writes to the archives are chunked and staged to avoid a single point of failure.

# On each worker:
async with rear_fs("/path/to/archive_base"):
    with rear_open("ar.zip/relpath/to/file", 'w+b') as f: # open a read-write buffer ...
    #with rear_pickup("/path/to/temp-file", "ar.zip/relpath/to/file"): # ... or pick up a file created by others
        f.write(b"...")
    # The file is written to a temporary archive when it is closed.
    # It is later moved and eventually stored as `relpath/to/file` in the Zip file `/path/to/archive_base/ar.zip`.

To avoid concurrent writes, each worker writes to its own temporary Zip file and starts a new one every 5 minutes. Run a scavenger to collect the files from the temporary archives into the final archives:

# On your main process:
async with scavengerd("/path/to/archive_base"):
    ...
# ... or run the scavenger manually from a shell:
while :; do
    rear-scavenger -d /path/to/archive_base
    sleep 5m
done
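
To make the per-worker temporary archives more concrete, here is a minimal sketch of the rotation idea using only the standard library. It is not ReAr's implementation: the class name `RotatingTempArchive`, the `tmp-*.zip` naming scheme, and the choice to store the destination (`ar.zip/relpath/to/file`) as the member name are all assumptions made for illustration.

```python
import os
import time
import uuid
import zipfile


class RotatingTempArchive:
    """Hypothetical helper: one temporary Zip per worker, replaced after max_age seconds."""

    def __init__(self, base_dir, max_age=300.0, clock=time.monotonic):
        self.base_dir = base_dir
        self.max_age = max_age      # 300 s mirrors the 5-minute rotation described above
        self.clock = clock          # injectable so rotation can be exercised without waiting
        self._zip = None
        self._opened_at = 0.0

    def _current(self):
        """Return the open temporary archive, starting a new one if the old one is too old."""
        now = self.clock()
        if self._zip is None or now - self._opened_at >= self.max_age:
            self.close()
            # Assumed naming scheme: pid plus a random suffix keeps workers from colliding.
            name = f"tmp-{os.getpid()}-{uuid.uuid4().hex}.zip"
            self._zip = zipfile.ZipFile(os.path.join(self.base_dir, name), "w")
            self._opened_at = now
        return self._zip

    def add(self, dest, data):
        """Stage `data`; `dest` (e.g. 'ar.zip/relpath/to/file') is kept as the member name."""
        self._current().writestr(dest, data)

    def close(self):
        """Close the current temporary archive so its central directory is written."""
        if self._zip is not None:
            self._zip.close()
            self._zip = None


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as base:
        staging = RotatingTempArchive(base)
        staging.add("ar.zip/relpath/to/file", b"...")
        staging.close()
        print(os.listdir(base))
```

Because no two workers ever share a temporary archive, no locking is needed on the write path; the cost is that files only reach their final archive once a scavenger has run.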

FAQ

What happens if a worker instance crashes?

Its current temporary archive will be missing the central directory because it was never properly closed. The scavenger will try to recover as many files as possible (with zip -FF).
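
As a rough illustration of why recovery is possible at all, the sketch below walks the local file headers that precede each entry in a Zip file and rebuilds the entries without the central directory. This is a simplified stand-in written for this README, not what `zip -FF` or ReAr actually does: the function name `recover_entries` is hypothetical, and it only handles stored or deflated entries whose sizes are present in the local header.

```python
import io
import struct
import zipfile
import zlib

# Local file header layout from the Zip format: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_SIG = b"PK\x03\x04"


def recover_entries(raw: bytes) -> dict:
    """Hypothetical helper: return {member name: content} found by scanning local headers."""
    recovered = {}
    pos = raw.find(LOCAL_SIG)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(raw):
        (_sig, _ver, flags, method, _time, _date,
         crc, csize, _usize, nlen, elen) = LOCAL_HEADER.unpack_from(raw, pos)
        name_start = pos + LOCAL_HEADER.size
        data_start = name_start + nlen + elen
        data_end = data_start + csize
        # Entries written with a data descriptor (flag bit 3) carry no sizes here; skip them.
        ok = not (flags & 0x08) and data_end <= len(raw)
        if ok:
            payload = raw[data_start:data_end]
            if method == zipfile.ZIP_DEFLATED:
                try:
                    payload = zlib.decompress(payload, -15)  # raw deflate stream
                except zlib.error:
                    ok = False
            elif method != zipfile.ZIP_STORED:
                ok = False
        if ok and zlib.crc32(payload) == crc:
            name = raw[name_start:name_start + nlen].decode("utf-8", "replace")
            recovered[name] = payload
            pos = raw.find(LOCAL_SIG, data_end)
        else:
            # Not a usable entry; keep scanning from the next byte.
            pos = raw.find(LOCAL_SIG, pos + 1)
    return recovered


# Simulate a crashed worker: build an archive, then cut it off before the central directory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("ar.zip/a.log", b"hello")
    zf.writestr("ar.zip/b.log", b"world" * 100)
complete = buffer.getvalue()
truncated = complete[:complete.index(b"PK\x01\x02")]

print(sorted(recover_entries(truncated)))
```

An entry that was only partially written when the worker died fails the size or CRC check and is skipped, which matches the "as many files as possible" caveat above.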

How does the scavenger work?

Multiple processes cannot write to one Zip file at the same time, so each process first deposits its files into its own temporary Zip file and records where those files should eventually be saved. Once a temporary Zip file is closed (when the process exits or after 5 minutes), the scavenger copies all of its files to their destination Zip files. The scavenger does not need to watch actively for incoming files, since it can organize them at any time after they are saved to the temporary Zip files. It is also safe to run multiple scavenger instances at any time: each one checks whether an action is necessary before performing it.
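
The copy step can be sketched as follows. This is an illustrative single-instance version, not ReAr's scavenger: the function name `scavenge`, the `tmp-*.zip` naming, and the convention that a member named `ar.zip/relpath/to/file` belongs in `ar.zip` are assumptions, and the sketch does not attempt the multi-instance safety or crash recovery described above.

```python
import os
import zipfile


def scavenge(base_dir):
    """Hypothetical helper: move every member of each closed tmp-*.zip into its destination Zip."""
    for name in sorted(os.listdir(base_dir)):
        if not (name.startswith("tmp-") and name.endswith(".zip")):
            continue  # destination archives and unrelated files are left alone
        tmp_path = os.path.join(base_dir, name)
        with zipfile.ZipFile(tmp_path) as src:
            for info in src.infolist():
                # Assumed convention: the first path component names the destination archive.
                archive, _, member = info.filename.partition("/")
                dest_path = os.path.join(base_dir, archive)
                # Mode "a" appends to an existing archive or creates a new one.
                with zipfile.ZipFile(dest_path, "a") as dst:
                    dst.writestr(member, src.read(info))
        # Remove the temporary archive only after all of its members were copied.
        os.remove(tmp_path)
```

Calling `scavenge("/path/to/archive_base")` again when no temporary archives remain does nothing, which is why such a pass can be repeated on a timer without tracking state.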


