mobius3

Continuously and asynchronously sync a local folder to an S3 bucket. This is a Python application, suitable for situations where

  • FUSE cannot be used, such as in AWS Fargate;
  • high performance local access is more important than synchronous saving to S3;
  • there can be frequent modifications to the same file monitored by a single client;
  • there are infrequent concurrent modifications to the same file from different clients;
  • local files can be changed by any program;
  • there are at most ~10k files to sync;
  • changes in the S3 bucket may be performed directly i.e. not using mobius3.

These properties make mobius3 similar to a Dropbox or Google Drive client. Under the hood, inotify is used and so only Linux is supported.

Work in progress. This README is a rough design spec.

Installation

pip install mobius3

Usage

mobius3 can be used as a standalone command-line application

mobius3 /local/folder https://remote-bucket.s3-eu-west-2.amazonaws.com/ eu-west-2 --prefix folder/

or from Docker

docker run --rm -it \
    -v /local/folder:/home/mobius3/data \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    quay.io/uktrade/mobius3:v0.0.8 \
    mobius3 \
        /home/mobius3/data \
        https://remote-bucket.s3-eu-west-2.amazonaws.com/ \
        eu-west-2 \
        --prefix my-prefix/

or from asyncio Python

import asyncio
from mobius3 import Syncer

async def main():
    start, stop = Syncer('/local/folder', 'https://remote-bucket.s3-eu-west-2.amazonaws.com/', 'eu-west-2', prefix='folder/')

    # Will copy the contents of the bucket to the local folder,
    # raise exceptions on error, and then continue to sync in the background
    await start()

    # Will complete any remaining uploads
    await stop()

asyncio.run(main())

In the cases above AWS credentials are taken from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. To use ECS-provided credentials / IAM Roles, you can pass --credentials-source ecs-container-endpoint as a command line option. In an ECS task definition, this would look something like the below

{
	"command": [
		"mobius3",
		"/home/mobius3/data",
		"https://remote-bucket.s3-eu-west-2.amazonaws.com/",
		"eu-west-2",
		"--prefix", "my-prefix/",
		"--credentials-source", "ecs-container-endpoint"
	]
}

If using mobius3 to sync data in a volume accessed by multiple containers, you may have to create your own Dockerfile that runs mobius3 under a user with the same ID as the users in the other containers.

Under the hood and limitations

Renaming a file or folder maps to no atomic operation in S3, and conflicts are resolved by treating S3 as the source of truth. This means that when different clients concurrently modify or delete the same files or folders, data can be lost and the directory layout may become corrupted.

If a file has been updated or deleted locally, any concurrent changes from S3 are delayed for 60 seconds. This is a best-effort measure to avoid eventual consistency issues, where S3 does not yet present a consistent view of the latest changes.
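The 60-second delay can be pictured as a per-path timestamp check. The following is a minimal sketch, not mobius3's actual code: the class and method names are hypothetical, and the clock is injectable only so the behaviour can be exercised without waiting.

```python
import time

DELAY_SECONDS = 60  # The delay stated above


class RecentLocalChanges:
    """Tracks when each path was last changed locally (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._changed_at = {}

    def record_local_change(self, path):
        # Called when a file is updated or deleted locally
        self._changed_at[path] = self._clock()

    def can_apply_remote_change(self, path):
        # A change seen in S3 is only applied once the delay has elapsed
        changed_at = self._changed_at.get(path)
        if changed_at is None:
            return True
        if self._clock() - changed_at >= DELAY_SECONDS:
            del self._changed_at[path]
            return True
        return False
```

A poller built this way would skip a remote change while `can_apply_remote_change` returns `False` and pick it up on a later poll.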

A simple polling mechanism is used to check for changes in S3, so mobius3 may not perform well with a large number of files/objects.
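To illustrate why polling scales with the number of objects, here is a rough sketch of comparing two successive bucket listings. It assumes each listing is held as a mapping of key to ETag; that representation and the function name are assumptions for illustration, since the text above does not describe how mobius3 detects changes between polls.

```python
def diff_listings(previous, current):
    """Compare two {key: etag} snapshots of a bucket listing.

    Returns (changed, deleted): keys that are new or have a different
    ETag, and keys that have disappeared since the previous poll.
    """
    changed = sorted(
        key for key, etag in current.items() if previous.get(key) != etag
    )
    deleted = sorted(key for key in previous if key not in current)
    return changed, deleted
```

Every poll has to visit every key in both snapshots, so the cost of each poll grows with the size of the bucket even when nothing has changed.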

Some of the above behaviours may change in future versions.

Concurrency: responding to concurrent file modifications

Mid-upload, a file could be modified by a local process, in which case a corrupt file could be uploaded to S3. To mitigate this, mobius3 uses the following algorithm for each upload.

  • An IN_CLOSE_WRITE event is received for a file, and we start the upload.
  • Just before the end of the upload, the final bytes of the file are read from disk.
  • A dummy "flush" file is written to the relevant directory.
  • Wait for the IN_CREATE event for this file. This ensures that any events since the final bytes were read have also been received.
  • If we received an IN_MODIFY event for the file, the file has been modified, and we do not upload the final bytes. Since IN_MODIFY was received, once the file is closed we will receive an IN_CLOSE_WRITE, and we re-upload the file. If no such event is received, we complete the upload.
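The final decision in the steps above can be sketched as a scan over the events received since the upload started. This is an illustration under assumptions rather than mobius3's implementation: the `Event` type, the function name, and the example paths are hypothetical, and real inotify events carry flags and watch descriptors rather than strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical stand-in for an inotify event
    name: str  # e.g. "IN_MODIFY", "IN_CREATE", "IN_CLOSE_WRITE"
    path: str


def should_complete_upload(events, uploading_path, flush_path):
    """Decide whether to send the final bytes of an in-progress upload.

    `events` are the events received, in order, since the upload started.
    The IN_CREATE for the flush file marks the point by which every
    earlier event for the uploading file must already have arrived.
    """
    for event in events:
        if event.name == "IN_MODIFY" and event.path == uploading_path:
            # The file changed mid-upload: abandon, and re-upload on
            # the IN_CLOSE_WRITE that will follow
            return False
        if event.name == "IN_CREATE" and event.path == flush_path:
            # Flush marker reached with no modification seen
            return True
    raise LookupError("IN_CREATE for the flush file not yet received")
```

For example, `[Event("IN_MODIFY", "/d/a.txt"), Event("IN_CREATE", "/d/.flush")]` leads to the upload of `/d/a.txt` being abandoned, while a lone `Event("IN_CREATE", "/d/.flush")` lets it complete.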

An alternative to the above would be to use a filesystem locking mechanism. However

  • other processes may not respect advisory locking;
  • the filesystem may not support mandatory locking;
  • we don't want to prevent other processes from progressing due to locking the file on upload: this would partially remove the benefits of the asynchronous nature of the syncing.

Concurrency: keeping HTTP requests for the same file ordered

Multiple concurrent requests to S3 are also supported. However, this introduces further possible race conditions: requests started in a given order may not be received by S3 in that order, so newer versions of files can be overwritten by older ones. Even the guarantee from S3 that "latest time stamp wins" for concurrent PUTs to the same key does not offer sufficient protection, since such requests can be made with the same timestamp.

To prevent this, a FIFO lock is held around each file for the duration of any PUT or DELETE of its key.
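One way to get this ordering in asyncio is a lock per key, since asyncio.Lock wakes waiters in the order they started waiting. The sketch below assumes that approach; the `KeyLocks` class and the fake request are hypothetical and do not reflect how mobius3 implements its lock.

```python
import asyncio
from collections import defaultdict


class KeyLocks:
    """One asyncio.Lock per key, created lazily (hypothetical helper)."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    def lock(self, key):
        return self._locks[key]


async def demo():
    locks = KeyLocks()
    completed = []

    async def fake_request(key, label, duration):
        # Stand-in for an S3 PUT or DELETE; the sleep simulates latency
        async with locks.lock(key):
            await asyncio.sleep(duration)
            completed.append(label)

    # The first request is the slowest. Without the lock it would
    # finish last; with it, requests complete in the order started.
    await asyncio.gather(
        fake_request("folder/a.txt", "PUT v1", 0.03),
        fake_request("folder/a.txt", "PUT v2", 0.02),
        fake_request("folder/a.txt", "DELETE", 0.01),
    )
    return completed
```

Requests for different keys take different locks and so still run concurrently. This sketch never discards locks, so a long-running process would need to remove entries for keys that are no longer in use.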

Running tests

docker-compose build && \
docker-compose run --rm test python3 setup.py test
