
Object-Oriented Interface for AWS S3, similar to pathlib.


Welcome to s3pathlib Documentation

s3pathlib is a Python package that provides a Pythonic object-oriented programming (OOP) interface for manipulating AWS S3 objects and directories. The API mirrors the pathlib standard library and is intuitive to use.

Quick Start

Import the library and declare an S3 object

# import
>>> from s3pathlib import S3Path

# construct from string, auto join parts
>>> p = S3Path("bucket", "folder", "file.txt")
>>> p.bucket
'bucket'
>>> p.key
'folder/file.txt'
>>> p.uri
's3://bucket/folder/file.txt'
>>> p.console_url # click to preview it in the AWS console
'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'
>>> p.arn
'arn:aws:s3:::bucket/folder/file.txt'

Talk to AWS S3 and get some information

# s3pathlib maintains a "context" object that holds the AWS authentication information
# you just need to build your own boto3 session object and attach it
>>> import boto3
>>> from s3pathlib import context
>>> context.attach_boto_session(
...     boto3.session.Session(
...         region_name="us-east-1",
...         profile_name="my_aws_profile",
...     )
... )

>>> p = S3Path("bucket", "folder", "file.txt")
>>> p.etag
'3e20b77868d1a39a587e280b99cec4a8'
>>> p.size
56789000
>>> p.size_for_human
'51.16 MB'

# folders work too, you just need to use a trailing "/" to identify them
>>> p = S3Path("bucket", "datalake/")
>>> p.count_objects()
7164 # number of files under this prefix
>>> p.calculate_total_size()
(7164, 236483701963) # 7164 objects, 220.24 GB
>>> p.calculate_total_size(for_human=True)
(7164, '220.24 GB') # 7164 objects, 220.24 GB

Manipulate Folders in S3

The native S3 write APIs (the operations that change the state of S3) only work at the object level, and the list_objects API returns at most 1,000 objects per call, so manipulating objects recursively takes extra effort. s3pathlib CAN SAVE YOUR LIFE
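
For comparison, just listing every object under a prefix with raw boto3 means driving a list_objects_v2 paginator by hand, since each call returns at most 1,000 keys. A rough sketch (the bucket and prefix are the placeholder names used elsewhere on this page):

>>> s3_client = boto3.client("s3")
>>> paginator = s3_client.get_paginator("list_objects_v2")
>>> keys = []
>>> for page in paginator.paginate(Bucket="bucket", Prefix="datalake/"):
...     for obj in page.get("Contents", []):
...         keys.append(obj["Key"])
...
>>> len(keys)
7164

With s3pathlib, the equivalent folder-level operations are one-liners: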

# create an S3 folder
>>> p = S3Path("bucket", "github", "repos", "my-repo/")

# upload all Python files from /my-repo to s3://bucket/github/repos/my-repo/
>>> p.upload_dir("/my-repo", pattern="**/*.py", overwrite=False)

# copy the entire S3 folder to another S3 folder
>>> p2 = S3Path("bucket", "github", "repos", "another-repo/")
>>> p.copy_to(p2, overwrite=True)

# delete all objects in the folder, recursively, to clean up your test bucket
>>> p.delete_if_exists()
>>> p2.delete_if_exists()
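
To confirm the cleanup worked, reuse count_objects from earlier; after a successful delete both prefixes should be empty (the zeros below are the expected result, shown for illustration):

>>> p.count_objects()
0
>>> p2.count_objects()
0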

Getting Help

Please use the python-s3pathlib tag on Stack Overflow to get help.

Submit an "I want help" issue ticket on GitHub Issues
