A consistent approach to file operations, anywhere.
Cabinets
cabinets is a Python library that provides a consistent interface for file operations
across multiple storage platforms. File extensions are dynamically detected to allow
automatic serialization and deserialization of Python objects.
cabinets supports a variety of protocols and file format parsers natively, and new
protocols or parsers can be easily registered.
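As a rough illustration of what extension-based detection means, the sketch below picks a parser from the suffix of a path. It is not the library's implementation; the names PARSERS and load_by_extension are hypothetical and exist only for this example.

```python
import json
import pickle
from pathlib import PurePosixPath

# Hypothetical lookup table: file extension -> function that parses raw bytes.
PARSERS = {
    "json": lambda raw: json.loads(raw.decode("utf-8")),
    "pickle": pickle.loads,
}


def load_by_extension(path: str, raw: bytes):
    """Pick a parser from the path's extension and apply it to the raw bytes."""
    ext = PurePosixPath(path).suffix.lstrip(".")
    try:
        parser = PARSERS[ext]
    except KeyError:
        raise ValueError(f"no parser registered for extension {ext!r}") from None
    return parser(raw)


parsed = load_by_extension("test.json", b'{"test": 1}')
```

The caller never names a format; the extension alone decides how the bytes are turned into a Python object.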
Sample Usage
Read a file
Set up a test file in your local filesystem:
import json
obj = {'test': 1}
with open('test.json', 'w') as fh:
    json.dump(obj, fh)
Read back and parse the file using cabinets:
import cabinets
new_obj = cabinets.read('test.json')
That's it! The file is loaded and parsed in just one line.
Write a file
cabinets also supports creating files. We can rewrite the first example using
only cabinets.
import cabinets
obj = {'test': 1}
cabinets.create('test.json', obj)
new_obj = cabinets.read('test.json')
assert new_obj == obj
Reading and Writing with Other Protocols
Using cabinets allows you to interact with multiple file storage protocols
depending on the URI you specify. In the previous examples, we used read() and
create() to operate within our local file system; that's because cabinets
assumes we're using the file:// protocol by default.
Luckily, accessing other storage systems is just as easy!
For example, operating on a file on AWS S3 is done exactly the same way:
import cabinets
# Read JSON file from your filesystem
local_obj = cabinets.read('file://test.json')
# Write that object to a file in AWS S3
cabinets.create('s3://test.json', local_obj)
# Read back the same file from AWS S3
remote_obj = cabinets.read('s3://test.json')
assert local_obj == remote_obj
The above example reads a file from the local filesystem and creates a new file containing the same data at the same path in S3.
By prefixing the path with {protocol}:// we specify how and where cabinets should
look for a file. Using file:// (the default if none is specified) tells cabinets to
use that path on the local filesystem. Using s3://, on the other hand, instructs
cabinets to perform operations against that path in AWS S3.
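The prefix convention described above can be sketched as a small helper that separates the protocol from the path and falls back to file when no prefix is present. This is an assumption-laden illustration, not code from cabinets; split_uri is a hypothetical name.

```python
from typing import Tuple


def split_uri(uri: str, default_protocol: str = "file") -> Tuple[str, str]:
    """Split '{protocol}://path' into (protocol, path)."""
    protocol, sep, path = uri.partition("://")
    if not sep:
        # No '://' found: treat the whole string as a path on the default protocol.
        return default_protocol, uri
    return protocol, path


local = split_uri("test.json")
explicit_local = split_uri("file://test.json")
remote = split_uri("s3://test.json")
```

Under this sketch, a bare path and a file:// path resolve to the same protocol and path, which mirrors the default behaviour described above.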
NOTE: The S3Cabinet may require initial configuration for the s3 protocol to
function properly. See Protocol Configuration for details.
See all the natively supported protocols below.
Built-in Protocols and Parsers
Protocols
- Local File System (file://)
- S3 (s3://)
Parsers
- YAML (.yml, .yaml)
- JSON (.json)
- Python Pickle (.pickle)
- CSV (beta) (.csv)
import cabinets
# .foo file in local filesystem
local_foo_data = cabinets.read('file://test.foo')
# .foo file in S3
s3_foo_data = cabinets.read('s3://test.foo')
Protocol Configuration
Some storage platform protocols may require additional configuration parameters to be set
before they can be used. Each Cabinet subclass can expose a set_configuration(**config)
class method to take care of any required initial setup.
from cabinets.cabinet.s3 import S3Cabinet
# set the AWS S3 region to us-west-2 and specify an access key
S3Cabinet.set_configuration(region_name='us-west-2', aws_access_key_id=...)
# use specific Cabinet to avoid protocol prefix
S3Cabinet.read('bucket-in-us-west-2/test.json')
# or use generic Cabinet with protocol prefix
import cabinets
cabinets.read('s3://bucket-in-us-west-2/test.json')
See the documentation of specific Cabinet classes for what configuration parameters
are available.
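To make the class-level configuration pattern concrete, here is a minimal standalone sketch. DemoCabinet and its describe method are hypothetical and do not exist in cabinets; the point is only that a class method can store settings on the class so that later class-method calls can see them.

```python
class DemoCabinet:
    """Hypothetical stand-in for a Cabinet subclass; not part of cabinets."""

    _config: dict = {}

    @classmethod
    def set_configuration(cls, **config):
        # Store the settings on the class so later classmethod calls can read them.
        cls._config = dict(config)

    @classmethod
    def describe(cls) -> str:
        # Read back a setting, with a placeholder when it was never configured.
        region = cls._config.get("region_name", "<unset>")
        return f"region={region}"


before = DemoCabinet.describe()
DemoCabinet.set_configuration(region_name="us-west-2")
after = DemoCabinet.describe()
```

Because the settings live on the class rather than on an instance, they only need to be set once, before the first read or create call.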
Custom Protocols and Parsers
cabinets is designed to allow complete extensibility in adding new protocols and
parsers. Just because your desired storage platform or file format is not listed above
doesn't mean you can't use it with cabinets!
Adding Cabinets
New protocol connections can be added by subclassing the abstract base class Cabinet and
registering the class to one or more protocol identifiers:
from cabinets import Cabinet, register_protocols

@register_protocols('foo')
class FooCabinet(Cabinet):
    @classmethod
    def set_configuration(cls, **kwargs):
        # Set up any necessary configuration parameters for "foo" protocol
        ...

    @classmethod
    def _read_content(cls, path: str) -> bytes:
        # Custom logic for reading bytes from a path using "foo" protocol
        ...

    @classmethod
    def _create_content(cls, path: str, content: bytes):
        # Custom logic for writing bytes to a path using "foo" protocol
        ...

    @classmethod
    def _delete_content(cls, path):
        # Custom logic for deleting the object at a path using "foo" protocol
        ...
Here we define a FooCabinet and register it to the protocol identifier foo. Once
this class is loaded, any cabinets function calls using the foo:// prefix will be
processed with this class. This means if we called:
import cabinets
from ... import FooCabinet # ensure FooCabinet is loaded
cabinets.read('foo://example.json')
The first call that occurs will be FooCabinet._read_content('example.json'), and that
result is then parsed by the JSONParser before being returned.
NOTE: In order for the protocols to be registered, the class definition must be run at least once. Make sure the modules where your custom Cabinet classes are defined are imported somewhere before they are used, OR use the built-in Plugin system.
Adding Parsers
cabinets also supports custom extension parsing in the exact same way:
from typing import Any

from cabinets.parser import Parser, register_extensions

@register_extensions('bar')
class BarParser(Parser):
    @classmethod
    def load_content(cls, content: bytes):
        # Parse bytes from "bar" file format into a Python object
        ...

    @classmethod
    def dump_content(cls, data: Any):
        # Dump a Python object into bytes in the "bar" file format
        ...
Now if we redo our above example using the .bar extension:
import cabinets
from ... import FooCabinet, BarParser # ensure FooCabinet and BarParser are loaded
cabinets.read('foo://example.bar')
This statement is roughly equivalent to:
BarParser.load_content(FooCabinet._read_content('example.bar'))
and should return a Python object from your Foo cabinet, using your Bar parser!
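For a concrete picture of what the two parser methods might contain, here is a toy parser for a made-up key=value text format. It is a standalone sketch: KeyValueParser is hypothetical, it deliberately does not subclass Parser so that it runs without cabinets installed, and the format is not one the library supports natively.

```python
from typing import Dict


class KeyValueParser:
    """Toy parser for 'key=value' lines; standalone, does not subclass Parser."""

    @classmethod
    def load_content(cls, content: bytes) -> Dict[str, str]:
        # Parse bytes into a dictionary, one 'key=value' pair per line.
        result = {}
        for line in content.decode("utf-8").splitlines():
            if not line.strip():
                continue  # skip blank lines
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip()
        return result

    @classmethod
    def dump_content(cls, data: Dict[str, str]) -> bytes:
        # Serialize the dictionary back into newline-separated 'key=value' pairs.
        lines = [f"{key}={value}" for key, value in data.items()]
        return "\n".join(lines).encode("utf-8")


original = {"name": "example", "kind": "bar"}
round_tripped = KeyValueParser.load_content(KeyValueParser.dump_content(original))
```

A useful property to aim for in any custom parser is that loading what was just dumped gives back the original object, as in the round trip above.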
Loading Plugins
As mentioned in the example above, your custom Cabinet and Parser classes must be
executed in order to be added to the internal cache cabinets uses for protocol and
extension lookup. If your custom classes are imported before any cabinets functions are
used, then this won't be an issue. However, in many use cases there is no reason to
import those classes aside from usage with cabinets functions. Instead of requiring
each class to be imported manually at the start of your program, cabinets can search a
specified path for new Cabinet and Parser classes and load them automatically.
Specifying the PLUGIN_PATH environment variable will cause cabinets to search
for subdirectories called cabinet and parser in that path. Modules residing
within those directories will be searched for Cabinet and Parser subclasses
respectively.
PLUGIN_PATH
├── cabinet
│   └── foo_cabinet.py
└── parser
    ├── bar_parser.py
    └── baz_parser.py
If the above FooCabinet and BarParser classes are placed in foo_cabinet.py and
bar_parser.py, they will be loaded and registered in their respective caches
without needing to be referenced anywhere else in the program.
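One way such a directory scan could work is with the standard library's importlib, as sketched below. This is only an assumption about the general technique, not a description of how cabinets implements its plugin system; load_plugin_modules is a hypothetical helper, and a temporary directory stands in for PLUGIN_PATH.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugin_modules(plugin_path: str, subdir: str):
    """Import every .py file in plugin_path/subdir and return the modules."""
    modules = []
    directory = Path(plugin_path) / subdir
    if not directory.is_dir():
        return modules
    for file in sorted(directory.glob("*.py")):
        spec = importlib.util.spec_from_file_location(file.stem, file)
        module = importlib.util.module_from_spec(spec)
        # Executing the module runs its class definitions and any decorators.
        spec.loader.exec_module(module)
        modules.append(module)
    return modules


# Demonstration with a temporary directory standing in for PLUGIN_PATH.
with tempfile.TemporaryDirectory() as tmp:
    parser_dir = Path(tmp) / "parser"
    parser_dir.mkdir()
    (parser_dir / "bar_parser.py").write_text("LOADED = True\n")
    loaded = load_plugin_modules(tmp, "parser")
    loaded_names = [module.__name__ for module in loaded]
```

Executing each module is what triggers registration decorators inside it, which is why no other part of the program has to import the plugin files.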
Contributing
This package is open source (see LICENSE), so please feel free to contribute by submitting a pull request, creating an issue, or contacting the authors directly.
Authors and Contributors:
- Lucas Lofaro (Co-Author): lucasmlofaro@gmail.com
- Sam Hollenbach (Co-Author): samhollenbach@gmail.com