Simplest possible content-addressable file store for blobs.

Simplest Possible Content-Addressable Blob Store

This is a simple content-addressable blob store. It stores blobs of data along with associated metadata. Each blob is stored in a directory hierarchy derived from the base58 encoding of its SHA-256 hash, and metadata is stored in sibling files alongside the blob.
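
The defining property of content addressing is that a blob's identity is a pure function of its bytes, so storing the same content twice always yields the same address. A minimal illustration using the standard library (hex digests here for brevity; GrugStore itself uses base58):

```python
import hashlib

# Content addressing: the blob's address is derived from its bytes alone.
# Identical content hashes to the same address, so duplicates collapse
# into a single stored file.
def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

a = content_hash(b"Hello, World!")
b = content_hash(b"Hello, World!")
c = content_hash(b"Goodbye!")

assert a == b  # same content, same address
assert a != c  # different content, different address
```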

Quick Start

from grugstore import GrugStore

# Create a GrugStore instance
gs = GrugStore('some-dir', hierarchy_depth=3)

# Store a blob
hash_str, file_path = gs.store(b'Hello, World!')

# Check if a blob exists
if gs.exists(hash_str):
    # Load the blob
    blob = gs.load_bytes(hash_str)

Core Methods

Store Metadata

# Set a README for the store
gs.set_readme("This store contains user avatars and profile images")

# Get the README content
readme_content = gs.get_readme()

Storing and Loading Data

# Store raw bytes - returns (hash_string, file_path)
hash_str, file_path = gs.store(b'Hello, World!')

# Stream from a file-like object (e.g., for large files)
with open('large_file.bin', 'rb') as f:
    hash_str = gs.stream(f)

# Load data back
data = gs.load_bytes(hash_str)

# Read data using context manager (for streaming large files)
with gs.read(hash_str) as f:
    content = f.read()  # or read in chunks

# Write data using context manager with automatic hashing
with gs.write() as (f, get_hash):
    f.write(b'Hello, World!')
    f.write(b' More data...')
# After the context exits, get the hash
hash_str = get_hash()
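
A write context manager like the one above can make the hash available immediately after writing by hashing incrementally: each chunk is fed into a running SHA-256 digest as it is written, so no second pass over the data is needed. A stdlib sketch of that mechanism (an illustration, not GrugStore's actual implementation):

```python
import hashlib

# Incremental hashing: update a running digest per chunk, so the final
# hash is available the moment the last write completes.
hasher = hashlib.sha256()
for chunk in (b"Hello, World!", b" More data..."):
    hasher.update(chunk)

incremental = hasher.hexdigest()

# Equivalent to hashing the concatenated bytes in a single pass:
one_shot = hashlib.sha256(b"Hello, World! More data...").hexdigest()
assert incremental == one_shot
```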

Working with Sibling Files

# Store metadata/sibling files
gs.store_sibling(hash_str, 'json', b'{"key": "value"}')
gs.store_sibling(hash_str, 'txt', b'Additional notes')

# Load sibling data
metadata = gs.load_sibling_bytes(hash_str, 'json')
notes = gs.load_sibling_bytes(hash_str, 'txt')

Checking Existence

# Check if main blob exists
if gs.exists(hash_str):
    print("Blob exists!")

# Check if sibling file exists
if gs.exists(hash_str, 'json'):
    metadata = gs.load_sibling_bytes(hash_str, 'json')

Path Operations

# Get path to a blob (without loading it)
blob_path = gs.path_to(hash_str)

# Get path to a sibling file
metadata_path = gs.path_to(hash_str, 'json')

Copying and Moving Files

# Copy an external file into the store
# Returns (hash_string, file_path) - original file remains unchanged
hash_str, store_path = gs.copy_file('/path/to/source/file.pdf')

# Move an external file into the store
# Returns (hash_string, file_path) - original file is deleted
hash_str, store_path = gs.move_file('/path/to/source/file.pdf')

# Both methods:
# - Calculate the file's SHA-256 hash efficiently
# - Create the appropriate directory structure
# - Handle duplicates (won't overwrite existing files)
# - Support both string and Path objects as input
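
Hashing a file "efficiently" typically means reading it in fixed-size chunks rather than loading it into memory at once. A stdlib sketch of that pattern (the chunk size and `hash_file` helper are illustrative, not GrugStore internals):

```python
import hashlib
import tempfile
from pathlib import Path

# Hash a file in fixed-size chunks so memory use stays constant
# regardless of file size.
def hash_file(path: Path, chunk_size: int = 64 * 1024) -> str:
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()

with tempfile.TemporaryDirectory() as d:
    payload = b"x" * 200_000  # larger than one chunk
    p = Path(d) / "file.bin"
    p.write_bytes(payload)
    digest = hash_file(p)

assert digest == hashlib.sha256(payload).hexdigest()
```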

Iteration and Validation

# Iterate over all blobs (excluding siblings)
for hash_str, file_path in gs.iter_files(no_sibling=True):
    print(f"Found blob: {hash_str}")

# Iterate with sibling information
for hash_str, file_path, sibling_extensions in gs.iter_files():
    print(f"Blob: {hash_str}")
    print(f"Siblings: {sibling_extensions}")  # e.g., {'json', 'txt'}

# Validate integrity of all blobs
for invalid_path in gs.validate_tree():
    print(f"Corrupted file: {invalid_path}")

# Auto-delete corrupted files
for invalid_path in gs.validate_tree(auto_delete=True):
    print(f"Deleted corrupted file: {invalid_path}")

# Auto-delete corrupted files and their siblings
for invalid_path in gs.validate_tree(auto_delete=True, delete_siblings=True):
    print(f"Deleted corrupted file: {invalid_path}")
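
Conceptually, validation in a content-addressable store reduces to one check: re-hashing a file's contents must reproduce the name it was stored under. A stdlib sketch of that idea (hex names stand in for GrugStore's base58 names; this is not GrugStore's implementation):

```python
import hashlib
import tempfile
from pathlib import Path

# A stored file is valid iff re-hashing its bytes reproduces its name.
def is_valid(path: Path) -> bool:
    return hashlib.sha256(path.read_bytes()).hexdigest() == path.name

with tempfile.TemporaryDirectory() as d:
    data = b"Hello, World!"
    blob = Path(d) / hashlib.sha256(data).hexdigest()
    blob.write_bytes(data)
    intact = is_valid(blob)

    blob.write_bytes(data + b"!")  # simulate bit rot / corruption
    corrupted = is_valid(blob)

assert intact and not corrupted
```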

Filtering and Copying

# Create a filtered copy of the store
def size_filter(hash_str, file_path):
    # Only copy files smaller than 1MB
    return file_path.stat().st_size < 1024 * 1024

# Create a new store with only small files
filtered_gs = gs.filtered_copy('filtered-dir', size_filter)

# The filtered store contains the same hierarchy depth and README
print(f"Hierarchy depth: {filtered_gs.hierarchy_depth}")
print(f"README: {filtered_gs.get_readme()}")

# Example: Copy only specific file types based on sibling extensions
def has_json_metadata(hash_str, file_path):
    # Check if this blob has a JSON sibling
    return gs.exists(hash_str, 'json')

json_only_gs = gs.filtered_copy('json-only-dir', has_json_metadata)

# Example: Copy files matching certain hash patterns
def hash_prefix_filter(hash_str, file_path):
    # Only copy files whose hash starts with 'Q'
    return hash_str.startswith('Q')

q_files_gs = gs.filtered_copy('q-files-dir', hash_prefix_filter)

String Representations

# Get a human-readable string representation
print(gs)  # Output: GrugStore(/path/to/store)

# Get a detailed representation (useful for debugging)
print(repr(gs))  # Output: GrugStore(base_dir=PosixPath('/path/to/store'), hierarchy_depth=3)

File Layout

GrugStore organizes files in a hierarchical directory structure based on the base58-encoded SHA-256 hash of the content. Here's an example of what a GrugStore directory looks like with hierarchy_depth=2:

some-dir/
├── _meta/
│   └── README          # Optional store-level documentation
├── _tmp/                  # Temporary directory for atomic file operations
├── 2/
│   └── X/
│       ├── 2XaBcD...xyz  # The actual blob file (no extension)
│       └── 2XaBcD...xyz.json  # Sibling metadata file
├── 5/
│   └── K/
│       ├── 5Kj9Yz...abc  # Another blob
│       ├── 5Kj9Yz...abc.json  # JSON sibling
│       └── 5Kj9Yz...abc.txt   # Text sibling
└── 8/
    └── R/
        └── 8Rm4Qp...def  # Blob without any sibling files

Directory Structure Details

  • Hash-based hierarchy: Files are organized using prefixes of their base58-encoded hash. With hierarchy_depth=2, the first character of the hash becomes the first directory level and the second character becomes the second level.
  • Blob files: The main content files have no extension and are named with their full hash.
  • Sibling files: Related metadata or additional content files share the same hash name but include an extension (e.g., .json, .txt).
  • _meta/ directory: Contains store-level metadata like README files.
  • _tmp/ directory: Used internally for atomic file operations. Files are first written here and then moved to their final location to ensure write atomicity and prevent partial file corruption.
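
The hash-to-path mapping above can be sketched in a few lines of stdlib Python: hash the content, base58-encode the digest, then peel off one character per hierarchy level. The base58 alphabet below is Bitcoin's, and the simplified encoder drops leading-zero handling; whether either matches GrugStore exactly is an assumption.

```python
import hashlib
from pathlib import Path

# Bitcoin-style base58 alphabet (no 0, O, I, l) -- assumed, not confirmed
# to be GrugStore's exact alphabet.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58(data: bytes) -> str:
    # Simplified encoder: treat the bytes as one big integer and convert
    # to base 58 (leading zero bytes are not preserved in this sketch).
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = ALPHABET[r] + out
    return out or ALPHABET[0]

def blob_path(base_dir: str, content: bytes, hierarchy_depth: int = 2) -> Path:
    # One directory level per leading character of the encoded hash,
    # then the full hash as the file name.
    h = base58(hashlib.sha256(content).digest())
    return Path(base_dir, *h[:hierarchy_depth], h)

p = blob_path("some-dir", b"Hello, World!")
# e.g. some-dir/<c0>/<c1>/<full-hash>
assert p.parts[0] == "some-dir"
assert p.name[0] == p.parts[1] and p.name[1] == p.parts[2]
```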
