Read, validate and print the contents of a publicly-accessible (open) S3 object, no AWS credentials required.

This project has been quarantined.

PyPI Admins need to review this project before it can be restored. While in quarantine, the project is not installable by clients and cannot be modified by its maintainers.

Read more in the "project in quarantine" help article.

Project description

dt-validator

Read, validate, and print the contents of a publicly-accessible (open) S3 object — no AWS credentials required. Uses boto3 with an unsigned (anonymous) signature.

Features

  • Anonymous reads of public S3 objects (s3://, virtual-hosted, and path-style URLs).
  • A composable file-validation layer: extension, content-type, size bounds, non-empty, text-encoding, and checksum checks.
  • Cheap pre-flight validation via a HEAD request before downloading the body.
  • Typed exception hierarchy, opt-in logging, py.typed for type-checkers.
  • Library API and a CLI, both fully tested (pytest + moto).
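The three URL forms listed above can be reduced to a bucket and key with the standard library alone. The following is a minimal sketch, not the package's actual parser: the names `S3Location` and `parse_s3_url` are hypothetical, and it assumes hosts of the form `bucket.s3[.region].amazonaws.com` (virtual-hosted) and `s3[.region].amazonaws.com` (path-style). Legacy dash-style regional hosts are not handled.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlparse


class S3Location(NamedTuple):
    """Hypothetical container for a parsed location (not part of dt-validator)."""
    bucket: str
    key: str


def parse_s3_url(url: str) -> S3Location:
    """Split an s3://, virtual-hosted, or path-style URL into bucket and key."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = unquote(parsed.path).lstrip("/")
    is_http = parsed.scheme in ("http", "https")
    is_aws = host.endswith(".amazonaws.com")

    if parsed.scheme == "s3":
        # s3://bucket/key
        bucket, key = parsed.netloc, path
    elif is_http and is_aws and ".s3." in host:
        # Virtual-hosted style: https://bucket.s3.region.amazonaws.com/key
        bucket, key = host.split(".s3.", 1)[0], path
    elif is_http and is_aws and host.startswith("s3."):
        # Path style: https://s3.region.amazonaws.com/bucket/key
        bucket, _, key = path.partition("/")
    else:
        raise ValueError(f"not a recognised S3 URL: {url!r}")

    if not bucket or not key:
        raise ValueError(f"missing bucket or key in {url!r}")
    return S3Location(bucket, key)
```

The sketch raises a plain `ValueError` for anything it cannot parse, which mirrors the fact that the package's `InvalidUriError` is also a `ValueError`.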

Install

pip install -e .            # runtime
pip install -e ".[dev]"     # + pytest, moto, coverage

CLI

# Print an object
dt-validator s3://my-open-bucket/path/to/file.txt

# Metadata only (HEAD, no download)
dt-validator s3://my-open-bucket/file.txt --head

# Read only the first 1 KB
dt-validator s3://my-open-bucket/big.log --max-bytes 1024

# Raw bytes to stdout
dt-validator s3://my-open-bucket/logo.png --binary > logo.png

# With validation — fails (non-zero exit) if any constraint is violated
dt-validator s3://my-open-bucket/data.csv \
    --ext csv --content-type text/csv \
    --max-size 1048576 --non-empty \
    --require-encoding utf-8 \
    --checksum sha256:9f86d0818...

Exit codes: 0 ok · 1 S3/network error · 2 bad usage · 3 validation failed · 4 not found · 5 access denied.
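A script that calls the CLI can branch on these exit codes. The sketch below is one way to do so; `ExitCode` and `run_and_describe` are hypothetical helper names for illustration, and only the numeric values come from the list above.

```python
import enum
import subprocess
from typing import Sequence


class ExitCode(enum.IntEnum):
    """Exit codes as documented above; the member names are illustrative."""
    OK = 0
    REMOTE_ERROR = 1        # S3/network error
    BAD_USAGE = 2
    VALIDATION_FAILED = 3
    NOT_FOUND = 4
    ACCESS_DENIED = 5


def describe_exit(code: int) -> str:
    """Return a readable label for a documented exit code."""
    try:
        return ExitCode(code).name.lower().replace("_", " ")
    except ValueError:
        return f"unknown exit code {code}"


def run_and_describe(argv: Sequence[str]) -> str:
    """Run a command and translate its return code into a label."""
    completed = subprocess.run(list(argv), check=False)
    return describe_exit(completed.returncode)
```

For example, `run_and_describe(["dt-validator", "s3://my-open-bucket/file.txt", "--head"])` would return `"not found"` if the command exited with code 4.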

Library

from dt_validator import read_object, ValidationPolicy

# Simple read (str by default; encoding=None -> bytes)
text = read_object("s3://my-open-bucket/notes.txt")

# Read with a validation policy
policy = ValidationPolicy(
    allowed_extensions=[".csv"],
    allowed_content_types=["text/csv", "text/plain"],
    max_bytes=5 * 1024 * 1024,
    require_non_empty=True,
    expected_encoding="utf-8",
    checksum_algorithm="sha256",
    expected_checksum="9f86d0818...",
)
data = read_object("s3://my-open-bucket/data.csv", policy=policy)

Reading a file whose URL comes from an API

The API returns a file URL, and the package then reads that file itself. The endpoint's response is the indirection — you configure the file location there instead of hard-coding it in your app.

Flow: call the API → extract the file URL from its response → read that file (s3:// or http(s)://) → return its contents.

from dt_validator import read_file_from_api, read_url

# 1) call the API  2) read the file URL from its response  3) return that file's content
text = read_file_from_api()   # endpoint defaults to https://file-read.free.beeceptor.com

# Custom endpoint / JSON field / validation policy
text = read_file_from_api(
    "https://my-api.example.com/current-file",
    url_field="url",            # JSON field holding the file URL (default: "url")
    method="GET",              # or "POST"
    policy=ValidationPolicy(max_bytes=1_000_000, expected_encoding="utf-8"),
)

# Or read a file URL you already have (s3:// or http(s)://)
data = read_url("s3://my-open-bucket/notes.txt")

CLI:

dt-validator --via-api
dt-validator --via-api \
    --api-endpoint https://my-api.example.com/current-file \
    --api-method GET --api-url-field url

Configure your endpoint to return the file URL, e.g. response body:

{"url": "s3://my-open-bucket/notes.txt"}

(a bare URL as plain text works too).
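To illustrate the two accepted response shapes (a JSON object with a URL field, or a bare URL as plain text), here is a minimal sketch of the extraction step. It is not the package's implementation: `extract_file_url` and `ALLOWED_SCHEMES` are hypothetical names, and restricting the result to `s3://` and `http(s)://` is an assumption based on the schemes the package says it reads. Because the URL comes from a remote service, the sketch treats it as untrusted input and rejects anything else.

```python
import json
from urllib.parse import urlparse

# Assumption: only the schemes the README mentions are acceptable.
ALLOWED_SCHEMES = {"s3", "http", "https"}


def extract_file_url(body: str, url_field: str = "url") -> str:
    """Pull a file URL out of an API response body (JSON object or bare URL)."""
    text = body.strip()
    candidate = text
    try:
        payload = json.loads(text)
    except ValueError:
        payload = None  # not JSON, so treat the body as a bare URL

    if isinstance(payload, dict):
        candidate = payload.get(url_field, "")
    elif isinstance(payload, str):
        candidate = payload  # a JSON-encoded string such as "\"s3://...\""

    if not isinstance(candidate, str):
        raise ValueError("URL field is not a string")

    candidate = candidate.strip()
    parsed = urlparse(candidate)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        raise ValueError("no file URL found in response")
    return candidate
```

A placeholder reply such as a mock endpoint's greeting text fails both checks and raises `ValueError`, which corresponds to the "no file URL was found" case.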

Note: the default endpoint is a Beeceptor mock — until you add a rule that returns a file URL, it replies with placeholder text and the package will report that no file URL was found.

Cheap metadata check without downloading:

from dt_validator import head_object

meta = head_object("s3://my-open-bucket/data.csv")
print(meta.size, meta.content_type, meta.etag)
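The point of the HEAD request is that size and content-type constraints can be rejected before any body bytes are downloaded. A minimal sketch of that pre-flight step follows. `ObjectMeta` here is a stand-in dataclass with the three attributes used above, and `preflight_problems` is a hypothetical helper; neither is taken from the package.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ObjectMeta:
    """Stand-in for the metadata a HEAD request returns (illustrative only)."""
    size: int
    content_type: str
    etag: str


def preflight_problems(
    meta: ObjectMeta,
    *,
    max_bytes: Optional[int] = None,
    allowed_content_types: Optional[Sequence[str]] = None,
    require_non_empty: bool = False,
) -> List[str]:
    """Return the constraints that metadata alone already violates."""
    problems: List[str] = []
    if require_non_empty and meta.size == 0:
        problems.append("object is empty")
    if max_bytes is not None and meta.size > max_bytes:
        problems.append(f"size {meta.size} exceeds max_bytes {max_bytes}")
    if allowed_content_types is not None:
        # Ignore parameters such as "; charset=utf-8" when comparing.
        base_type = meta.content_type.split(";", 1)[0].strip().lower()
        if base_type not in {ct.lower() for ct in allowed_content_types}:
            problems.append(f"content type {base_type!r} not allowed")
    return problems
```

Checks that need the content itself, such as encoding and checksum, cannot be decided at this stage and still require the download.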

Standalone validators (each raises a specific ValidationError subclass):

from dt_validator import (
    validate_extension, validate_content_type, validate_size,
    validate_not_empty, validate_encoding, validate_checksum,
)

validate_extension("data.csv", ["csv", ".tsv"])
validate_size(len(data), max_bytes=1_000_000)
validate_checksum(data, "sha256", expected_hex)
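For readers who want to see what such checks involve, here is a standard-library sketch of an extension check and a checksum check. The function names `check_extension` and `check_checksum` are hypothetical and the sketch raises `ValueError` where the package would raise its own exception types. Accepting extensions with or without the leading dot follows the `["csv", ".tsv"]` call above.

```python
import hashlib
import hmac
from pathlib import PurePosixPath
from typing import Iterable


def check_extension(name: str, allowed: Iterable[str]) -> None:
    """Raise ValueError unless the file name ends in an allowed extension."""
    # Normalise "csv" and ".csv" to the same form.
    normalised = {"." + ext.lower().lstrip(".") for ext in allowed}
    suffix = PurePosixPath(name).suffix.lower()
    if suffix not in normalised:
        raise ValueError(f"extension {suffix!r} not in {sorted(normalised)}")


def check_checksum(data: bytes, algorithm: str, expected_hex: str) -> None:
    """Raise ValueError unless the digest of data matches expected_hex."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"{algorithm} mismatch: got {actual}")
```

A typical call computes or obtains the expected digest out of band, then runs `check_checksum(data, "sha256", expected_hex)` on the downloaded bytes.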

Package layout

src/dt_validator/
  __init__.py        public API
  reader.py          parsing, anonymous read, HEAD, error mapping
  validation.py      validators + ValidationPolicy
  exceptions.py      typed exception hierarchy
  _logging.py        NullHandler + opt-in configure_logging()
  cli.py             argparse CLI
tests/
  test_reader.py               URI parsing
  test_validation.py           validators + policy
  test_reader_integration.py   reader against moto S3
  test_cli.py                  CLI against moto S3

Exceptions

All derive from FileValidatorError:

  • InvalidUriError (also a ValueError)
  • ObjectNotFoundError, AccessDeniedError, RemoteReadError
  • ValidationError, with subclasses FileSizeError, ExtensionError, ContentTypeError, EncodingError, ChecksumError

Note on "open" access

The object (or bucket) must allow anonymous s3:GetObject. This tool intentionally sends unsigned requests, so private objects return 403 AccessDenied.

Tests

pytest              # 46 tests
pytest --cov        # with coverage


Download files

Source Distribution

dt_validator-0.3.0.tar.gz (25.9 kB view details)

Uploaded Source

Built Distribution

dt_validator-0.3.0-py3-none-any.whl (25.7 kB view details)

Uploaded Python 3

File details

Details for the file dt_validator-0.3.0.tar.gz.

File metadata

  • Download URL: dt_validator-0.3.0.tar.gz
  • Upload date:
  • Size: 25.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for dt_validator-0.3.0.tar.gz
Algorithm Hash digest
SHA256 6aba0ac08b81f56de7a7d80e7ace561c8f897fc573a61308d70491e627d65fbc
MD5 331149c7af77583799066657bb93d74c
BLAKE2b-256 f543d16dacb2a6c758310456c3a9c61e886a48384842895e392bec2f00084c32

File details

Details for the file dt_validator-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: dt_validator-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 25.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for dt_validator-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a6b9fa23d300bd6133dd4c7ecfa05893ed68dc108cdd6726c0c256441eba5f06
MD5 7141d8c7c490c89e841e1db7e33256bd
BLAKE2b-256 93afd025491b1452257feaf3e45da753b7b1608361d7ba01b8d7604be959d0a4
