Read, validate and print the contents of a publicly-accessible (open) S3 object, no AWS credentials required.
This project has been quarantined.
PyPI Admins need to review this project before it can be restored. While in quarantine, the project is not installable by clients and cannot be modified by its maintainers.
Read more in the project in quarantine help article.
Project description
dt-validator
Read, validate, and print the contents of a publicly-accessible (open) S3
object — no AWS credentials required. Uses boto3 with an unsigned (anonymous)
signature.
Features
- Anonymous reads of public S3 objects (s3://, virtual-hosted, and path-style URLs).
- A composable file-validation layer: extension, content-type, size bounds, non-empty, text-encoding, and checksum checks.
- Cheap pre-flight validation via a HEAD request before downloading the body.
- Typed exception hierarchy, opt-in logging, py.typed for type-checkers.
- Library API and a CLI, both fully tested (pytest + moto).
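To make the three accepted URL forms concrete, here is a minimal sketch of how they could be reduced to a bucket and key using only the standard library. The function name `parse_s3_location` and the host-matching rules are assumptions for illustration; the package's own parser in reader.py may differ.

```python
from urllib.parse import unquote, urlparse


def parse_s3_location(url: str) -> tuple[str, str]:
    """Return (bucket, key) for an s3://, virtual-hosted, or path-style URL.

    Hypothetical helper, not part of dt-validator's documented API.
    """
    parts = urlparse(url)
    path = unquote(parts.path.lstrip("/"))

    if parts.scheme == "s3":
        # s3://<bucket>/<key>
        bucket, key = parts.netloc, path
    elif parts.scheme in ("http", "https"):
        host = (parts.hostname or "").lower()
        if host.startswith(("s3.", "s3-")):
            # Path-style: https://s3.<region>.amazonaws.com/<bucket>/<key>
            bucket, _, key = path.partition("/")
        elif ".s3." in host or ".s3-" in host:
            # Virtual-hosted: https://<bucket>.s3.<region>.amazonaws.com/<key>
            bucket, key = host.split(".s3", 1)[0], path
        else:
            raise ValueError(f"not a recognised S3 URL: {url!r}")
    else:
        raise ValueError(f"unsupported scheme in {url!r}")

    if not bucket or not key:
        raise ValueError(f"missing bucket or key in {url!r}")
    return bucket, key


if __name__ == "__main__":
    for candidate in (
        "s3://my-open-bucket/path/to/file.txt",
        "https://my-open-bucket.s3.us-east-1.amazonaws.com/path/to/file.txt",
        "https://s3.us-east-1.amazonaws.com/my-open-bucket/path/to/file.txt",
    ):
        print(parse_s3_location(candidate))
```

All three inputs in the usage loop name the same object, so a parser along these lines should yield the same pair for each.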
Install
pip install -e . # runtime
pip install -e ".[dev]" # + pytest, moto, coverage
CLI
# Print an object
dt-validator s3://my-open-bucket/path/to/file.txt
# Metadata only (HEAD, no download)
dt-validator s3://my-open-bucket/file.txt --head
# Read only the first 1 KB
dt-validator s3://my-open-bucket/big.log --max-bytes 1024
# Raw bytes to stdout
dt-validator s3://my-open-bucket/logo.png --binary > logo.png
# With validation — fails (non-zero exit) if any constraint is violated
dt-validator s3://my-open-bucket/data.csv \
    --ext csv --content-type text/csv \
    --max-size 1048576 --non-empty \
    --require-encoding utf-8 \
    --checksum sha256:9f86d0818...
Exit codes: 0 ok · 1 S3/network error · 2 bad usage · 3 validation failed ·
4 not found · 5 access denied.
Library
from dt_validator import read_object, ValidationPolicy
# Simple read (str by default; encoding=None -> bytes)
text = read_object("s3://my-open-bucket/notes.txt")
# Read with a validation policy
policy = ValidationPolicy(
    allowed_extensions=[".csv"],
    allowed_content_types=["text/csv", "text/plain"],
    max_bytes=5 * 1024 * 1024,
    require_non_empty=True,
    expected_encoding="utf-8",
    checksum_algorithm="sha256",
    expected_checksum="9f86d0818...",
)
data = read_object("s3://my-open-bucket/data.csv", policy=policy)
Reading a file whose URL comes from an API
The API returns a file URL, and the package then reads that file. The endpoint's response acts as a level of indirection: you configure the file location at the endpoint instead of hard-coding it in your app.
Flow: call the API → extract the file URL from its response → read that file
(s3:// or http(s)://) → return its contents.
from dt_validator import ValidationPolicy, read_file_from_api, read_url
# 1) call the API 2) read the file URL from its response 3) return that file's content
text = read_file_from_api() # endpoint defaults to https://file-read.free.beeceptor.com
# Custom endpoint / JSON field / validation policy
text = read_file_from_api(
    "https://my-api.example.com/current-file",
    url_field="url",  # JSON field holding the file URL (default: "url")
    method="GET",     # or "POST"
    policy=ValidationPolicy(max_bytes=1_000_000, expected_encoding="utf-8"),
)
# Or read a file URL you already have (s3:// or http(s)://)
data = read_url("s3://my-open-bucket/notes.txt")
CLI:
dt-validator --via-api
dt-validator --via-api \
--api-endpoint https://my-api.example.com/current-file \
--api-method GET --api-url-field url
Configure your endpoint to return the file URL, e.g. response body:
{"url": "s3://my-open-bucket/notes.txt"}
(a bare URL as plain text works too).
Note: the default endpoint is a Beeceptor mock. Until you add a rule that returns a file URL, it replies with placeholder text, and the package reports that no file URL was found.
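The response handling described above (a JSON body with a URL field, or a bare URL as plain text) could be sketched roughly as follows. `extract_file_url` is a hypothetical helper written for illustration, and the rule that a URL must start with s3://, http://, or https:// is an assumption based on the schemes this README mentions; the package's actual logic may differ.

```python
import json
from typing import Optional

# Schemes this README says the reader accepts (assumed to be the full set).
_SCHEMES = ("s3://", "http://", "https://")


def extract_file_url(body: str, url_field: str = "url") -> Optional[str]:
    """Pull a file URL out of an API response body, or return None.

    Hypothetical sketch: accepts a JSON object holding the URL under
    `url_field`, a JSON string, or a bare URL as plain text.
    """
    text = body.strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = None  # not JSON, fall back to treating the body as plain text

    if isinstance(payload, dict):
        candidate = payload.get(url_field)
    elif isinstance(payload, str):
        candidate = payload
    else:
        candidate = text

    if isinstance(candidate, str) and candidate.strip().startswith(_SCHEMES):
        return candidate.strip()
    return None


if __name__ == "__main__":
    print(extract_file_url('{"url": "s3://my-open-bucket/notes.txt"}'))
    print(extract_file_url("s3://my-open-bucket/notes.txt\n"))
    print(extract_file_url("placeholder text from an unconfigured mock"))
```

Because the file location comes from a remote response rather than from your own code, passing a validation policy when reading the resulting URL is a reasonable precaution.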
Cheap metadata check without downloading:
from dt_validator import head_object
meta = head_object("s3://my-open-bucket/data.csv")
print(meta.size, meta.content_type, meta.etag)
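As an illustration of why a HEAD-style check is cheap, the sketch below evaluates size and content-type constraints from metadata alone, before any body is downloaded. `ObjectMeta` here is a stand-in dataclass carrying the three attributes printed above, and `preflight_problems` is a hypothetical function; neither is claimed to match the package's real types.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ObjectMeta:
    """Stand-in for the metadata a HEAD request returns (assumed shape)."""
    size: int
    content_type: str
    etag: str


def preflight_problems(
    meta: ObjectMeta,
    *,
    max_bytes: Optional[int] = None,
    allowed_content_types: Optional[Sequence[str]] = None,
    require_non_empty: bool = False,
) -> list[str]:
    """Return human-readable problems found from metadata only."""
    problems: list[str] = []
    if require_non_empty and meta.size == 0:
        problems.append("object is empty")
    if max_bytes is not None and meta.size > max_bytes:
        problems.append(f"size {meta.size} exceeds limit {max_bytes}")
    if allowed_content_types is not None:
        # Compare only the media type, ignoring parameters such as "; charset=utf-8".
        media_type = meta.content_type.split(";", 1)[0].strip().lower()
        if media_type not in {t.lower() for t in allowed_content_types}:
            problems.append(f"content type {media_type!r} not allowed")
    return problems


if __name__ == "__main__":
    meta = ObjectMeta(size=2_000_000, content_type="text/csv; charset=utf-8", etag='"abc"')
    print(preflight_problems(meta, max_bytes=1_000_000, allowed_content_types=["text/csv"]))
```

Checks that need the bytes themselves, such as encoding and checksum validation, cannot be done at this stage and would run after the download.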
Standalone validators (each raises a specific ValidationError subclass):
from dt_validator import (
    validate_extension, validate_content_type, validate_size,
    validate_not_empty, validate_encoding, validate_checksum,
)
validate_extension("data.csv", ["csv", ".tsv"])
validate_size(len(data), max_bytes=1_000_000)
validate_checksum(data, "sha256", expected_hex)
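For readers curious what a checksum check involves, here is a minimal standard-library sketch. It also shows one way the CLI's `algorithm:hex` form could be split into its parts. Both function names are hypothetical, and returning a bool (rather than raising ChecksumError as the package's validator does) is a simplification for illustration.

```python
import hashlib
import hmac


def parse_checksum_spec(spec: str) -> tuple[str, str]:
    """Split an 'algorithm:hexdigest' string into (algorithm, digest)."""
    algorithm, sep, digest = spec.partition(":")
    if not sep or not algorithm or not digest:
        raise ValueError(f"expected 'algorithm:hexdigest', got {spec!r}")
    return algorithm.lower(), digest.lower()


def checksum_matches(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Hash `data` with `algorithm` and compare against `expected_hex`."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.lower())


if __name__ == "__main__":
    payload = b"hello"
    spec = "sha256:" + hashlib.sha256(payload).hexdigest()
    algorithm, digest = parse_checksum_spec(spec)
    print(checksum_matches(payload, algorithm, digest))
    print(checksum_matches(b"tampered", algorithm, digest))
```

Hex digests are case-insensitive, so both helpers normalise to lowercase before comparing.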
Package layout
src/dt_validator/
    __init__.py       public API
    reader.py         parsing, anonymous read, HEAD, error mapping
    validation.py     validators + ValidationPolicy
    exceptions.py     typed exception hierarchy
    _logging.py       NullHandler + opt-in configure_logging()
    cli.py            argparse CLI
tests/
    test_reader.py                URI parsing
    test_validation.py            validators + policy
    test_reader_integration.py    reader against moto S3
    test_cli.py                   CLI against moto S3
Exceptions
All derive from FileValidatorError:
- InvalidUriError (also a ValueError)
- ObjectNotFoundError, AccessDeniedError, RemoteReadError
- ValidationError → FileSizeError, ExtensionError, ContentTypeError, EncodingError, ChecksumError
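The sketch below shows how a hierarchy of this shape behaves and one way it could line up with the CLI exit codes listed in the CLI section. The classes are stand-ins declared locally (only a subset is shown), and the mapping in `exit_code_for`, in particular sending InvalidUriError to "bad usage", is an assumption rather than the package's documented behavior.

```python
from enum import IntEnum
from typing import Optional


# Stand-in classes mirroring the names above; not imported from dt_validator.
class FileValidatorError(Exception): pass
class InvalidUriError(FileValidatorError, ValueError): pass
class ObjectNotFoundError(FileValidatorError): pass
class AccessDeniedError(FileValidatorError): pass
class RemoteReadError(FileValidatorError): pass
class ValidationError(FileValidatorError): pass
class ChecksumError(ValidationError): pass


class ExitCode(IntEnum):
    OK = 0
    REMOTE_ERROR = 1
    USAGE = 2
    VALIDATION_FAILED = 3
    NOT_FOUND = 4
    ACCESS_DENIED = 5


def exit_code_for(exc: Optional[Exception]) -> ExitCode:
    """Map an outcome to an exit code (assumed mapping, most specific first)."""
    if exc is None:
        return ExitCode.OK
    if isinstance(exc, ValidationError):
        return ExitCode.VALIDATION_FAILED
    if isinstance(exc, ObjectNotFoundError):
        return ExitCode.NOT_FOUND
    if isinstance(exc, AccessDeniedError):
        return ExitCode.ACCESS_DENIED
    if isinstance(exc, InvalidUriError):
        return ExitCode.USAGE  # assumption: a malformed URI counts as bad usage
    return ExitCode.REMOTE_ERROR  # RemoteReadError and anything unclassified


if __name__ == "__main__":
    try:
        raise ChecksumError("digest mismatch")
    except ValidationError as err:  # the base class catches every validation failure
        print(int(exit_code_for(err)))
```

Because InvalidUriError also derives from ValueError, callers that already catch ValueError for bad input would handle it without importing anything from the package.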
Note on "open" access
The object (or bucket) must allow anonymous s3:GetObject. This tool intentionally
sends unsigned requests, so private objects return 403 AccessDenied.
Tests
pytest # 46 tests
pytest --cov # with coverage
Download files
File details
Details for the file dt_validator-0.3.0.tar.gz.
File metadata
- Download URL: dt_validator-0.3.0.tar.gz
- Upload date:
- Size: 25.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6aba0ac08b81f56de7a7d80e7ace561c8f897fc573a61308d70491e627d65fbc |
| MD5 | 331149c7af77583799066657bb93d74c |
| BLAKE2b-256 | f543d16dacb2a6c758310456c3a9c61e886a48384842895e392bec2f00084c32 |
File details
Details for the file dt_validator-0.3.0-py3-none-any.whl.
File metadata
- Download URL: dt_validator-0.3.0-py3-none-any.whl
- Upload date:
- Size: 25.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a6b9fa23d300bd6133dd4c7ecfa05893ed68dc108cdd6726c0c256441eba5f06 |
| MD5 | 7141d8c7c490c89e841e1db7e33256bd |
| BLAKE2b-256 | 93afd025491b1452257feaf3e45da753b7b1608361d7ba01b8d7604be959d0a4 |