Some tools for working with data
datools
Introduction
datools is a collection of Python-based tools for working with data in relational databases. While it contains several utilities for smoothing the rough edges of SQL, its most mature component is `datools.diff`, an algorithm that's best explained in a blog post and a Jupyter Notebook.
To learn more, read the docs or reach out.
Database support
While datools generates SQL for its operations, different databases have their nuances. datools may already run on your database, but to give you some certainty about which databases it is known to work on, we run the full test suite against each of the following:
| Database | Evaluated by test suite |
|---|---|
| SQLite | Since v0.1.2 |
| DuckDB | Since v0.1.4 |
| PostgreSQL | Since v0.1.5 |
| Redshift, Snowflake | You provide an instance, I'll make the tests pass |
History
0.1.5 (2022-04-13)
- Support for PostgreSQL! The test suite now runs against PostgreSQL, and `datools.explanations.diff` now allows you to ask "why" about data stored in Postgres. Get excited!
- `datools.sqlalchemy_utils.grouping_sets_query` will now generate a `GROUPING SETS` query for databases that support grouping sets (e.g., Postgres, DuckDB) or the equivalent `UNION ALL` version for databases without grouping sets support (e.g., SQLite). For more, check out the example in the docs.
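The fallback described in this release note relies on the fact that a `GROUPING SETS` query is equivalent to a `UNION ALL` of one `GROUP BY` per grouping set, with the columns missing from each set filled in as `NULL`. The sketch below illustrates that rewrite with plain string building. `grouping_sets_sql` and its parameters are hypothetical and are not datools's API (the real `grouping_sets_query` lives in `datools.sqlalchemy_utils`, and its signature isn't documented here). Table and column names are interpolated directly, so only pass trusted identifiers.

```python
import sqlite3
from typing import List, Sequence


def grouping_sets_sql(
    table: str,
    measure: str,
    grouping_sets: Sequence[Sequence[str]],
    native: bool,
) -> str:
    """Hypothetical helper: build a GROUPING SETS query or its UNION ALL equivalent.

    Identifiers are interpolated as-is, so only pass trusted table/column names.
    """
    # Every column that appears in any grouping set, in first-seen order.
    all_columns: List[str] = []
    for grouping_set in grouping_sets:
        for column in grouping_set:
            if column not in all_columns:
                all_columns.append(column)

    aggregate = f"SUM({measure}) AS total"

    if native:
        # Databases such as PostgreSQL and DuckDB accept GROUPING SETS directly.
        sets_sql = ", ".join("(" + ", ".join(s) + ")" for s in grouping_sets)
        select_sql = ", ".join(all_columns + [aggregate])
        return f"SELECT {select_sql} FROM {table} GROUP BY GROUPING SETS ({sets_sql})"

    # Fallback (e.g., SQLite): one GROUP BY per grouping set, glued with UNION ALL.
    branches = []
    for grouping_set in grouping_sets:
        # Columns outside this grouping set become NULL, as GROUPING SETS would emit.
        select_items = [c if c in grouping_set else f"NULL AS {c}" for c in all_columns]
        select_sql = ", ".join(select_items + [aggregate])
        # The empty grouping set () aggregates the whole table, i.e. no GROUP BY.
        group_by_sql = f" GROUP BY {', '.join(grouping_set)}" if grouping_set else ""
        branches.append(f"SELECT {select_sql} FROM {table}{group_by_sql}")
    return " UNION ALL ".join(branches)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", "a", 1), ("east", "b", 2), ("west", "a", 4)],
    )
    sets = [["region"], ["product"], []]
    for row in conn.execute(grouping_sets_sql("sales", "amount", sets, native=False)):
        print(row)
```

Against SQLite, the `UNION ALL` form returns one row per region, one per product, and a grand total; the `native=True` string is what you would send to PostgreSQL or DuckDB instead. In both forms, a `NULL` introduced by the grouping looks the same as a `NULL` stored in the data; databases with native grouping sets offer a `GROUPING()` function to tell them apart.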
0.1.4 (2022-02-27)
- Python 3.10 support.
- Updated test suite to run tests against multiple databases, in particular expanding from SQLite only to DuckDB and SQLite.
- As a result of the last bullet, ensured code runs against DuckDB in addition to SQLite.
- First stab at documentation (https://datools.readthedocs.io/en/latest/).
0.1.3 (2021-12-31)
- Introduced mypy to linting and CI to ensure code that makes it to `main` has proper types.
- Created the first working example of DIFF on a real-world dataset as a Jupyter notebook. This example partially replicates the Scorpion paper when only moteid/sensorids are considered.
- Separated the `on_columns` argument of `diff` into `on_column_values` (columns for which you want to generate equality predicates as explanations) and `on_column_ranges` (columns for which you want to generate range predicates as explanations after bucketing the ranges into 15 equi-sized buckets).
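The `on_column_ranges` behavior can be pictured as two steps: split a numeric column's observed range into buckets, then turn each bucket into a range predicate that DIFF can score as a candidate explanation. The sketch below assumes "equi-sized" means equal-width buckets between the column's minimum and maximum (equal-frequency bucketing would be another reasonable reading). `bucket_ranges` and `range_predicates` are hypothetical helpers, not part of datools.

```python
from typing import List, Sequence, Tuple


def bucket_ranges(values: Sequence[float], num_buckets: int = 15) -> List[Tuple[float, float]]:
    """Hypothetical helper: split [min, max] into num_buckets equal-width ranges."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    low, high = min(values), max(values)
    width = (high - low) / num_buckets
    # Compute each edge from the start; pin the last edge to the max to avoid float drift.
    edges = [low + i * width for i in range(num_buckets)] + [high]
    return list(zip(edges[:-1], edges[1:]))


def range_predicates(column: str, ranges: Sequence[Tuple[float, float]]) -> List[str]:
    """Hypothetical helper: turn each (start, end) range into a SQL predicate string."""
    predicates = []
    for i, (start, end) in enumerate(ranges):
        # Half-open buckets, except the last one, which must include the maximum value.
        upper_op = "<=" if i == len(ranges) - 1 else "<"
        predicates.append(f"{column} >= {start} AND {column} {upper_op} {end}")
    return predicates
```

With 15 buckets over a hypothetical `temperature` column whose values run from 0 to 30, each bucket is 2 units wide, and the first predicate is `temperature >= 0.0 AND temperature < 2.0`. Keeping all but the last bucket half-open means every value lands in exactly one range. A constant column collapses every bucket to a single point, which a real implementation would want to special-case.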
0.1.2 (2021-11-07)
- First release of DIFF algorithm implementation.
0.1.0 (2021-05-09)
- First release on PyPI.
Download files
File details
Details for the file datools-0.1.5.tar.gz.
File metadata
- Download URL: datools-0.1.5.tar.gz
- Upload date:
- Size: 23.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.8.2 requests/2.27.1 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.63.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4659cb258cb59443b0ac123120c5e9a7fcc271010ad4f3cce066e464ac2b93bd |
| MD5 | 2e5663fd3c5d107e9510603e129a3404 |
| BLAKE2b-256 | 190d047532faa41899b02b622d64139412760d935f74ed5dc8081cd02d2cf2ea |
File details
Details for the file datools-0.1.5-py2.py3-none-any.whl.
File metadata
- Download URL: datools-0.1.5-py2.py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.8.2 requests/2.27.1 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.63.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c915ddd216225b2b0b5d1c5fcbc70ea02c84b25cfa7b82bb822b8125e5bd68e |
| MD5 | 1ee38bfc17b629228c535bb0d46855c9 |
| BLAKE2b-256 | e1789c0010d2202905536c7f38c1b515e473c34c2998e1d2fb66291e1f2f7fd5 |
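To check that a downloaded file matches the SHA256 digests listed for datools 0.1.5, hash it locally and compare. The sketch below uses only the standard library's `hashlib`; `sha256_of` and `verify_download` are hypothetical helpers. pip can also enforce this for you in hash-checking mode, by running `pip install --require-hashes -r requirements.txt` with a `--hash=sha256:...` option on each requirement line.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests copied from the File hashes tables for datools 0.1.5.
EXPECTED_SHA256 = {
    "datools-0.1.5.tar.gz": "4659cb258cb59443b0ac123120c5e9a7fcc271010ad4f3cce066e464ac2b93bd",
    "datools-0.1.5-py2.py3-none-any.whl": "6c915ddd216225b2b0b5d1c5fcbc70ea02c84b25cfa7b82bb822b8125e5bd68e",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path]) -> bool:
    """Hypothetical helper: True if the file's SHA256 matches the published digest."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

The lookup is keyed on the file name, so a renamed download raises `KeyError` rather than silently passing.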