df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.
Installation
pip install df-diskcache
Features
Supports the following methods:
- get: Get a cache entry (pandas.DataFrame) for the key. Returns None if the key is not found.
- set: Create a cache entry with an optional time-to-live (TTL) for the key-value pair.
- update
- touch: Update the last accessed time of a cache entry to extend the TTL.
- delete
- prune: Delete expired cache entries.
- Dictionary-like operations:
  - __getitem__
  - __setitem__
  - __contains__
  - __delitem__
Usage
Basic Usage
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache.set(url, df)
else:
    print("cache hit")

print(df)
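The get/set pattern above can be wrapped in a small helper so that each call site does not repeat the cache-miss branch. The following is a minimal sketch, not part of df-diskcache: get_or_load and InMemoryStandIn are hypothetical names introduced here for illustration. The helper relies only on the get and set calls shown above (including the ttl keyword of set), and the stand-in class exists solely so that the sketch runs without the library or network access.

```python
from typing import Any, Callable, Dict, Optional


def get_or_load(cache: Any, key: str, loader: Callable[[], Any], ttl: Optional[int] = None) -> Any:
    """Return the cached value for key, calling loader() only on a cache miss.

    Hypothetical helper: it assumes cache.get(key) returns None on a miss
    and that cache.set accepts an optional ttl keyword.
    """
    value = cache.get(key)
    if value is None:
        value = loader()
        if ttl is None:
            cache.set(key, value)
        else:
            cache.set(key, value, ttl=ttl)
    return value


class InMemoryStandIn:
    """Tiny stand-in with the same get/set shape; it ignores ttl entirely."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        self._data[key] = value


calls = []


def load() -> list:
    # Record each invocation so we can see how often the loader runs.
    calls.append(1)
    return [1, 2, 3]


standin = InMemoryStandIn()
first = get_or_load(standin, "example key", load)   # miss: loader runs
second = get_or_load(standin, "example key", load)  # hit: loader is skipped
```

With the real library, the same helper would presumably be called as get_or_load(DataFrameDiskCache(), url, lambda: pd.read_csv(url)).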
You can also use dictionary-like operations:
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache[url]
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache[url] = df
else:
    print("cache hit")

print(df)
Cache existence check
You can check if a cache entry exists by using the in operator.
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

if url in cache:
    print("cache exists")
    df = cache[url]
else:
    print("cache does not exist")
Set TTL for cache entries
You can change the default TTL for cache entries by setting DataFrameDiskCache.DEFAULT_TTL, or set a TTL for an individual entry with the ttl parameter of the set method.
import pandas as pd
from dfdiskcache import DataFrameDiskCache

DataFrameDiskCache.DEFAULT_TTL = 10  # you can override the default TTL (default: 3600 seconds)

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    df = pd.read_csv(url)
    cache.set(url, df, ttl=60)  # you can set a TTL for the key-value pair

print(df)
Delete cache entries
You can delete a cache entry by using the del operator or the delete method.
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
key = "example key"

if key not in cache:
    df = pd.DataFrame([["a", 1], ["b", 2]], columns=["col_a", "col_b"])
    cache.set(key, df)

# delete a cache entry by using the del operator
del cache[key]

# delete a cache entry by using the delete method
cache.delete(key)
Cache lifetime management
Expired cache entries are automatically deleted when you access a cache entry or invoke the prune method.
You can refresh the last accessed time of a cache entry by using the touch method.
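To make the interaction between TTL, touch, and prune concrete, here is a toy in-memory model. It is only a sketch of the behavior described above and is not df-diskcache's implementation: ToyTTLCache and its injectable clock are hypothetical, and it assumes that an entry expires once more than ttl seconds have passed since its last accessed time, and that only set and touch update that time.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ToyTTLCache:
    """In-memory illustration of TTL expiry; NOT df-diskcache's implementation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, ttl in seconds, last accessed time)
        self._entries: Dict[str, Tuple[Any, float, float]] = {}

    def set(self, key: str, value: Any, ttl: float = 3600) -> None:
        self._entries[key] = (value, ttl, self._clock())

    def _is_expired(self, key: str) -> bool:
        _, ttl, last_accessed = self._entries[key]
        return self._clock() - last_accessed > ttl

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None
        if self._is_expired(key):
            del self._entries[key]  # expired entries are dropped on access
            return None
        return self._entries[key][0]

    def touch(self, key: str) -> None:
        # Refresh the last accessed time, which postpones expiry.
        if key in self._entries and not self._is_expired(key):
            value, ttl, _ = self._entries[key]
            self._entries[key] = (value, ttl, self._clock())

    def prune(self) -> int:
        # Remove every expired entry and report how many were removed.
        expired = [k for k in self._entries if self._is_expired(k)]
        for k in expired:
            del self._entries[k]
        return len(expired)


# Drive the model with a fake clock so the example does not need to sleep.
now = [0.0]
toy = ToyTTLCache(clock=lambda: now[0])
toy.set("a", "value-a", ttl=10)
toy.set("b", "value-b", ttl=10)

now[0] = 8.0
toy.touch("a")          # "a" is refreshed at t=8; "b" is left untouched

now[0] = 12.0
removed = toy.prune()   # "b" is 12 s old and past its TTL; "a" is only 4 s old
```

In this model, touching an entry before it expires keeps it alive for another full TTL, while untouched entries are removed by the next prune call or the next access.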
Dependencies