Summary

df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.

Installation

pip install df-diskcache

Features

Supports the following methods:

  • get: Get a cache entry (pandas.DataFrame) for the key. Returns None if the key is not found.

  • set: Create a cache entry with an optional time-to-live (TTL) for the key-value pair.

  • update

  • touch: Update the last accessed time of a cache entry to extend the TTL.

  • delete

  • prune: Delete expired cache entries.

  • Dictionary-like operations:
    • __getitem__

    • __setitem__

    • __contains__

    • __delitem__

Usage

Basic Usage

import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache.set(url, df)
else:
    print("cache hit")

print(df)

You can also use dictionary-style access:

import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache[url]
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache[url] = df
else:
    print("cache hit")

print(df)

Cache existence check

You can check if a cache entry exists by using the in operator.

import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

if url in cache:
    print("cache exists")
    df = cache[url]
else:
    print("cache does not exist")

Set TTL for cache entries

You can change the default TTL for all cache entries by setting DataFrameDiskCache.DEFAULT_TTL, or set a TTL for a single entry with the ttl parameter of the set method.

import pandas as pd
from dfdiskcache import DataFrameDiskCache

DataFrameDiskCache.DEFAULT_TTL = 10  # you can override the default TTL (default: 3600 seconds)

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    df = pd.read_csv(url)
    cache.set(url, df, ttl=60)  # you can set a TTL for the key-value pair

print(df)
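The get-then-set pattern used in the examples above can be wrapped in a small helper. The following is a minimal sketch, not part of df-diskcache: get_or_load and its loader parameter are hypothetical names, and the helper relies only on the behavior documented here (get returns None on a miss, and set accepts an optional ttl keyword).

```python
from typing import Any, Callable, Optional


def get_or_load(
    cache: Any,
    key: str,
    loader: Callable[[], Any],
    ttl: Optional[int] = None,
) -> Any:
    """Return the cached value for key; on a miss, call loader() and cache the result.

    Hypothetical helper: `cache` is any object with get(key) and
    set(key, value, ttl=...) methods, such as a DataFrameDiskCache.
    """
    value = cache.get(key)  # documented to return None when the key is not found
    if value is not None:
        return value

    value = loader()
    if ttl is None:
        cache.set(key, value)  # no explicit TTL, so the cache's default applies
    else:
        cache.set(key, value, ttl=ttl)
    return value
```

With the objects from the example above, a call might look like df = get_or_load(cache, url, lambda: pd.read_csv(url), ttl=60). The comparison against None is deliberate: a pandas.DataFrame cannot be used directly in a boolean test.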

Delete cache entries

You can delete a cache entry by using the del operator or the delete method.

import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()

key = "example key"
if key not in cache:
    df = pd.DataFrame([["a", 1], ["b", 2]], columns=["col_a", "col_b"])
    cache.set(key, df)

# delete a cache entry by using the del operator
del cache[key]

# alternatively, delete a cache entry by using the delete method
cache.delete(key)

Cache lifetime management

Expired cache entries are automatically deleted when you access a cache entry or invoke the prune method.

You can refresh the last accessed time of a cache entry by using the touch method.
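To make the lifetime rules above concrete, here is a toy in-memory model of TTL-based expiry. It is an illustration only and not df-diskcache's implementation: the ToyTTLCache class, its injectable clock, and the choice that get does not refresh the last accessed time are all assumptions made for this sketch.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ToyTTLCache:
    """In-memory illustration of TTL expiry; NOT how df-diskcache is implemented."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, TTL in seconds, last accessed time)
        self._entries: Dict[str, Tuple[Any, float, float]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (value, ttl, self._clock())

    def _expired(self, key: str) -> bool:
        _, ttl, accessed = self._entries[key]
        return self._clock() - accessed > ttl

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None
        if self._expired(key):
            # Lazy expiry: a stale entry is dropped when it is accessed.
            del self._entries[key]
            return None
        return self._entries[key][0]

    def touch(self, key: str) -> None:
        # Restart the TTL countdown of a live entry without changing its value.
        if key in self._entries and not self._expired(key):
            value, ttl, _ = self._entries[key]
            self._entries[key] = (value, ttl, self._clock())

    def prune(self) -> int:
        # Eager expiry: drop every stale entry and report how many were removed.
        stale = [key for key in self._entries if self._expired(key)]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

In this model, an entry that is touched shortly before its TTL runs out survives for another full TTL, while an untouched entry is removed either on the next access or on the next prune call.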
