df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.

Installation

pip install df-diskcache

Features

Supports the following methods:

  • get: Get a cache entry (pandas.DataFrame) for the key. Returns None if the key is not found.

  • set: Create a cache entry with an optional time-to-live (TTL) for the key-value pair.

  • update

  • touch: Update the last accessed time of a cache entry to extend the TTL.

  • delete: Delete the cache entry for the key.

  • prune: Delete expired cache entries.

  • Dictionary-like operations:
    • __getitem__

    • __setitem__

    • __contains__

    • __delitem__
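To make the TTL behaviour described in the list above concrete, here is a minimal, stdlib-only sketch of a disk-backed TTL cache. It is not df-diskcache's implementation. The class name TinyTTLDiskCache, the one-pickle-file-per-key layout, the injectable clock, and the rule that an entry expires ttl seconds after its last touch (with get not refreshing it) are all assumptions made for illustration. Values can be any picklable object, including a pandas.DataFrame.

```python
import hashlib
import pickle
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, Optional


class TinyTTLDiskCache:
    """Hypothetical TTL disk cache illustrating get/set/touch/delete/prune."""

    DEFAULT_TTL = 3600  # seconds, used when set() is called without a ttl

    def __init__(self, cache_dir: Optional[str] = None,
                 clock: Callable[[], float] = time.time) -> None:
        # Assumption: a fresh temporary directory if none is given.
        self._dir = Path(cache_dir or tempfile.mkdtemp(prefix="ttl-cache-"))
        self._dir.mkdir(parents=True, exist_ok=True)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def _path(self, key: str) -> Path:
        # Hash the key so arbitrary strings (e.g. URLs) become safe file names.
        return self._dir / (hashlib.sha256(key.encode()).hexdigest() + ".pkl")

    def _load(self, key: str) -> Optional[dict]:
        path = self._path(key)
        if not path.exists():
            return None
        with path.open("rb") as f:
            return pickle.load(f)

    def _dump(self, key: str, entry: dict) -> None:
        with self._path(key).open("wb") as f:
            pickle.dump(entry, f)

    def _expired(self, entry: dict) -> bool:
        # An entry lives for `ttl` seconds after its last access time.
        return self._clock() > entry["accessed"] + entry["ttl"]

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        ttl = self.DEFAULT_TTL if ttl is None else ttl
        self._dump(key, {"value": value, "ttl": ttl, "accessed": self._clock()})

    def get(self, key: str) -> Any:
        entry = self._load(key)
        if entry is None or self._expired(entry):
            return None  # missing and expired entries both count as a miss
        return entry["value"]

    def touch(self, key: str) -> bool:
        """Restart the TTL window of a live entry; return False if there is none."""
        entry = self._load(key)
        if entry is None or self._expired(entry):
            return False
        entry["accessed"] = self._clock()
        self._dump(key, entry)
        return True

    def delete(self, key: str) -> None:
        self._path(key).unlink(missing_ok=True)

    def prune(self) -> int:
        """Delete expired entries from disk and return how many were removed."""
        removed = 0
        for path in self._dir.glob("*.pkl"):
            with path.open("rb") as f:
                entry = pickle.load(f)
            if self._expired(entry):
                path.unlink()
                removed += 1
        return removed

    # Dictionary-like operations delegate to the methods above.
    def __getitem__(self, key: str) -> Any:
        return self.get(key)

    def __setitem__(self, key: str, value: Any) -> None:
        self.set(key, value)

    def __delitem__(self, key: str) -> None:
        self.delete(key)

    def __contains__(self, key: str) -> bool:
        # Note: a stored None value would be reported as absent.
        return self.get(key) is not None
```

Because pickle can execute code while loading, a cache directory like this should only ever contain files the program wrote itself.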

Usage

Sample Code:
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache.set(url, df)
else:
    print("cache hit")

print(df)
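The get-then-set pattern above (often called cache-aside) can be factored into a small helper. get_or_load is a hypothetical name, not part of df-diskcache; it assumes only the two call forms shown in the sample code, cache.get(key) returning None on a miss and cache.set(key, value) with an optional ttl keyword.

```python
from typing import Any, Callable, Optional


def get_or_load(cache: Any, key: str, loader: Callable[[], Any],
                ttl: Optional[int] = None) -> Any:
    """Return the cached value for key, calling loader() and caching its result on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader()
        # Only pass ttl when given, matching the documented call forms.
        if ttl is None:
            cache.set(key, value)
        else:
            cache.set(key, value, ttl=ttl)
    return value
```

With the sample above, the lookup would become df = get_or_load(cache, url, lambda: pd.read_csv(url)); the lambda defers the download so it only runs on a cache miss.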

You can also use dictionary-style operations:

Sample Code:
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache[url]
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache[url] = df
else:
    print("cache hit")

print(df)

Set TTL for cache entries

Sample Code:
import pandas as pd
from dfdiskcache import DataFrameDiskCache

DataFrameDiskCache.DEFAULT_TTL = 10  # override the default TTL in seconds (default: 3600)

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    df = pd.read_csv(url)
    cache.set(url, df, ttl=60)  # set a TTL (in seconds) for this key-value pair

print(df)
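In the example above, the entry is stored with ttl=60 even though DEFAULT_TTL was lowered to 10. The sketch below works through that arithmetic under two assumptions not stated in this README: an explicit per-entry ttl takes precedence over DEFAULT_TTL, and expiry is measured from the entry's last access time (which, per the feature list, touch updates). The function is_expired is hypothetical and not part of df-diskcache.

```python
from typing import Optional

DEFAULT_TTL = 10  # mirrors the override in the example above (seconds)


def is_expired(last_accessed: float, now: float, ttl: Optional[float] = None,
               default_ttl: float = DEFAULT_TTL) -> bool:
    """Return True if an entry last accessed at last_accessed is stale at now."""
    effective_ttl = default_ttl if ttl is None else ttl  # per-entry ttl wins
    return now - last_accessed > effective_ttl


# 30 seconds after the last access:
print(is_expired(0, 30, ttl=60))  # False: still within the 60 s per-entry TTL
print(is_expired(0, 30))          # True: past the 10 s default
```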

Dependencies

Download files

Source Distribution

df-diskcache-0.0.2.tar.gz (8.3 kB)

Built Distribution

df_diskcache-0.0.2-py3-none-any.whl (6.5 kB, Python 3)