Memory-frugal torch dataset from a CSV collection
csvsdataset
csvsdataset is a Python library designed to simplify the process of working with multiple CSV files as a single dataset. The primary functionality is provided by the CsvsDataset class in the csvsdataset.py module.
Installation
To install the csvsdataset library, run:
pip install csvsdataset
Usage
from csvsdataset.csvsdataset import CsvsDataset

# Initialize the CsvsDataset instance
dataset = CsvsDataset(folder_path="path/to/your/csv/folder",
                      file_pattern="*.csv",
                      x_columns=["column1", "column2"],
                      y_column="target_column")

# Iterate over the dataset
for x_data, y_data in dataset:
    # Your processing code here
    pass

# Access a specific item in the dataset
x_data, y_data = dataset[42]
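Indexing with a single integer across many files implies some mapping from a dataset-wide index to a particular file and row. The sketch below shows one common way such a mapping could be computed, using cumulative row counts and a binary search. It is an illustration only, not the library's actual implementation; the function name locate and the rows_per_file list are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(global_index, rows_per_file):
    """Map a dataset-wide index to (file_number, row_within_file).

    rows_per_file is a hypothetical list of row counts, one per CSV file.
    """
    total = sum(rows_per_file)
    if not 0 <= global_index < total:
        raise IndexError(f"index {global_index} out of range for {total} rows")
    # Running totals: offsets[i] is the number of rows in files 0..i.
    offsets = list(accumulate(rows_per_file))
    # The first file whose running total exceeds the index holds the row.
    file_number = bisect_right(offsets, global_index)
    rows_before = offsets[file_number - 1] if file_number > 0 else 0
    return file_number, global_index - rows_before


# Three hypothetical files with 30, 20, and 50 rows.
print(locate(42, [30, 20, 50]))
```

With this scheme only the row counts need to stay in memory permanently, so looking up an item does not require every file to be loaded.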
Memory frugality
Only the data from a small number of CSV files is kept in memory at any time; the rest is evicted on a least-recently-used (LRU) basis. This class is intended for cases where there are too many data files to load into memory at once.
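To make the LRU idea concrete, here is a minimal sketch of a file cache that keeps the parsed rows of at most a fixed number of CSV files and evicts the least recently used one when a new file is loaded. It uses only the standard library. The class name LruFileCache and the max_files parameter are assumptions made for this example and are not part of the csvsdataset API.

```python
import csv
from collections import OrderedDict


class LruFileCache:
    """Keep the parsed rows of at most max_files CSV files in memory."""

    def __init__(self, max_files=2):
        self.max_files = max_files
        # Maps path -> list of row dicts, ordered from least to most recently used.
        self._rows = OrderedDict()

    def get(self, path):
        """Return the rows of the file at path, reading it only on a cache miss."""
        if path in self._rows:
            # Cache hit: mark this file as the most recently used.
            self._rows.move_to_end(path)
            return self._rows[path]
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        self._rows[path] = rows
        if len(self._rows) > self.max_files:
            # Over capacity: drop the least recently used file.
            self._rows.popitem(last=False)
        return rows

    def cached_paths(self):
        """Return the cached paths, least recently used first."""
        return list(self._rows)
```

A larger max_files trades memory for fewer re-reads; access patterns that stay within a few files at a time benefit most from this kind of cache.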