!Alpha Version! - This repository contains code to make datasets stored on the corpora network drive of the chair compatible with the [TensorFlow Dataset API](https://www.tensorflow.org/api_docs/python/tf/data/Dataset).
Description
This repository contains code to make datasets stored on the corpora network drive of the chair compatible with the TensorFlow Dataset API.
Currently available Datasets
Dataset | Status | Url |
---|---|---|
audioset | ❌ | https://research.google.com/audioset/ |
ckplus | ✅ | http://www.iainm.com/publications/Lucey2010-The-Extended/paper.pdf |
faces | ✅ | https://faces.mpdl.mpg.de/imeji/ |
is2021_ess | ❌ | - |
librispeech | ❌ | https://www.openslr.org/12 |
nova_dynamic | ✅ | https://github.com/hcmlab/nova |
Example Usage
import os
import tensorflow as tf
import tensorflow_datasets as tfds
import hcai_datasets
from matplotlib import pyplot as plt

# Preprocessing function
def preprocess(x, y):
    img = x.numpy()
    return img, y

# Creating a dataset
ds, ds_info = tfds.load(
    'hcai_example_dataset',
    split='train',
    with_info=True,
    as_supervised=True,
    builder_kwargs={'dataset_dir': os.path.join('path', 'to', 'directory')}
)

# Input output mapping
ds = ds.map(lambda x, y: (tf.py_function(func=preprocess, inp=[x, y], Tout=[tf.float32, tf.int64])))

# Manually iterate over dataset
img, label = next(ds.as_numpy_iterator())

# Visualize
plt.imshow(img / 255.)
plt.show()
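
The input/output mapping step above can be tried without access to the corpora drive. The following is a minimal sketch, not part of `hcai_datasets`: it swaps the `tfds.load` call for synthetic data built with `tf.data.Dataset.from_tensor_slices`, and the wrapper `tf_preprocess`, the image size, and the labels are all assumptions made for illustration. It also shows one common way to restore the static shapes that `tf.py_function` discards.

```python
import numpy as np
import tensorflow as tf


def preprocess(x, y):
    # Runs eagerly inside tf.py_function, so .numpy() is available here.
    img = x.numpy().astype(np.float32) / 255.0
    return img, y


def tf_preprocess(x, y):
    # Hypothetical wrapper: calls the Python function from the tf.data graph.
    img, label = tf.py_function(
        func=preprocess, inp=[x, y], Tout=[tf.float32, tf.int64]
    )
    # tf.py_function returns tensors of unknown shape; restore the known ones.
    img.set_shape(x.shape)
    label.set_shape(y.shape)
    return img, label


# Synthetic stand-in for a real dataset: four 8x8 RGB images with integer labels.
images = np.random.randint(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
labels = np.arange(4, dtype=np.int64)

ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(tf_preprocess)

# Manually iterate over the dataset, as in the example above.
img, label = next(ds.as_numpy_iterator())
print(img.shape, img.dtype, label)
```

Because the images are already scaled to the range 0 to 1 inside `preprocess` in this sketch, they could be passed to `plt.imshow` without the extra division by 255.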
Example Usage Nova Dynamic Data
import os
import hcai_datasets
import tensorflow_datasets as tfds

## Load Data
ds, ds_info = tfds.load(
    'hcai_nova_dynamic',
    split='dynamic_split',
    with_info=True,
    as_supervised=True,
    builder_kwargs={
        # Database Config
        'db_config_path': 'db.cfg',
        'db_config_dict': None,

        # Dataset Config
        'dataset': '<dataset_name>',
        # Note: on Windows, os.path.join('C:', 'Nova', 'Data') would yield the
        # drive-relative path 'C:Nova\Data', so the drive root is given as 'C:\\'.
        'nova_data_dir': os.path.join('C:\\', 'Nova', 'Data'),
        'sessions': ['<session_name>'],
        'roles': ['<role_one>', '<role_two>'],
        'schemes': ['<label_scheme_one>'],
        'annotator': '<annotator_id>',
        'data_streams': ['<stream_name>'],

        # Sample Config
        'frame_step': 1,
        'left_context': 0,
        'right_context': 0,
        'start': None,
        'end': None,
        #'flatten_samples': False,
        'supervised_keys': ['<role_one>.<stream_name>', '<scheme_two>'],

        # Additional Config
        'clear_cache': True
    }
)

data_it = ds.as_numpy_iterator()
ex_data = next(data_it)
print(ex_data)
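
The `builder_kwargs` in the example above are a template: every value written as `<...>` has to be replaced with a real name before loading. As a small convenience, a check along the following lines could catch placeholders that were left in by mistake. This is only a sketch; `find_placeholders` is a hypothetical helper and not part of `hcai_datasets`, and the filled-in values are made up for illustration.

```python
import re

# Matches any remaining template marker such as '<dataset_name>'.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_placeholders(value, path="builder_kwargs"):
    """Return (path, value) pairs for every string that still contains '<...>'."""
    found = []
    if isinstance(value, str):
        if PLACEHOLDER.search(value):
            found.append((path, value))
    elif isinstance(value, dict):
        for key, item in value.items():
            found.extend(find_placeholders(item, f"{path}[{key!r}]"))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            found.extend(find_placeholders(item, f"{path}[{index}]"))
    # Numbers, booleans and None cannot hold a placeholder and are skipped.
    return found


# Hypothetical, partially filled-in configuration.
builder_kwargs = {
    'dataset': 'my_dataset',
    'sessions': ['<session_name>'],   # forgotten placeholder
    'roles': ['caller', 'listener'],
    'frame_step': 1,
    'start': None,
}

leftover = find_placeholders(builder_kwargs)
for path, value in leftover:
    print(f"{path} still holds the placeholder {value}")
```

Running the check before `tfds.load` keeps a forgotten placeholder from surfacing later as a harder-to-read lookup error.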