Keras (TensorFlow) implementations of Automatic Speech Recognition

Project description

DeepAsr

DeepAsr is an open-source implementation of an end-to-end Automatic Speech Recognition (ASR) engine.

DeepAsr aims to provide multiple speech recognition architectures. Currently it provides Baidu's Deep Speech 2, implemented with Keras (TensorFlow).

Using DeepAsr you can:

perform speech-to-text using pre-trained models
tune pre-trained models to your needs
create new models on your own

DeepAsr key features:

Multi-GPU support: Training can be distributed using the TensorFlow Strategy API, and you can experiment with a mixed precision policy.
CuDNN support: The model uses the CuDNNLSTM implementation by NVIDIA Developers. CPU devices are also supported.
DataGenerator: Feature extraction (on CPU) can run in parallel with model training (on GPU).
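The DataGenerator feature above overlaps feature extraction with training. How DeepAsr does this internally is not shown here. A common pattern is a bounded queue filled by a background thread, sketched below using only the standard library. The names `prefetch` and `extract_features` are hypothetical and are not part of the DeepAsr API.

```python
import queue
import threading


def prefetch(batches, max_prefetch=2):
    """Yield items from `batches` while a background thread produces them."""
    sentinel = object()                      # marks the end of the stream
    buffer = queue.Queue(maxsize=max_prefetch)

    def producer():
        for batch in batches:
            buffer.put(batch)                # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()                  # blocks until a batch is ready
        if item is sentinel:
            return
        yield item


def extract_features(paths):
    """Hypothetical stand-in for CPU-side feature extraction."""
    for path in paths:
        yield "features(" + path + ")"


batches = list(prefetch(extract_features(["a.wav", "b.wav", "c.wav"])))
print(batches)
```

The bounded queue keeps at most `max_prefetch` batches in memory, so the producer cannot run arbitrarily far ahead of the consumer. Real overlap in Python also depends on the extraction code releasing the interpreter lock, which this sketch does not address.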

import numpy as np
import pandas as pd
import tensorflow as tf
import deepasr as asr

def get_config(features, multi_gpu):
    alphabet_en = asr.vocab.Alphabet(lang='en')
    if features == 'fbank':
        features_extractor = asr.features.FilterBanks(features_num=161,
                                                      winlen=0.02,
                                                      winstep=0.01,
                                                      winfunc=np.hanning)
    else:
        features_extractor = asr.features.Spectrogram(
            features_num=161,
            samplerate=16000,
            winlen=0.02,
            winstep=0.01,
            winfunc=np.hanning
        )
    model = asr.model.get_deepspeech2_v1(
        input_dim=161,
        output_dim=29,
        is_mixed_precision=True
        )
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=1e-4,  # `lr` is a deprecated alias of this argument
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-8
        )
    decoder = asr.decoder.GreedyDecoder()

    pipeline = asr.pipeline.ctc_pipeline.CTCPipeline(
        alphabet=alphabet_en, features_extractor=features_extractor, model=model, optimizer=optimizer, decoder=decoder,
        sample_rate=16000, mono=True, multi_gpu=multi_gpu
    )
    return pipeline

def run(train_data, test_data, features='fbank', batch_size=32, epochs=10, multi_gpu=True):
    pipeline = get_config(features, multi_gpu)
    history = pipeline.fit_generator(train_data, batch_size=batch_size, epochs=epochs)
    pipeline.save('./checkpoints')
    print("Truth:", test_data['transcripts'].to_list()[0])
    print("Prediction:", pipeline.predict(test_data['path'].to_list()[0]))
    return history

train = pd.read_csv('train_data.csv')
test = pd.read_csv('test_data.csv')
run(train, test, features='fbank', batch_size=32, epochs=100, multi_gpu=True)
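The window settings in the example above can be related to the value 161 with a little arithmetic. The sketch below assumes that, for the Spectrogram extractor, `features_num=161` corresponds to the one-sided FFT bins of a 20 ms window at 16 kHz, and that frames are counted without padding. Both are assumptions about the library, not statements from its documentation. For FilterBanks, `features_num` would more likely be a number of filters, which this arithmetic does not cover.

```python
sample_rate = 16000   # Hz, as in the example above
winlen = 0.02         # window length in seconds
winstep = 0.01        # step between windows in seconds

window_samples = round(winlen * sample_rate)   # samples per analysis window
hop_samples = round(winstep * sample_rate)     # samples between window starts
freq_bins = window_samples // 2 + 1            # one-sided FFT bins of a real signal


def num_frames(duration_s):
    """Number of full windows that fit in a clip, assuming no padding."""
    total = round(duration_s * sample_rate)
    if total < window_samples:
        return 0
    return 1 + (total - window_samples) // hop_samples


print(window_samples, hop_samples, freq_bins, num_frames(5.0))
# prints: 320 160 161 499
```

Under these assumptions, a 5 second clip becomes roughly a 499 x 161 feature matrix before it reaches the model.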

Installation

You can use pip:

pip install deepasr

Getting started

Speech recognition is a tough task. You don't need to know all the details to use one of the pre-trained models. However, it is worth understanding the crucial conceptual components:

Input: Mono, 16-bit, 16 kHz WAVE files (up to 5 seconds)
FeaturesExtractor: Converts audio files into MFCC features or a spectrogram
Model: CTC model defined in Keras (references: [1], [2])
Decoder: A greedy algorithm, with language model support, that decodes a sequence of probabilities using the Alphabet
DataGenerator: Streams data to the model via a generator
Callbacks: A set of functions that monitor the training
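To make the Decoder step concrete, here is a minimal sketch of greedy CTC decoding in plain Python: take the most probable symbol in each frame, collapse consecutive repeats, then drop the blank symbol. The symbol table (26 letters, space, apostrophe, with the blank as the last index, 29 outputs in total) is an assumption chosen to match `output_dim=29`. The actual ordering in `asr.vocab.Alphabet` may differ, and this sketch does not include language model support.

```python
import string

# Hypothetical symbol table: indices 0-25 are letters, 26 is space,
# 27 is apostrophe, and 28 is the CTC blank.
SYMBOLS = list(string.ascii_lowercase) + [" ", "'"]
BLANK = len(SYMBOLS)


def greedy_ctc_decode(probs):
    """Decode per-frame probability rows into text with the greedy CTC rule."""
    # Most probable symbol index for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    chars = []
    previous = None
    for index in best:
        # Keep a symbol only when it starts a new run and is not the blank.
        if index != previous and index != BLANK:
            chars.append(SYMBOLS[index])
        previous = index
    return "".join(chars)


def one_hot(index, size=len(SYMBOLS) + 1):
    """Build a probability row that puts all mass on one symbol."""
    frame = [0.0] * size
    frame[index] = 1.0
    return frame


# Frames: h, h, blank, i, blank, i
path = [7, 7, BLANK, 8, BLANK, 8]
decoded = greedy_ctc_decode([one_hot(i) for i in path])
print(decoded)
# prints: hii
```

The blank between the two `i` frames is what lets the decoder emit a doubled letter. Without it, the repeated frames would collapse into a single `i`.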

A loaded pre-trained model has all of these components. Prediction can be invoked just by calling pipeline.predict().

import pandas as pd
import deepasr as asr
pipeline = asr.pipeline.get_pipeline.load('./checkpoints')
test_data = pd.read_csv('test_data.csv')
print("Truth:", test_data['transcripts'].to_list()[0])
print("Prediction:", pipeline.predict(test_data['path'].to_list()[0]))
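The snippets above read `test_data.csv` and use two columns, `path` and `transcripts`. As a sketch of that layout, the code below writes and re-reads such a file with the standard library. The audio paths and transcripts are made-up placeholders, and whether training needs any further columns is not stated here.

```python
import csv
import os
import tempfile

# Placeholder rows: one audio file path and its reference transcript per row.
rows = [
    {"path": "audio/sample_0001.wav", "transcripts": "hello world"},
    {"path": "audio/sample_0002.wav", "transcripts": "speech recognition"},
]

# Write into a temporary directory so the sketch leaves the working tree alone.
csv_path = os.path.join(tempfile.mkdtemp(), "test_data.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["path", "transcripts"])
    writer.writeheader()
    writer.writerows(rows)

# Read it back to confirm that both columns survive the round trip.
with open(csv_path, newline="") as handle:
    loaded = list(csv.DictReader(handle))

print(loaded[0]["path"], "->", loaded[0]["transcripts"])
```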

References

The fundamental repositories:

Baidu - DeepSpeech2 - A PaddlePaddle implementation of DeepSpeech2 architecture for ASR
NVIDIA - Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
TensorFlow - The implementation of DeepSpeech2 model
Mozilla - DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
Espnet - End-to-End Speech Processing Toolkit
Automatic Speech Recognition - Distill the Automatic Speech Recognition research

Release history

0.1.2 (Jan 13, 2022)
0.1.1 (Apr 30, 2020)
0.1.0 (Apr 30, 2020)
0.0.9 (Apr 22, 2020)
0.0.8 (Apr 22, 2020)
0.0.7 (Apr 20, 2020)
0.0.6 (Apr 20, 2020)
0.0.5 (Apr 20, 2020)
0.0.4 (Apr 20, 2020)
0.0.3 (Apr 20, 2020)
0.0.2 (Apr 15, 2020), this version
0.0.1 (Apr 15, 2020)

Download files

Source Distribution

deepasr-0.0.2.tar.gz (33.6 kB)

Uploaded Apr 15, 2020 Source

Hashes for deepasr-0.0.2.tar.gz

Algorithm	Hash digest
SHA256	`cfa8e9afd5e558010ae72bfc4aee451bb1c51561ac9220555c5d00db3781235f`
MD5	`64eec764f0fed48bbb621ea18022fc1f`
BLAKE2b-256	`51a1f882759774034c33cad667271c967e07754d6396433ad4e8d6a0289ad080`