Kaldi Active Grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

Donate [GitHub is matching (only) my GitHub Sponsors donations.]

Python package developed to enable context-based command & control of computer applications, as in the Dragonfly speech recognition framework, using the Kaldi automatic speech recognition engine.

Normally, Kaldi decoding graphs are monolithic, require expensive up-front off-line compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals to be compiled separately and stitched together dynamically at decode-time, but all of the grammars are always active and capable of being recognized.

This project extends that to allow each grammar/rule to be independently marked as active/inactive dynamically on a per-utterance basis (set at the beginning of each utterance). Dragonfly is then capable of activating only the appropriate grammars for the current environment, resulting in increased accuracy due to fewer possible recognitions. Furthermore, the dictation grammar can be shared between all the command grammars, which can be compiled quickly without needing to include large-vocabulary dictation directly.

Features

  • Binaries: The Python package includes all necessary binaries for decoding on Windows/Linux/MacOS. Available on PyPI.
    • Binaries are generated from my fork of Kaldi, which is only intended to be used by kaldi-active-grammar directly, and not as a stand-alone library.
  • Pre-trained model: A compatible general English Kaldi nnet3 chain model is trained on ~3000 hours of open audio. Available under project releases.
  • Plain dictation: Do you just want to recognize plain dictation? Seems kind of boring, but okay! There is an interface for plain dictation (see below), using either your specified HCLG.fst file, or KaldiAG's included pre-trained dictation model.
  • Dragonfly/Caster: A compatible backend for Dragonfly was developed in the kaldi branch of my fork, and has been merged as of Dragonfly v0.15.0.
    • See its documentation, try out a demo, or use the loader to run all normal dragonfly scripts.
    • You can try it out easily on Windows using a simple no-install package: see Getting Started below.
    • Caster is supported as of KaldiAG v0.6.0 and Dragonfly v0.16.1.
    • Support for KaldiAG v1.2.0 has been merged as of Dragonfly v0.20.0! Improvements include Improved Recognition, Weights on Any Elements, Pluggable Alternative Dictation, Stand-alone Plain Dictation Interface, and various bug fixes & optimizations. For details and previous versions' improvements, see project releases.

Demo Video


Donations are appreciated to encourage development.


Related Repositories

Getting Started

Want to get started quickly & easily on Windows? Available under project releases:

  • kaldi-dragonfly-winpython: A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
  • kaldi-dragonfly-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
  • kaldi-caster-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2 + caster. Just unzip and run!

Otherwise...

Setup

Requirements:

  • Python 2.7 or 3.6+; 64-bit required!
  • OS: Windows/Linux/MacOS
  • Only supports Kaldi left-biphone models, specifically nnet3 chain models, with specific modifications
  • ~1GB+ disk space for model plus temporary storage and cache, depending on your grammar complexity
  • ~1GB+ RAM for model and grammars, depending on your model and grammar complexity
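The interpreter requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the function names are made up for illustration and are not part of kaldi-active-grammar.

```python
import struct
import sys


def python_bits():
    """Return the pointer width of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def meets_requirements():
    """True if this interpreter is 64-bit and is Python 2.7 or 3.6+."""
    version_ok = sys.version_info[:2] == (2, 7) or sys.version_info >= (3, 6)
    return python_bits() == 64 and version_ok


if __name__ == "__main__":
    print("%d-bit Python %s" % (python_bits(), sys.version.split()[0]))
    print("OK" if meets_requirements() else "Unsupported interpreter")
```

If this reports a 32-bit interpreter, pip will not find a matching binary wheel.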

Install Python package, which includes necessary Kaldi binaries:

pip install kaldi-active-grammar

Download the compatible generic English Kaldi nnet3 chain model from project releases. Unzip the model and pass the directory path to the kaldi-active-grammar constructor.

Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.
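Whichever model you use, it can help to confirm the unzipped directory exists before handing it to the recognizer. A minimal sketch follows, assuming the model_dir keyword mentioned in the plain dictation example; checked_model_dir and make_recognizer are hypothetical helper names, not part of the package.

```python
import os


def checked_model_dir(path):
    """Return the absolute model directory path, or raise if it is missing."""
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full_path):
        raise IOError("Kaldi model directory not found: %s" % full_path)
    return full_path


def make_recognizer(path):
    """Build a plain dictation recognizer from an unzipped model directory."""
    # Imported lazily so the path check above works without the package installed.
    from kaldi_active_grammar import PlainDictationRecognizer
    return PlainDictationRecognizer(model_dir=checked_model_dir(path))
```

Failing early on a bad path tends to give a clearer error than one raised from deep inside model loading.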

Troubleshooting

  • Errors installing
    • Make sure you're using a 64-bit Python.
    • You must install via pip install kaldi-active-grammar (directly or indirectly), not python setup.py install, in order to get the required binaries.
    • Update your pip (to at least 19.0+) by executing python -m pip install --upgrade pip, to support the required python binary wheel package.
  • Try deleting the Kaldi model *.tmp directory and rerunning.
  • For reporting issues, try running with import logging; logging.basicConfig(level=1) at the top of your main file to enable full debugging logging.
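The logging tip above can be wrapped in a small helper. This is a sketch, not project code: enable_full_logging is a hypothetical name. One detail worth knowing is that logging.basicConfig does nothing if the root logger already has handlers, so the sketch also sets the root level explicitly.

```python
import logging


def enable_full_logging():
    """Turn on the most verbose logging for every logger in the process."""
    # Installs a default stderr handler if none is configured yet.
    logging.basicConfig(level=1)
    # basicConfig is a no-op when handlers already exist, so force the level too.
    logging.getLogger().setLevel(1)
    return logging.getLogger()


if __name__ == "__main__":
    enable_full_logging()
    logging.getLogger("example").debug("debug logging is enabled")
```

Call it at the very top of your main file, before importing or constructing anything else, so start-up messages are captured as well.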

Documentation

Documentation is sorely lacking currently. To see example usage, examine the backend for Dragonfly.

The KaldiAG API is pretty low level, but basically: you define a set of rules, and send in audio data, along with a bit mask of which rules are active at the beginning of each utterance, and receive back the recognized rule and text. The easy way is to go through Dragonfly, which makes it easy to define the rules, contexts, and actions.
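To make the "bit mask of which rules are active" idea concrete, here is a small self-contained sketch. It does not use the KaldiAG API at all: the rule names, the contexts table, and build_active_mask are all hypothetical, and the real library may represent the mask differently.

```python
def build_active_mask(rules, is_active):
    """Return an int whose bit i is set when rules[i] is currently active."""
    mask = 0
    for index, rule in enumerate(rules):
        if is_active(rule):
            mask |= 1 << index
    return mask


# Hypothetical rules, each tied to the application it applies to (None = everywhere).
rules = ["editor_commands", "browser_commands", "global_commands"]
contexts = {
    "editor_commands": "editor",
    "browser_commands": "browser",
    "global_commands": None,
}
foreground = "editor"  # would be re-evaluated at the start of each utterance


def active_in_foreground(rule):
    """A rule is active if it is global or matches the foreground application."""
    required = contexts[rule]
    return required is None or required == foreground


mask = build_active_mask(rules, active_in_foreground)
print(format(mask, "03b"))  # prints 101: rules 0 and 2 active, rule 1 inactive
```

In practice Dragonfly does this bookkeeping for you by evaluating each grammar's context when an utterance begins.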

Plain dictation interface

import sys, wave
from kaldi_active_grammar import PlainDictationRecognizer

# Uses the default model; or supply a non-default model_dir, tmp_dir, or fst_file
recognizer = PlainDictationRecognizer()

# Read all audio frames from the WAV file named on the command line (default: test.wav)
filename = sys.argv[1] if len(sys.argv) > 1 else 'test.wav'
wave_file = wave.open(filename, 'rb')
data = wave_file.readframes(wave_file.getnframes())
wave_file.close()

# Decode the raw audio as a single utterance
output_str, likelihood = recognizer.decode_utterance(data)
print(repr(output_str), likelihood)  # -> 'it depends on the context' 2.1386399269104004
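The example above passes raw frames straight to the decoder, so a WAV file in an unexpected format is likely to give poor results rather than a clear error. The sketch below checks the header first. The expected format (16 kHz, 16-bit, mono) is an assumption about the model, not something stated here, and read_wav_checked is a hypothetical helper; adjust EXPECTED to match your model.

```python
import wave
from contextlib import closing

# Assumed input format; change this to whatever your model actually expects.
EXPECTED = {"channels": 1, "sample_width": 2, "frame_rate": 16000}


def read_wav_checked(filename, expected=EXPECTED):
    """Return all raw frames from a WAV file, or raise if its format differs."""
    # closing() keeps this working on Python 2.7, where wave has no 'with' support.
    with closing(wave.open(filename, "rb")) as wave_file:
        actual = {
            "channels": wave_file.getnchannels(),
            "sample_width": wave_file.getsampwidth(),
            "frame_rate": wave_file.getframerate(),
        }
        if actual != expected:
            raise ValueError(
                "unexpected WAV format: %r (wanted %r)" % (actual, expected))
        return wave_file.readframes(wave_file.getnframes())
```

The returned bytes can then be passed to recognizer.decode_utterance(data) exactly as in the example above.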

Building

  • Linux/MacOS: python3 setup.py bdist_wheel (see CMakeLists.txt for details)
  • Windows: currently complicated (see my fork of Kaldi, then similar to Linux/MacOS)

Contributing

Issues, suggestions, and feature requests are welcome & encouraged. Pull requests are considered, but project structure is in flux.


Author

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE.txt file for details. If this license is problematic for you, please contact me.

Acknowledgments

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • kaldi_active_grammar-1.5.0-py2.py3-none-win_amd64.whl (23.8 MB): Python 2, Python 3, Windows x86-64
  • kaldi_active_grammar-1.5.0-py2.py3-none-manylinux2010_x86_64.whl (22.0 MB): Python 2, Python 3, manylinux: glibc 2.12+ x86-64
  • kaldi_active_grammar-1.5.0-py2.py3-none-macosx_10_14_x86_64.whl (10.8 MB): Python 2, Python 3, macOS 10.14+ x86-64

File details

Details for the file kaldi_active_grammar-1.5.0-py2.py3-none-win_amd64.whl.

File metadata

  • Download URL: kaldi_active_grammar-1.5.0-py2.py3-none-win_amd64.whl
  • Upload date:
  • Size: 23.8 MB
  • Tags: Python 2, Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.3

File hashes

Hashes for kaldi_active_grammar-1.5.0-py2.py3-none-win_amd64.whl

Algorithm    Hash digest
SHA256       4545b5dd0b92a957fff0c0b094212ad38f48fb639fb1c63a32d73689d5a912aa
MD5          af4d68708fd7b0e257c28268bfd831ad
BLAKE2b-256  9ff1221c622392f01963fe791011082b07b6b5f25dcf5e69808ffd5a38ce2fdf

File details

Details for the file kaldi_active_grammar-1.5.0-py2.py3-none-manylinux2010_x86_64.whl.

File metadata

  • Download URL: kaldi_active_grammar-1.5.0-py2.py3-none-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 22.0 MB
  • Tags: Python 2, Python 3, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.5.3

File hashes

Hashes for kaldi_active_grammar-1.5.0-py2.py3-none-manylinux2010_x86_64.whl

Algorithm    Hash digest
SHA256       767279f4c9de6b0de58d95ccad57eb0abf9f330d2d6db1e4856eeb0079eed5f1
MD5          3a085158e6b5bce052e177ef0b532268
BLAKE2b-256  88de5533f5bf1431658b793303eb44e845e3ba7e7a8127e0bc6f034d0dbc20d6

File details

Details for the file kaldi_active_grammar-1.5.0-py2.py3-none-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: kaldi_active_grammar-1.5.0-py2.py3-none-macosx_10_14_x86_64.whl
  • Upload date:
  • Size: 10.8 MB
  • Tags: Python 2, Python 3, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.6

File hashes

Hashes for kaldi_active_grammar-1.5.0-py2.py3-none-macosx_10_14_x86_64.whl

Algorithm    Hash digest
SHA256       914f42993a6465b2d3d5d975a7fe63df3652034b1a98b3042cac92f2ff0b30db
MD5          d3f115610f67456eabe94459e9262cb8
BLAKE2b-256  86812ee7681458a7ed3be757c90d400dc690879cf490bd564a3ebbac8bac3b65
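The SHA256 digests listed above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the standard library; sha256_of_file and verify_wheel are illustrative names.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """True if the file's SHA256 digest matches the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, verify_wheel("kaldi_active_grammar-1.5.0-py2.py3-none-win_amd64.whl", "4545b5dd0b92a957fff0c0b094212ad38f48fb639fb1c63a32d73689d5a912aa") should return True for an intact copy of the Windows wheel.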
