
Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time


Kaldi Active Grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time


Python package developed to enable context-based command & control of computer applications, as in the Dragonfly speech recognition framework, using the Kaldi automatic speech recognition engine.

UNDER ACTIVE DEVELOPMENT

Normally, Kaldi decoding graphs are monolithic, require expensive up-front offline compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals to be compiled separately and stitched together dynamically at decode-time, but all the grammars are always active and capable of being recognized.

This project extends that framework so that each grammar/rule can be independently marked as active or inactive on a per-utterance basis (set at the beginning of each utterance). Dragonfly can then activate only the grammars appropriate to the current environment, which increases accuracy because fewer recognitions are possible. Furthermore, the dictation grammar can be shared among all the command grammars, so the command grammars can be compiled quickly without needing to include large-vocabulary dictation directly.
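The per-utterance activation and shared dictation described above can be illustrated with a toy, text-only model. This is a Python 3 sketch under stated assumptions, not kaldi-active-grammar's API and not Kaldi's FST-based decoder: the names Rule, ToyRecognizer, set_active, decode and the <dictation> placeholder are all hypothetical, and "decoding" is reduced to matching a word string against word patterns. It shows two ideas only: rules that are inactive when an utterance begins are never candidates, and command rules can share one dictation slot instead of each embedding a large vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical placeholder standing in for the shared dictation nonterminal.
DICTATION = "<dictation>"


@dataclass
class Rule:
    name: str
    pattern: List[str]  # literal words, plus at most one DICTATION slot
    active: bool = True


def match(pattern: List[str], words: List[str]) -> Optional[List[str]]:
    """Return the words captured by the dictation slot ([] if there is none),
    or None if `words` does not fit `pattern`."""
    if DICTATION not in pattern:
        return [] if pattern == words else None
    i = pattern.index(DICTATION)
    prefix, suffix = pattern[:i], pattern[i + 1:]
    # The dictation slot must capture at least one word.
    if len(words) < len(prefix) + len(suffix) + 1:
        return None
    if words[:len(prefix)] != prefix:
        return None
    if words[len(words) - len(suffix):] != suffix:
        return None
    return words[len(prefix):len(words) - len(suffix)]


class ToyRecognizer:
    def __init__(self, rules: List[Rule]):
        self.rules: Dict[str, Rule] = {r.name: r for r in rules}

    def set_active(self, name: str, active: bool) -> None:
        # decode() reads the flags once per call, so a change applies
        # from the next utterance on.
        self.rules[name].active = active

    def decode(self, utterance: str) -> Optional[Tuple[str, List[str]]]:
        # Snapshot the active set at the beginning of the utterance;
        # inactive rules are never candidates, so fewer results are possible.
        candidates = [r for r in self.rules.values() if r.active]
        words = utterance.split()
        for rule in candidates:
            captured = match(rule.pattern, words)
            if captured is not None:
                return rule.name, captured
        return None
```

For example, with a close_tab rule created with active=False, decode("close tab") returns None until set_active("close_tab", True) is called, and a rule with pattern ["say", DICTATION] returns ("say", ["hello", "world"]) for "say hello world".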

  • The Python package includes all necessary binaries for decoding on Linux or Windows. Available on PyPI.
  • A compatible general English Kaldi nnet3 chain model is trained on ~1200 hours of open audio. Available under project releases.
  • A compatible backend for Dragonfly is under development in the kaldi branch of my fork.
    • You can try it out easily on Windows using a simple no-install package: see Getting Started below.
    • A beta version has been merged as of Dragonfly v0.15.0!
    • Support for KaldiAG v0.5.0 has been merged as of Dragonfly v0.16.0!
      • User Lexicon: you can add new words/pronunciations to the model's lexicon to be recognized & used in grammars, and the pronunciations can be either specified explicitly or inferred automatically.
      • Compilation Optimizations: compilation while loading grammars uses the disk much less, and far fewer passes are made over the graphs, as separate modules have been customized & combined.
      • Better Model: 50% more training data.
    • Support for KaldiAG v0.6.0 has been merged as of Dragonfly v0.16.1!
      • Caster: many big fixes and optimizations to get Caster running.
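The User Lexicon feature above distinguishes explicitly specified pronunciations from automatically inferred ones. The Python 3 sketch below models only that choice on a plain dictionary. The functions add_word and to_lexicon_lines, the infer callback, and the ARPAbet-style phones are hypothetical; KaldiAG's actual interface and its inference method are not shown here. The output lines follow the common Kaldi lexicon.txt layout (a word followed by its phones), but a real model's phone set may differ.

```python
from typing import Callable, Dict, Iterable, List, Optional

# word -> list of alternative pronunciations (each a list of phone symbols)
Lexicon = Dict[str, List[List[str]]]


def add_word(lexicon: Lexicon, word: str,
             phones: Optional[Iterable[str]] = None,
             infer: Optional[Callable[[str], Iterable[str]]] = None) -> Lexicon:
    """Add a pronunciation for `word`, either given explicitly or inferred."""
    if phones is None:
        if infer is None:
            raise ValueError("no pronunciation given for %r and no inference function" % word)
        phones = infer(word)  # hypothetical grapheme-to-phoneme hook
    pron = list(phones)
    prons = lexicon.setdefault(word, [])
    if pron not in prons:  # skip exact duplicates
        prons.append(pron)
    return lexicon


def to_lexicon_lines(lexicon: Lexicon) -> List[str]:
    """Render entries as 'word phone phone ...' lines, sorted by word."""
    return [word + " " + " ".join(pron)
            for word in sorted(lexicon)
            for pron in lexicon[word]]
```

Keeping a list of pronunciations per word allows alternative pronunciations of the same word, which lexicons commonly contain.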

Donations are appreciated to encourage development.


Getting Started

Want to get started quickly & easily on Windows? Available under project releases:

  • kaldi-dragonfly-winpython: A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
  • kaldi-dragonfly-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
  • kaldi-caster-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2 + caster. Just unzip and run!

Otherwise...

Setup

Requirements:

  • Python 2.7 (3.x support planned); 64-bit required!
    • Microphone support provided by pyaudio package
  • OS: Linux or Windows; macOS planned if there is interest
  • Only supports Kaldi left-biphone models, specifically nnet3 chain models, with specific modifications
  • ~1GB+ disk space for model plus temporary storage and cache, depending on your grammar complexity
  • ~500MB+ RAM for model and grammars, depending on your model and grammar complexity
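Some of the requirements above can be checked from Python before installing. This is a minimal sketch under assumptions: find_problems, check_this_machine, and the 1GB threshold are illustrative (the threshold mirrors the rough "~1GB+" figure), the helper uses Python 3's shutil.disk_usage, and RAM is not checked because the standard library has no portable call for it.

```python
import platform
import shutil
import struct
from typing import List

# Rough threshold taken from the "~1GB+ disk space" requirement.
MIN_FREE_BYTES = 1024 ** 3


def find_problems(pointer_bits: int, system: str, free_bytes: int) -> List[str]:
    """Compare environment facts against the listed requirements."""
    problems = []
    if pointer_bits != 64:
        problems.append("64-bit Python required (running %d-bit)" % pointer_bits)
    if system not in ("Linux", "Windows"):
        problems.append("unsupported OS: %s" % system)
    if free_bytes < MIN_FREE_BYTES:
        problems.append("less than ~1GB of free disk space")
    return problems


def check_this_machine(path: str = ".") -> List[str]:
    """Gather the facts for the running interpreter and the disk holding `path`."""
    pointer_bits = struct.calcsize("P") * 8  # size of a C pointer, in bits
    return find_problems(pointer_bits, platform.system(), shutil.disk_usage(path).free)
```

Separating the pure comparison (find_problems) from fact gathering (check_this_machine) keeps the rules easy to test; an empty list from check_this_machine() means none of these checks failed.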

Install Python package, which includes necessary Kaldi binaries:

pip install kaldi-active-grammar

Download the compatible generic English Kaldi nnet3 chain model from the project releases. Unzip the model and pass the directory path to the kaldi-active-grammar constructor.
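The unzip step can be scripted with the standard library. This is a hypothetical helper, assuming the release archive is a zip that usually contains a single top-level model folder; the constructor itself is not shown because its name and signature are not given here.

```python
import os
import zipfile


def unzip_model(zip_path: str, dest_dir: str) -> str:
    """Extract a model archive and return the directory to pass on.

    If the archive holds a single top-level folder, that folder is returned;
    otherwise the extraction directory itself is returned.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        # Zip member names always use "/" as the separator.
        top_levels = {name.split("/", 1)[0] for name in zf.namelist()}
    if len(top_levels) == 1:
        candidate = os.path.join(dest_dir, top_levels.pop())
        if os.path.isdir(candidate):
            return candidate
    return dest_dir
```

The returned path is what you would then hand to the kaldi-active-grammar constructor.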

Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.

Troubleshooting

  • Errors installing
    • Make sure you're using a 64-bit Python.
    • Update your pip by executing pip install --upgrade pip.
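When several Pythons are installed (for example a 32-bit and a 64-bit one), a bare pip command may belong to the wrong one. A small sketch, assuming you run it with the interpreter you intend to install into: it refuses to continue on a non-64-bit Python and upgrades pip through that same interpreter with python -m pip. The function names are illustrative.

```python
import struct
import subprocess
import sys
from typing import List


def pip_upgrade_command() -> List[str]:
    """Build `python -m pip install --upgrade pip` for *this* interpreter,
    so the upgrade lands in the same Python you install into."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "pip"]


def upgrade_pip() -> None:
    bits = struct.calcsize("P") * 8  # pointer size: 64 on a 64-bit Python
    if bits != 64:
        raise RuntimeError("this Python is %d-bit; a 64-bit Python is required" % bits)
    subprocess.check_call(pip_upgrade_command())
```

The same pattern works for the install itself: run python -m pip install kaldi-active-grammar with the 64-bit interpreter.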

Documentation

Documentation is currently sorely lacking. To see example usage, examine the backend for Dragonfly.

Contributing

Issues, suggestions, and feature requests are welcome & encouraged. Pull requests are considered, but project structure is in flux.


Author

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0), with the exception of the associated binaries, whose source is currently unreleased and which are only to be used by this project. See the LICENSE.txt file for details.

If this license is problematic for you, please contact me.

Acknowledgments



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


kaldi_active_grammar-0.7.2-py2.py3-none-win_amd64.whl (31.9 MB)

Uploaded for: Python 2, Python 3, Windows x86-64

kaldi_active_grammar-0.7.2-py2.py3-none-manylinux2010_x86_64.whl (27.2 MB)

Uploaded for: Python 2, Python 3, manylinux: glibc 2.12+ x86-64

File details

Details for the file kaldi_active_grammar-0.7.2-py2.py3-none-win_amd64.whl.

File metadata

  • Download URL: kaldi_active_grammar-0.7.2-py2.py3-none-win_amd64.whl
  • Upload date:
  • Size: 31.9 MB
  • Tags: Python 2, Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/2.7.15

File hashes

Hashes for kaldi_active_grammar-0.7.2-py2.py3-none-win_amd64.whl
Algorithm    Hash digest
SHA256       67a6710a1a19faa07b629706eab13b9e76df88fa9b24c104edd6704b10923238
MD5          3aae1979f766ce00e34e249189d2c1d6
BLAKE2b-256  d0742b5cdd6d2476f321097e9c6bbb5e7585b4f7d5e8ed7012f0cbac447179fe


File details

Details for the file kaldi_active_grammar-0.7.2-py2.py3-none-manylinux2010_x86_64.whl.

File metadata

  • Download URL: kaldi_active_grammar-0.7.2-py2.py3-none-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 27.2 MB
  • Tags: Python 2, Python 3, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.29.1 CPython/2.7.13

File hashes

Hashes for kaldi_active_grammar-0.7.2-py2.py3-none-manylinux2010_x86_64.whl
Algorithm    Hash digest
SHA256       9b7838ca0de8077e93852bbebf52224814eacd4c60b5db1cd24c2521d9ced0ab
MD5          753e26cd5b9aef59bf17a66d55b433b4
BLAKE2b-256  05679f20599d92c7ba62c84784c803570c3fb11e13b268a5a1e128c9ce54f342

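A downloaded wheel can be checked against the SHA256 digests listed above using only Python's hashlib. This is a minimal sketch; the function names are illustrative, and it assumes the wheel has already been saved locally.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks so the whole file
    never has to be held in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, verify_download("kaldi_active_grammar-0.7.2-py2.py3-none-win_amd64.whl", "67a6710a1a19faa07b629706eab13b9e76df88fa9b24c104edd6704b10923238") should return True for an intact copy of the Windows wheel in the current directory.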
