
Detect languages via a fasttext model


fastlid


Language identification based on fasttext's lid.176.ftz model (https://fasttext.cc/docs/en/language-identification.html).

The lid.176.ftz file is licensed under Creative Commons Attribution-Share-Alike License 3.0 and is not part of this module. It is automatically downloaded from its external origin on the first run of this module.
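The download-on-first-run behaviour can be pictured roughly as follows. This is a minimal sketch, not fastlid's actual code: the `ensure_model` helper and its injectable `fetch` parameter are hypothetical names used for illustration, and the URL is the public one listed on the fasttext language-identification page (that fastlid uses exactly this URL is an assumption).

```python
from pathlib import Path
from urllib.request import urlretrieve

# Public location of the compressed model, per the fasttext docs.
# Assumption: fastlid fetches from this (or an equivalent) origin.
MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz"


def ensure_model(path, url=MODEL_URL, fetch=urlretrieve):
    """Return `path`, downloading the model there only if it is missing.

    `fetch(url, filename)` is injectable so the logic can be tried offline.
    """
    path = Path(path)
    if not path.exists():
        # First run: create the cache directory and download once.
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(path))
    return path
```

Later calls find the file already in place and skip the download, which is why only the first run needs network access.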

This module attempts to imitate the following two features of langid:

  • langid.classify: fastlid
  • langid.set_languages(langs=[...]): fastlid.set_languages = [...]
    • from fastlid import fastlid
    • fastlid.set_languages = ['nl', 'fr']
  • TODO: Commandline interface
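For code that was written against langid's method names, the mapping in the list above could be wrapped in a thin adapter. The sketch below is illustrative only: `LangidStyle` and `fake_detector` are hypothetical names, and the stand-in detector uses made-up scores so the example runs without fasttext or the model file.

```python
class LangidStyle:
    """Expose a fastlid-like callable through langid-style method names."""

    def __init__(self, detector):
        # `detector(text)` is expected to return (lang, score) and to honour
        # an optional `set_languages` attribute, as fastlid does in this README.
        self._detector = detector

    def set_languages(self, langs=None):
        # langid uses a function call; fastlid uses attribute assignment.
        self._detector.set_languages = langs

    def classify(self, text):
        return self._detector(text)


def fake_detector(text):
    # Stand-in for fastlid so the sketch runs offline; scores are made up.
    allowed = getattr(fake_detector, "set_languages", None)
    scores = {"en": 0.7, "nl": 0.2, "fr": 0.1}
    if allowed:
        scores = {lang: s for lang, s in scores.items() if lang in allowed}
    best = max(scores, key=scores.get)
    return best, scores[best]


lid = LangidStyle(fake_detector)
lid.set_languages(["nl", "fr"])
print(lid.classify("test this"))
```

With the real package installed, passing `fastlid` in place of `fake_detector` should work the same way, assuming it behaves as shown in the usage section.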

Preinstall fasttext

pip install fasttext

On Windows without a C/C++ compiler, install from a prebuilt wheel instead:

pip install fasttext*.whl

or, for Python 3.8 on 64-bit Windows:

pip install https://github.com/ffreemt/ezbee/raw/main/data/artifects/fasttext-0.9.2-cp38-cp38-win_amd64.whl

Install it

pip install fastlid

or install from git

pip install git+https://github.com/ffreemt/fast-langid.git

# also works: pip install git+https://github.com/ffreemt/fast-langid

or clone the git repo and install from source.

Use it

from fastlid import fastlid, supported_langs

# supports 176 languages
print(supported_langs, len(supported_langs))
# ['af', 'als', 'am', 'an', 'ar', 'arz', 'as', 'ast', 'av', 'az', ...] 176

fastlid("test this")
# ('en', 0.765)

fastlid("test this 测试一下", k=2)
# (['zh', 'en'], [0.663, 0.124])

fastlid.set_languages = ['fr', 'zh']
fastlid("test this 测试吧")
# ('zh', 0.01)

fastlid.set_languages = None
fastlid("test this 测试吧")
# ('en', 0.686)

fastlid.set_languages = ['fr', 'zh', 'en']
fastlid("test this 测试吧", k=3)
# (['en', 'zh', 'fr'], [0.686, 0.01, 0.006])
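The outputs above suggest that restricting languages filters the model's ranked predictions without renormalising the scores ('zh' keeps 0.01 whether or not 'en' is allowed). The following is a minimal sketch of that idea under this assumption. `restrict_predictions` is a hypothetical helper, not part of fastlid, and the `(None, 0.0)` fallback for an empty result is a choice made for this sketch only.

```python
def restrict_predictions(labels, probs, allowed=None, k=1):
    """Keep the top-k predictions whose language code is in `allowed`.

    `labels` use fasttext's "__label__xx" form and are assumed to be sorted
    by descending probability. Scores are passed through unchanged.
    """
    prefix = "__label__"
    pairs = [
        (label[len(prefix):] if label.startswith(prefix) else label, prob)
        for label, prob in zip(labels, probs)
    ]
    if allowed is not None:
        allowed = set(allowed)
        pairs = [(lang, prob) for lang, prob in pairs if lang in allowed]
    pairs = pairs[:k]
    if k == 1:
        # Single best guess as a (lang, score) tuple.
        return pairs[0] if pairs else (None, 0.0)
    return [lang for lang, _ in pairs], [prob for _, prob in pairs]


# Scores taken from the README outputs above.
labels = ["__label__en", "__label__zh", "__label__fr"]
probs = [0.686, 0.01, 0.006]
print(restrict_predictions(labels, probs, allowed=["fr", "zh"]))
# ('zh', 0.01)
```

A low score after filtering, as here, is a hint that the text probably belongs to a language outside the allowed set.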

N.B. hanzidentifier can be used to distinguish simplified from traditional Chinese, should you need to do so.

For Developers

Install poetry and yarn whichever way you prefer.

poetry install  # install python packages
yarn install --dev  # install necessary node packages

# ...code...
yarn test
yarn final

# ...optionally submit a PR...
