A Cython wrapper for MeCab

fugashi

Fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, OSX, and Win64, and UniDic is easy to install (see docs below).

See the blog post for background on why Fugashi exists and some of the design decisions.

If you are on an unsupported platform (like PowerPC), you'll need to install MeCab first. It's recommended you install from source.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger(text):
    # "feature" is the UniDic feature data as a named tuple
    print(word, word.feature.lemma, word.pos, sep='\t')
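
Because the -Owakati output is a single space-delimited string, a token list can be recovered with an ordinary split. A minimal sketch, using the output string shown above as a literal so that it runs without MeCab installed; the variable names are illustrative only:

```python
# Output copied from the example above; in real use this string
# would be the return value of tagger.parse(text).
wakati = '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'

# Tokens are separated by single spaces, so a plain split is enough.
tokens = wakati.split()
print(len(tokens), tokens[:3])
```

If you need more than the surface forms, iterating over tagger(text) as shown above is the better fit, since each word then carries its feature data.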

Installing a Dictionary

Fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.

  • unidic-lite, a 2013 version of UniDic that's relatively small
  • unidic, the latest UniDic 2.3.0, which is 1 GB on disk and requires a separate download step

If you just want to make sure things work, you can start with unidic-lite, but for more serious processing unidic is recommended. For production use you'll generally want to generate your own dictionary too; for details see the MeCab documentation.

To get either of these dictionaries, you can install the package directly with pip, or install it as a fugashi extra like this:

pip install fugashi[unidic-lite]

# The full version of UniDic requires a separate download step
pip install fugashi[unidic]
python -m unidic download

Dictionary Use

Fugashi is written with the assumption you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary other than UniDic, you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"

# parse can be used as normal
tagger.parse('something')
# features from the dictionary can be accessed by field numbers
for word in tagger(text):
    print(word.surface, word.feature[0])

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper
# The second argument lists the field names, in dictionary field order
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)
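
To make the difference between field numbers and named fields concrete, here is a standard-library sketch of the general idea. It is not Fugashi's implementation: the helper wrap_features, the field names, and the sample feature string are all hypothetical, and it assumes features arrive as one comma-separated string in which a field may be quoted.

```python
import csv
from collections import namedtuple

# Hypothetical field names, mirroring the example above.
CustomFeatures = namedtuple('CustomFeatures', 'alpha beta gamma')

def wrap_features(raw, wrapper):
    """Split a comma-separated feature string and wrap it in a named tuple."""
    # csv.reader copes with quoted fields that themselves contain commas.
    fields = next(csv.reader([raw]))
    return wrapper(*fields)

# Made-up feature string, for illustration only.
feature = wrap_features('noun,"a,b",common', CustomFeatures)
print(feature[0])     # access by field number
print(feature.alpha)  # the same field, accessed by name
```

A named tuple keeps index access working, so code written against field numbers continues to run after a wrapper is introduced.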

Alternatives

If you have a problem with Fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you want to use MeCab on a platform we don't provide wheels for and you don't have a C compiler, use natto-py.
  • If you don't want to deal with installing MeCab at all, try SudachiPy.
  • If you need to work with Korean, try KoNLPy.


