A Cython MeCab wrapper for fast, pythonic Japanese tokenization.

Project description

fugashi

Illustration: fugashi by Irasutoya

fugashi is a Cython wrapper for MeCab, a Japanese tokenizer and morphological analysis tool. Wheels are provided for Linux, OSX, and Win64, and UniDic is easy to install.

You don't need to write issues in English.

Check out the interactive demo, see the blog post for background on why fugashi exists and some of the design decisions, or see this guide for a basic introduction to Japanese tokenization.

If you are on an unsupported platform (like PowerPC), you'll need to install MeCab first. It's recommended you install from source. If you need to build from source on Windows, @chezou's fork is recommended; see issue #44 for an explanation of the problems with the official repo.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the Unidic feature data as a named tuple
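The -Owakati output shown above is one string with tokens separated by spaces. A minimal standard-library sketch of turning that string into a token list and counting tokens; wakati_to_tokens is a hypothetical helper for illustration, not part of fugashi:

```python
from collections import Counter
from typing import List


def wakati_to_tokens(wakati: str) -> List[str]:
    """Split -Owakati output into surface forms.

    str.split() with no argument splits on runs of whitespace and ignores
    leading/trailing whitespace, so a trailing space or newline in the
    output (if present) does not produce empty tokens.
    """
    return wakati.split()


text = "麩菓子は、麩を主材料とした日本の菓子。"
wakati = "麩 菓子 は 、 麩 を 主材 料 と し た 日本 の 菓子 。"

tokens = wakati_to_tokens(wakati)
counts = Counter(tokens)

print(tokens)
print(counts["麩"], counts["菓子"])  # both appear twice in this sentence

# This input contains no spaces, so the tokens join back into the original text.
assert "".join(tokens) == text
```

When you also need features, iterating over tagger(text) as above is the more direct route; this sketch only covers the plain-string output.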

Installing a Dictionary

fugashi requires a dictionary. UniDic is recommended, and two easy-to-install versions are provided.

  • unidic-lite, a slightly modified version of UniDic 2.1.2 (from 2013) that's relatively small
  • unidic, the latest UniDic 3.1.0, which is 770MB on disk and requires a separate download step

If you just want to make sure things work you can start with unidic-lite, but for more serious processing unidic is recommended. For production use you'll generally want to generate your own dictionary too; for details see the MeCab documentation.

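You can install either dictionary directly with pip (for example, pip install unidic-lite), or install it together with fugashi as an extra. Quoting the requirement keeps shells such as zsh from treating the square brackets as a glob pattern.

pip install "fugashi[unidic-lite]"

# The full version of UniDic requires a separate download step
pip install "fugashi[unidic]"
python -m unidic download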
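To check which of the pip-installed dictionaries is available in the current environment, a small sketch can look for their modules with importlib.util.find_spec. It assumes the packages import as unidic and unidic_lite, and it prefers the full UniDic when both are present, following the recommendation above; that order is this sketch's choice, not necessarily what fugashi itself does. Finding unidic only means the package is installed, not that python -m unidic download has been run.

```python
import importlib.util
from typing import List, Optional

# Assumed import names of the pip packages "unidic" and "unidic-lite",
# listed in order of preference (full UniDic first).
CANDIDATES = ("unidic", "unidic_lite")


def installed_dictionaries() -> List[str]:
    """Return the candidate dictionary modules that can be imported.

    find_spec only locates the module; it does not import or execute it.
    """
    return [name for name in CANDIDATES if importlib.util.find_spec(name) is not None]


def preferred_dictionary() -> Optional[str]:
    """Return the first installed candidate, or None if neither is installed."""
    found = installed_dictionaries()
    return found[0] if found else None


choice = preferred_dictionary()
if choice is None:
    print('No UniDic package found; try: pip install "fugashi[unidic-lite]"')
else:
    print("Found:", choice)
```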

For more information on the different MeCab dictionaries available, see this article.

Dictionary Use

fugashi is written on the assumption that you'll use UniDic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary other than UniDic, use GenericTagger instead:

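from fugashi import GenericTagger

# With no arguments, MeCab's default dictionary configuration is used;
# pass MeCab options such as '-d /path/to/dicdir' to choose a dictionary.
tagger = GenericTagger()
text = "麩菓子は、麩を主材料とした日本の菓子。"

# parse can be used as normal
print(tagger.parse(text))
# features from the dictionary can be accessed by field numbers
for word in tagger(text):
    print(word.surface, word.feature[0])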
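With a generic dictionary, feature fields are simply positions in MeCab's comma-separated feature string. As a rough illustration of what indexing by field number means, the sketch below parses MeCab-style text output (surface, a tab, then the features, one token per line, ending with EOS), which is the default output layout for many MeCab dictionaries. The sample lines are hypothetical and use an IPADIC-like layout; real field meanings depend entirely on your dictionary. The csv module is used so that quoted fields containing commas stay intact.

```python
import csv
from typing import List, Tuple


def parse_mecab_output(output: str) -> List[Tuple[str, List[str]]]:
    """Parse 'surface<TAB>f1,f2,...' lines into (surface, fields) pairs."""
    tokens = []
    for line in output.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, _, feature_str = line.partition("\t")
        # csv keeps quoted fields such as "1,000" together.
        fields = next(csv.reader([feature_str]), [])
        tokens.append((surface, fields))
    return tokens


# Hypothetical output in an IPADIC-like layout (not produced by running MeCab).
sample = "麩\t名詞,一般,*,*,*,*,麩,フ,フ\n菓子\t名詞,一般,*,*,*,*,菓子,カシ,カシ\nEOS\n"

for surface, fields in parse_mecab_output(sample):
    print(surface, fields[0])  # field 0 plays the role of word.feature[0]
```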

You can also create a dictionary wrapper to get feature information as a named tuple.

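from fugashi import GenericTagger, create_feature_wrapper

# Field names are placeholders: give one name per feature column of your
# dictionary, in the order the columns appear.
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
text = "麩菓子は、麩を主材料とした日本の菓子。"
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)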
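The idea behind a feature wrapper can be sketched with collections.namedtuple: map each position of the feature list to a field name, and give every field a default of None so that shorter feature lists still fit (unknown words can have fewer fields than dictionary entries). make_feature_wrapper is a hypothetical stand-in written for illustration; it is not fugashi's create_feature_wrapper, whose internals may differ.

```python
from collections import namedtuple
from typing import Iterable, Union


def make_feature_wrapper(name: str, fields: Union[str, Iterable[str]]):
    """Build a named tuple type whose fields all default to None."""
    field_names = fields.split() if isinstance(fields, str) else list(fields)
    return namedtuple(name, field_names, defaults=(None,) * len(field_names))


CustomFeatures = make_feature_wrapper("CustomFeatures", "alpha beta gamma")

full = CustomFeatures(*["a", "b", "c"])  # all three fields present
short = CustomFeatures(*["a"])           # missing trailing fields become None

print(full.alpha, full.gamma)
print(short.alpha, short.beta)
```

Passing more values than there are field names raises a TypeError, so the field list should cover the longest feature rows your dictionary produces.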

Citation

If you use fugashi in research, it would be appreciated if you cite this paper. You can read it at the ACL Anthology or on arXiv.

@inproceedings{mccann-2020-fugashi,
    title = "fugashi, a Tool for Tokenizing {J}apanese in Python",
    author = "McCann, Paul",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.7",
    pages = "44--51",
    abstract = "Recent years have seen an increase in the number of large-scale multilingual NLP projects. However, even in such projects, languages with special processing requirements are often excluded. One such language is Japanese. Japanese is written without spaces, tokenization is non-trivial, and while high quality open source tokenizers exist they can be hard to use and lack English documentation. This paper introduces fugashi, a MeCab wrapper for Python, and gives an introduction to tokenizing Japanese.",
}

Alternatives

If you have a problem with fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you don't want to deal with installing MeCab at all, try SudachiPy.
  • If you need to work with Korean, try pymecab-ko or KoNLPy.

License and Copyright Notice

fugashi is released under the terms of the MIT license. Please copy it far and wide.

fugashi is a wrapper for MeCab, and fugashi wheels include MeCab binaries. MeCab is copyrighted free software by Taku Kudo <taku@chasen.org> and Nippon Telegraph and Telephone Corporation, and is redistributed under the BSD License.

Project details



This version: 1.2.1

Download files

Download the file for your platform.

Source Distribution

  • fugashi-1.2.1.tar.gz (338.0 kB): Source

Built Distributions

  • fugashi-1.2.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl (321.5 kB): PyPy, macOS 10.9+ x86-64
  • fugashi-1.2.1-pp37-pypy37_pp73-macosx_10_9_x86_64.whl (273.5 kB): PyPy, macOS 10.9+ x86-64
  • fugashi-1.2.1-cp311-cp311-win_amd64.whl (497.8 kB): CPython 3.11, Windows x86-64
  • fugashi-1.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (605.5 kB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • fugashi-1.2.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (593.4 kB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • fugashi-1.2.1-cp310-cp310-win_amd64.whl (498.6 kB): CPython 3.10, Windows x86-64
  • fugashi-1.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (599.9 kB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • fugashi-1.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (585.8 kB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • fugashi-1.2.1-cp310-cp310-macosx_10_9_x86_64.whl (287.3 kB): CPython 3.10, macOS 10.9+ x86-64
  • fugashi-1.2.1-cp39-cp39-win_amd64.whl (499.7 kB): CPython 3.9, Windows x86-64
  • fugashi-1.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (613.3 kB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • fugashi-1.2.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (601.3 kB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • fugashi-1.2.1-cp39-cp39-macosx_10_9_x86_64.whl (287.3 kB): CPython 3.9, macOS 10.9+ x86-64
  • fugashi-1.2.1-cp38-cp38-win_amd64.whl (499.7 kB): CPython 3.8, Windows x86-64
  • fugashi-1.2.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (615.9 kB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • fugashi-1.2.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (603.2 kB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • fugashi-1.2.1-cp38-cp38-macosx_10_9_x86_64.whl (286.0 kB): CPython 3.8, macOS 10.9+ x86-64
  • fugashi-1.2.1-cp37-cp37m-win_amd64.whl (499.1 kB): CPython 3.7m, Windows x86-64
  • fugashi-1.2.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (584.0 kB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • fugashi-1.2.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (570.5 kB): CPython 3.7m, manylinux: glibc 2.17+ ARM64
  • fugashi-1.2.1-cp37-cp37m-macosx_10_9_x86_64.whl (284.8 kB): CPython 3.7m, macOS 10.9+ x86-64
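Each wheel's file name encodes the interpreter and platform it targets, following the wheel naming convention {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, where a tag can hold several dot-separated alternatives (as in the manylinux wheels listed here). A minimal parser sketch for illustration only; for real use, the packaging library's packaging.utils.parse_wheel_filename is the more complete tool:

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: Tuple[str, ...]
    abi: Tuple[str, ...]
    platform: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its components (simplified, no name normalization)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    # Compressed tag sets separate alternatives with ".".
    return WheelName(name, version, build,
                     tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")))


w = parse_wheel_filename(
    "fugashi-1.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl")
print(w.python, w.abi, w.platform)
```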

File details

Details for the file fugashi-1.2.1.tar.gz.

File metadata

  • Download URL: fugashi-1.2.1.tar.gz
  • Upload date:
  • Size: 338.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for fugashi-1.2.1.tar.gz
Algorithm Hash digest
SHA256 e7c7aea6684006af223b5145b96c071c2e10353afb9df6712df4458003cf62d3
MD5 9edab5c67c3258c8a5724fafae6bebc4
BLAKE2b-256 4daa008562fae5099633dfe87b68627f2a532b4f92f5348f75edaeec25c990f4
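To check a downloaded file against the SHA256 digests listed here, you can hash it locally and compare. A minimal standard-library sketch; sha256_matches is a hypothetical helper name, and the path in the commented example is wherever you saved the download:

```python
import hashlib


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Hash the file in chunks and compare with an expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Example, assuming the sdist was saved to the current directory:
# sha256_matches("fugashi-1.2.1.tar.gz",
#                "e7c7aea6684006af223b5145b96c071c2e10353afb9df6712df4458003cf62d3")
```

pip can also enforce this itself: in hash-checking mode each requirement carries a --hash=sha256:<digest> option, for example in a requirements file installed with pip install --require-hashes -r requirements.txt.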

File details

Details for the file fugashi-1.2.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-pp38-pypy38_pp73-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 fc0a81e10197ac055bb20489e55d3669c082df5b76123aeb4dcc692031578434
MD5 b4ccbb107f439235ebdaa5373737040f
BLAKE2b-256 2d677e27ba0082c130c1a83ebeddd8765b25398db51ce514d184ef844c956f36

File details

Details for the file fugashi-1.2.1-pp37-pypy37_pp73-macosx_10_9_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-pp37-pypy37_pp73-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 badc89f57997fbb7efba62c73b608f097bcf2414b01a9344f97951534af46aa5
MD5 4e10974de23e3dbd93a20ece1e77b27d
BLAKE2b-256 49979a0bc3ec73544f4e7557cc0ab029d7c5e25fee436c64b10f7d52b5a051e3

File details

Details for the file fugashi-1.2.1-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: fugashi-1.2.1-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 497.8 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for fugashi-1.2.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 1391cadd916237b7b8b498380cc421340b32c6c4265f0ef4d14b19be3e7ce9ed
MD5 2bbf04001084a03bfdff821c36939444
BLAKE2b-256 13f9d353bc33bd498fe5849db0c3ee6f581c8e87f95fd7361d07a864f63ecc84

File details

Details for the file fugashi-1.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b670f95d8210975a0343883e5a37a514388b0c879d219ef37746e169d03c3b84
MD5 694c909441cc920c8fb9b9beca7e00fd
BLAKE2b-256 98f4ef47ec3c6c92be98a56f82468ece79ec2eac369a6464cb9c4270fc782c96

File details

Details for the file fugashi-1.2.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for fugashi-1.2.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 6d67b8e7007ae9883a7888139c177f054da64d003d15dc22943b9a983d876404
MD5 f27c6d8120f928d0884e54e411893ab3
BLAKE2b-256 1d4648eed72c302820e0c6313b166b68e3818b16593a6c9f4f07a012816e3baa

File details

Details for the file fugashi-1.2.1-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: fugashi-1.2.1-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 498.6 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for fugashi-1.2.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 2d0e08b130e2e3fd354d9b513623bfe8e17ca569e08b85f2c93cb3b096d70c23
MD5 4a8ec0bb7000aafe3f8c09285f1c1002
BLAKE2b-256 5b371ef00146a4e0ee2884d50f976332ab1b36ef943da7a7c4d128a47ebf3e9d

File details

Details for the file fugashi-1.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8282a1669f976e466ca8577d578356109e87d8efabc4f7413eb34fb89d5c7c5a
MD5 8d13c3bd5132173a8a20feb92332430a
BLAKE2b-256 407691d72d7d3aa10e46dda6a8487a618e8cb2c23ac62b536b0638f6c39715b7

File details

Details for the file fugashi-1.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for fugashi-1.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 f62a59aac8a39f0d3fd74e9fd8f030491566ebe805f18015d3382cc48619f6e5
MD5 63eb19b1a5b7b9fdfc6a31e8b06f7370
BLAKE2b-256 8749557a8e5852b5fe8fa1d66579b17702c8ead52991cd4a77a4788bcfe2299e

File details

Details for the file fugashi-1.2.1-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 98779cb9d7c28b5b65a8f7ace77c657dcc81d5f7affb8babe409cc6b5fef9a60
MD5 518ed5c1cf2053964eae48526a492c33
BLAKE2b-256 02f129cc70062ca2854562d5119b6d1e44e6813e57a63bf333efb6de4a51b727

File details

Details for the file fugashi-1.2.1-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: fugashi-1.2.1-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 499.7 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for fugashi-1.2.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 a30caae5dce0c02e2a5020e6b15e55861782a51088669de69814632f33b8e2e7
MD5 1bf73b62b1a0752f20191bb982a3213d
BLAKE2b-256 69fb87b1c5a05ea6f053c222ce65bb6e5219f1c8d0c653bffd573f376f040303

File details

Details for the file fugashi-1.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a905395c5ff32a505947a4a549ab77ed6987a558023abe8ecb4b8a867bd8c9c2
MD5 b6af7b5c16b4e5e122c9ebd310dd45c5
BLAKE2b-256 f8e13b160bf749fdbae1cad22326751a6ea65ef9bb5cce6cc99fa1d85c310857

File details

Details for the file fugashi-1.2.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for fugashi-1.2.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 09d0746b21cf5825b572d03e681fef7e60f85f8203498a6d84a67d8bc7b69d69
MD5 491a0b411d683473dcc136d9c7bae374
BLAKE2b-256 334b58082c05440ea1b2e57a040a24123eb5766412d266982acb82b0fb42f372

File details

Details for the file fugashi-1.2.1-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 a0fca2040219ccd4e30e853828e9296a05efed796274b526bac8f3f87f02e9ae
MD5 be0dce70a7f71ec6150874245ff6da20
BLAKE2b-256 a0416a0d3314081b5d2e1b1a47e68d58f0f62fee4346760ebec5c8091ac0dc44

File details

Details for the file fugashi-1.2.1-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: fugashi-1.2.1-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 499.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for fugashi-1.2.1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 82f53673e3b811dad8b6607f15997014af18179124dfbd7f49df4a1d66c3c7a5
MD5 abdb89f746b847a9582fe25021b9bf33
BLAKE2b-256 8bbc834e485ec65f322ee7a1311920fd11a1b83b8837563962a8f175bc56ac3d

File details

Details for the file fugashi-1.2.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ed08bf98620f25204f40fe759340cecfa25e2f8e66b6c91ff2f2500dbf5be817
MD5 fce4873866b616d4df62d57a73f0498e
BLAKE2b-256 2d420d7af1417fd1102d08e1853afbaeed8165e72ccecf4bbaacdf67a961afa9

File details

Details for the file fugashi-1.2.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for fugashi-1.2.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 07d50bb39b1e47f974335cc999107bad97ee1028699d86d6f12bd11308b91930
MD5 2897e96197378643e7f74d4c18e2409b
BLAKE2b-256 fc9969f923f8993316d642167114297672d7897dc6e41ec30a981fe6bf40ce96

File details

Details for the file fugashi-1.2.1-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 68e0f9491487a16a173a6c0d6d5a2369ff9e276f1d8cd7f476515f0f1391ad5b
MD5 e1554e56fb4ade3da19bc64ef9d70ea4
BLAKE2b-256 da67ee457ff70d6caaad820b623daa368f90d2960b87b09555c249c9fcff92a8

File details

Details for the file fugashi-1.2.1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: fugashi-1.2.1-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 499.1 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for fugashi-1.2.1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 b4ce85c3fc48ce90ebbc7974b470dbb3c1f53d89cf1ee4305e84612d7beee1e9
MD5 eac231040cd0359a759e59c5d6f3289c
BLAKE2b-256 ee7e62330bd0f474c4ce6ee7a2b0afdc7fbf27c9cb0084e276a996bc6bd630b6

File details

Details for the file fugashi-1.2.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c689951c7946eeac290b3f184916e40c7ac2653f6090d7544117c9057d1ac890
MD5 a5d0c4a2bb3ad125bfaf1ec9aea1574f
BLAKE2b-256 60e57f355429e0281e7eee0b3e9f90bfb965b30f2f51d6222b61ba1559c5374d

File details

Details for the file fugashi-1.2.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for fugashi-1.2.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ad9a9a9c780d0771195c23f343dff9c3e626d1fb6b0b87c3bb9ebc39ad654130
MD5 2356ca27ce8d6079616f698585ed7288
BLAKE2b-256 94867be16715738ee7b36e21347cfabec0ed3edb63fe61b79ccf9de315923d03

File details

Details for the file fugashi-1.2.1-cp37-cp37m-macosx_10_9_x86_64.whl.

File hashes

Hashes for fugashi-1.2.1-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 9d5391aa6ee4860fdf765b206e0104d0de5aff20cc2149ecd00e0acc7860ee9a
MD5 5bd9f0d27b2182e10fcfe120ae78300b
BLAKE2b-256 8c4798832575653bcc651b6ed5da90c4875580214a21862f45b556a26b661045
