Hazm

Python library for digesting Persian text.

  • Text cleaning
  • Sentence and word tokenizer
  • Word lemmatizer
  • POS tagger
  • Shallow parser
  • Dependency parser
  • Interfaces for Persian corpora
  • NLTK compatible
  • Python 2.7, 3.4, 3.5 and 3.6 support

Usage

>>> from __future__ import unicode_literals
>>> from hazm import *

>>> normalizer = Normalizer()
>>> normalizer.normalize('اصلاح نويسه ها و استفاده از نیم‌فاصله پردازش را آسان مي كند')
'اصلاح نویسه‌ها و استفاده از نیم‌فاصله پردازش را آسان می‌کند'

>>> sent_tokenize('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟')
['ما هم برای وصل کردن آمدیم!', 'ولی برای پردازش، جدا بهتر نیست؟']
>>> word_tokenize('ولی برای پردازش، جدا بهتر نیست؟')
['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']

>>> stemmer = Stemmer()
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> lemmatizer = Lemmatizer()
>>> lemmatizer.lemmatize('می‌روم')
'رفت#رو'

>>> tagger = POSTagger(model='resources/postagger.model')
>>> tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))
[('ما', 'PRO'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]

>>> chunker = Chunker(model='resources/chunker.model')
>>> tagged = tagger.tag(word_tokenize('کتاب خواندن را دوست داریم'))
>>> tree2brackets(chunker.parse(tagged))
'[کتاب خواندن NP] [را POSTP] [دوست داریم VP]'

>>> parser = DependencyParser(tagger=tagger, lemmatizer=lemmatizer)
>>> parser.parse(word_tokenize('زنگ‌ها برای که به صدا درمی‌آید؟'))
<DependencyGraph with 8 nodes>
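In the lemmatizer example above, the result 'رفت#رو' appears to join the past and present stems of the verb with '#'. Assuming that convention, a small helper can split the two stems for downstream use. The name split_verb_lemma is hypothetical and is not part of Hazm; this is a sketch for Python 3.

```python
def split_verb_lemma(lemma):
    """Split a lemma of the assumed form 'past#present' into its two stems.

    Lemmas without '#' (e.g. nouns) come back with None as the second item.
    This helper is illustrative only and is not part of Hazm.
    """
    past, sep, present = lemma.partition("#")
    return (past, present) if sep else (lemma, None)
```

For example, split_verb_lemma('رفت#رو') gives the pair of stems, while a lemma without '#' is returned unchanged alongside None.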

Installation

The latest stable version of Hazm can be installed through pip:

pip install hazm

To test or use Hazm with the latest updates, install from the master branch instead:

pip install https://github.com/sobhe/hazm/archive/master.zip --upgrade

We have also trained tagger and parser models. You may put these models in the resources folder of your project.
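Since the tagger and chunker in the Usage examples load their models from resources/postagger.model and resources/chunker.model, it can help to confirm the files are in place before constructing them. The following is a minimal sketch for Python 3; the helper missing_models is hypothetical and not part of Hazm, and the file names are simply the ones used in the examples above.

```python
from pathlib import Path

# File names taken from the Usage examples; adjust if your models differ.
MODEL_FILES = ("postagger.model", "chunker.model")


def missing_models(resources_dir="resources", names=MODEL_FILES):
    """Return the model file names that are not present in resources_dir.

    Hypothetical helper, not part of Hazm.
    """
    root = Path(resources_dir)
    return [name for name in names if not (root / name).is_file()]
```

Calling missing_models() from the project root returns an empty list when both files are present, and otherwise lists the names that still need to be copied into the resources folder.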

Extensions

Note: These are unofficial ports of Hazm. They are not up to date with its functionality and are not supported by Sobhe.

  • JHazm: A Java port of Hazm
  • NHazm: A C# port of Hazm

Thanks

Download files

Source Distribution

hazm-0.6.0.1.tar.gz (305.2 kB)

Uploaded Source

Built Distributions

hazm-0.6.0.1-py3-none-any.whl (316.7 kB)

Uploaded Python 3

hazm-0.6.0.1-py2-none-any.whl (316.7 kB)

Uploaded Python 2

File details

Details for the file hazm-0.6.0.1.tar.gz.

File metadata

  • Download URL: hazm-0.6.0.1.tar.gz
  • Upload date:
  • Size: 305.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.18.4 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.15rc1

File hashes

Hashes for hazm-0.6.0.1.tar.gz
Algorithm Hash digest
SHA256 fb0e12e4e5ae3adab83f6ffcee1f32af7679bff6cbfe626223e454ed8bc10b0d
MD5 43e5e932c1732b0ddd7029b2b3f33ce5
BLAKE2b-256 af6a845e0c4951eeca8b3fce5fbe1f0d6500faf1c3bf09c5ca601c54e794f611

File details

Details for the file hazm-0.6.0.1-py3-none-any.whl.

File metadata

  • Download URL: hazm-0.6.0.1-py3-none-any.whl
  • Upload date:
  • Size: 316.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.18.4 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.15rc1

File hashes

Hashes for hazm-0.6.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5568dfcb92c389701c5f0117aad01eed667ea24810f6994b84a6dc94157a47fb
MD5 4d2d668e51af639dbe12c2eeb6e945cc
BLAKE2b-256 53724eb2844b82374a03a15d2ff52668483818f3e2cf4878b71a242f8f9d247c

File details

Details for the file hazm-0.6.0.1-py2-none-any.whl.

File metadata

  • Download URL: hazm-0.6.0.1-py2-none-any.whl
  • Upload date:
  • Size: 316.7 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.18.4 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.15rc1

File hashes

Hashes for hazm-0.6.0.1-py2-none-any.whl
Algorithm Hash digest
SHA256 3d8af428616cce78326beefa5527493e4511bbb099a3b62d67185a99b3f6287a
MD5 266dd764b6b9e461ef02a84a958802a0
BLAKE2b-256 0cd6c649177ce9b3fb031d6743a0adc0b14240db0b9a2e6b63c14b7ce49b81c1
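The SHA256 digests listed for each file can be compared against a downloaded copy before installing it. The following is a minimal sketch using only the standard library; the helper names sha256_hex and matches_digest are hypothetical, and the local file path is whatever location you downloaded the file to.

```python
import hashlib


def sha256_hex(path, chunk_size=65536):
    """Return the SHA256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals expected_hex."""
    return sha256_hex(path) == expected_hex.strip().lower()
```

For instance, matches_digest('hazm-0.6.0.1.tar.gz', 'fb0e12e4e5ae3adab83f6ffcee1f32af7679bff6cbfe626223e454ed8bc10b0d') should return True for an intact copy of the source distribution. pip can also enforce digests itself through its hash-checking mode.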
