
Project description

Python library for digesting Persian text.

  • Text cleaning

  • Sentence and word tokenizer

  • Word lemmatizer

  • POS tagger

  • Shallow parser

  • Dependency parser

  • Interfaces for Persian corpora

  • NLTK compatible

  • Python 2.7, 3.2, 3.3 and 3.4 support

Usage

>>> from __future__ import unicode_literals
>>> from hazm import *

>>> normalizer = Normalizer()
>>> normalizer.normalize('اصلاح نويسه ها و استفاده از نیم‌فاصله پردازش را آسان مي كند')
'اصلاح نویسه‌ها و استفاده از نیم‌فاصله پردازش را آسان می‌کند'
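Part of this cleanup boils down to mapping Arabic code points to their Persian counterparts (e.g. ي → ی, ك → ک). A minimal self-contained sketch of that idea — not hazm's actual implementation, which also fixes spacing, half-spaces (ZWNJ), and affixes:

```python
# Illustrative character mapping only.
ARABIC_TO_PERSIAN = str.maketrans({
    '\u064a': '\u06cc',  # Arabic YEH  -> Persian YEH (ی)
    '\u0643': '\u06a9',  # Arabic KAF  -> Persian KEHEH (ک)
    '\u0649': '\u06cc',  # Arabic ALEF MAKSURA -> Persian YEH
})

def normalize_chars(text: str) -> str:
    """Replace common Arabic code points with their Persian equivalents."""
    return text.translate(ARABIC_TO_PERSIAN)

print(normalize_chars('علي'))  # Arabic YEH becomes Persian YEH: علی
```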

>>> sent_tokenize('ما هم برای وصل کردن آمدیم! ولی برای پردازش، جدا بهتر نیست؟')
['ما هم برای وصل کردن آمدیم!', 'ولی برای پردازش، جدا بهتر نیست؟']
>>> word_tokenize('ولی برای پردازش، جدا بهتر نیست؟')
['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']
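Note that word_tokenize splits Persian punctuation (، ؛ ؟) off the adjacent word, which a plain whitespace split would not do. A rough approximation of that behavior with a hypothetical regex helper — not hazm's tokenizer, which handles far more cases:

```python
import re

# Treat Persian comma/semicolon/question mark and ! as standalone tokens;
# everything else groups into maximal runs of non-space, non-punctuation.
PUNCT = '\u060c\u061b\u061f!'  # ، ؛ ؟ !

def simple_word_tokenize(text: str) -> list:
    return re.findall(rf'[{PUNCT}]|[^\s{PUNCT}]+', text)

print(simple_word_tokenize('ولی برای پردازش، جدا بهتر نیست؟'))
# ['ولی', 'برای', 'پردازش', '،', 'جدا', 'بهتر', 'نیست', '؟']
```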

>>> stemmer = Stemmer()
>>> stemmer.stem('کتاب‌ها')
'کتاب'
>>> lemmatizer = Lemmatizer()
>>> lemmatizer.lemmatize('می‌روم')
'رفت#رو'
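The # in the verb lemma separates the past stem from the present stem (رفت/رو for the infinitive رفتن). If you need the two stems individually, a small hypothetical helper — not part of hazm's API — can unpack them:

```python
def split_verb_lemma(lemma: str):
    """Split a hazm-style verb lemma 'past_stem#present_stem' into a pair."""
    if '#' in lemma:
        past, present = lemma.split('#', 1)
        return past, present
    return lemma, lemma  # non-verb lemmas have a single stem

print(split_verb_lemma('رفت#رو'))  # ('رفت', 'رو')
print(split_verb_lemma('کتاب'))    # ('کتاب', 'کتاب')
```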

>>> tagger = POSTagger(model='resources/postagger.model')
>>> tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))
[('ما', 'PRO'), ('بسیار', 'ADV'), ('کتاب', 'N'), ('می‌خوانیم', 'V')]

>>> chunker = Chunker(model='resources/chunker.model')
>>> tagged = tagger.tag(word_tokenize('کتاب خواندن را دوست داریم'))
>>> tree2brackets(chunker.parse(tagged))
'[کتاب خواندن NP] [را POSTP] [دوست داریم VP]'
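If you want flat (chunk text, label) pairs rather than the bracketed string, the output of tree2brackets is easy to post-process; the helper below is hypothetical, not part of hazm:

```python
import re

def brackets_to_chunks(brackets: str):
    """Parse '[words LABEL] ...' into a list of (words, label) tuples."""
    chunks = []
    for inner in re.findall(r'\[([^\]]+)\]', brackets):
        *words, label = inner.split()  # last token inside brackets is the label
        chunks.append((' '.join(words), label))
    return chunks

print(brackets_to_chunks('[کتاب خواندن NP] [را POSTP] [دوست داریم VP]'))
# [('کتاب خواندن', 'NP'), ('را', 'POSTP'), ('دوست داریم', 'VP')]
```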

>>> parser = DependencyParser(tagger=tagger, lemmatizer=lemmatizer)
>>> parser.parse(word_tokenize('زنگ‌ها برای که به صدا درمی‌آید؟'))
<DependencyGraph with 8 nodes>

Installation

pip install hazm

We have also trained POS tagger and dependency parser models. Put them in the resources folder of your project so that the model paths used in the examples above resolve.
