Mishkal: Arabic text diacritization library for Python

Project description

Mishkal Arabic text vocalization software مشكال لتشكيل النصوص العربية

Developers: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail dot com

| Features     | Value |
|--------------|-------|
| Authors      | Authors.md |
| Release      | 1.10 Bouira |
| License      | GPL |
| Tracker      | linuxscout/mishkal/Issues |
| Mailing list | http://groups.google.com/group/mishkal/ |
| Website      | tahadz.com/mishkal |
| Source       | Github |
| Download     | sourceforge |
| Feedback     | Comments |
| Accounts     | [@Facebook](https://www.facebook.com/mishkalarabic) [@Twitter](https://twitter.com/linuxscout) [@Sourceforge](http://sourceforge.net/projects/mishkal/) |

Citation

@thesis{zerrouki2020adawat,
author = {Taha Zerrouki},
title = {Towards An Open Platform For Arabic Language Processing},
type = {PhD thesis},
institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
date = {2020},
}

Install

You can install Mishkal as a library or as software.

Python library

pip install mishkal

Install from GitHub

  1. Clone the Mishkal project from GitHub:

git clone https://github.com/linuxscout/mishkal.git

  2. Install the necessary packages:

pip install -r mishkal/requirements.txt

Requirements

- pyarabic     : basic Arabic library
- sylajone     : aranasyn syntactic analyzer
- arramooz     : Arabic morphological dictionary
- asmai        : semantic analyzer
- CodernityDB  : pure Python, fast NoSQL database, used as a cache to reduce the load on the morphological analyzer
- collocations : collocation library (deprecated)
- libqutrub    : verb conjugation library used by the morphological analyzer
- maskouk      : collocation library
- naftawayh    : word tagging library
- qalsadi      : morphological analyzer
- tashaphyne   : light stemmer used by the morphological analyzer
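
To check which of these packages are importable in the current environment, a small helper like the one below can be used. It is only a sketch: it assumes each package's import name matches the name in the list above, which may not hold for every distribution, and it leaves out the deprecated collocations library.

```python
import importlib.util

# Assumption: import names are the same as the names in the requirements list.
# A distribution may expose a different module name.
REQUIRED_MODULES = [
    "pyarabic", "sylajone", "arramooz", "asmai", "CodernityDB",
    "libqutrub", "maskouk", "naftawayh", "qalsadi", "tashaphyne",
]


def missing_modules(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```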

Usage

Mishkal provides:

  • Console command line

  • Python library

  • GUI interface

  • Web interface

  • API interface

GUI

  • Windows: MishkalGui.exe

  • Linux: python interfaces/gui/mishkal-gui.py

Web server (Linux, Windows)

python3 interfaces/web/mishkal-webserver

  • The server listens on 0.0.0.0:8080.

  • Open http://127.0.0.1:8080 in your browser.

Console (Linux/Windows)

$ python3 bin/mishkal-console.py -f filename

Usage: bin/mishkal-console.py  -f filename [OPTIONS]
           bin/mishkal-console.py  'السلام عليكم' [OPTIONS]

        [-f | --file = filename]       input file
        [-o | --outfile = filename]    output file to write vocalized text to, '$FILENAME (Tashkeel).txt' by default

        [-h | --help]             outputs this usage message
        [-v | --version]        program version
        [-p | --progress]      display progress status
        [-a | --verbose]       enable verbosity

        * Tashkeel Actions
        -------------------
        [-r | --reduced]        Reduced Tashkeel.
        [-s | --strip]             Strip tashkeel (remove harakat).
        [-c | --compare]      compare the vocalized text with the program output

        * Tashkeel Options
        ------------------
        [-l | --limit]             vocalize only a limited number of lines
        [-x | --syntax]         disable syntactic analysis
        [-m | --semantic]    disable semantic analysis
        [-g | --train]             enable training option
        [-i | --ignore]           ignore the last Mark on output words.
        [-t | --stat]               disable statistic tashkeel

This program is licensed under the GPL License
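
The console script can also be driven from Python. The sketch below builds an argument list from a few of the options shown in the usage message (`-f`, `-o`, `-r`, `-s`). The helper names are hypothetical, the script path assumes you run from the repository root, and reading the result from standard output is an assumption about how the script reports its text.

```python
import subprocess
import sys


def build_console_command(text=None, infile=None, outfile=None,
                          reduced=False, strip=False,
                          script="bin/mishkal-console.py"):
    """Build an argv list for the console script from the documented options."""
    if (text is None) == (infile is None):
        raise ValueError("give exactly one of text or infile")
    cmd = [sys.executable, script]
    if infile is not None:
        cmd += ["-f", infile]      # input file
    else:
        cmd.append(text)           # text passed directly on the command line
    if outfile is not None:
        cmd += ["-o", outfile]     # output file for the vocalized text
    if reduced:
        cmd.append("-r")           # reduced tashkeel
    if strip:
        cmd.append("-s")           # strip tashkeel
    return cmd


def run_console(cmd):
    """Run the command and return whatever it wrote to standard output."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, `build_console_command(infile="input.txt", reduced=True)` yields the equivalent of `python3 bin/mishkal-console.py -f input.txt -r`.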

Library

pip install mishkal

Example:

>>> import mishkal.tashkeel
>>> vocalizer = mishkal.tashkeel.TashkeelClass()
>>> text = u"تطلع الشمس صباحا"
>>> vocalizer.tashkeel(text)
' تَطْلُعُ الشَّمْسُ صَبَاحًا'
>>>
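
To vocalize several lines at once, the call above can be wrapped in a small loop. This is a sketch rather than part of the library: `vocalize_lines` and `make_mishkal_vocalizer` are hypothetical helper names, and only `TashkeelClass().tashkeel(text)` comes from the example above. The result is stripped because the sample output starts with a space.

```python
def vocalize_lines(lines, vocalize):
    """Vocalize each non-empty line with the given callable; keep blank lines."""
    result = []
    for line in lines:
        stripped = line.strip()
        result.append(vocalize(stripped).strip() if stripped else "")
    return result


def make_mishkal_vocalizer():
    """Return the tashkeel method of a TashkeelClass instance.

    Requires mishkal to be installed; the import is deferred so that
    vocalize_lines can be tested with any callable.
    """
    import mishkal.tashkeel
    return mishkal.tashkeel.TashkeelClass().tashkeel


# Usage, with mishkal installed:
#     vocalize = make_mishkal_vocalizer()
#     for line in vocalize_lines(["تطلع الشمس صباحا", "السلام عليكم"], vocalize):
#         print(line)
```

Creating the vocalizer once and reusing it for every line avoids constructing a new `TashkeelClass` per call.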

JSON connection API (remote diacritization)

The website's service can be called through JSON and AJAX from any site, so you can use it in your own website.

  • How to call it

1. Using JSON with the jQuery library

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Mishkal</title>
    <script type="text/javascript" src="http://code.jquery.com/jquery-latest.js"></script>
</head>
<body>
  <div id="result"></div>
<script type="text/javascript">
// Once the page is ready, request the vocalized text and display it.
$(document).ready(function() {
  $.getJSON("http://tahadz.com/mishkal/ajaxGet", {text:"السلام عليكم\nاهلا بكم\nكيف حالكم", action:"TashkeelText"},
    function(data) {
      $("#result").text(data.result);
    });
});
</script>
</body>
</html>

The call is made as follows:

$.getJSON("http://tahadz.com/mishkal/ajax...", {text:"السلام عليكم\nاهلا بكم\nكيف حالكم", action:"TashkeelText"},

where

  • text: the text to be vocalized.

  • action: the requested operation, here TashkeelText.

The result has the following form:

{"result": " السّلامُ عَلَيكُمْ اهلا بِكُمْ كَيْفَ حالُكُمْ", "order": "0"}

where

  • result: the resulting vocalized text.

  • order: the line number in the original text. If the original text is large, Mishkal splits it into a number of lines, and they may not come back in the same order, so the order number is specified.
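
The same service can be called from Python with the standard library. The sketch below reuses the endpoint, parameters, and response fields from the jQuery example; the function names are hypothetical, and it assumes the service is still reachable at that URL and returns one JSON object per request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://tahadz.com/mishkal/ajaxGet"  # endpoint from the jQuery example


def build_request_url(text, action="TashkeelText", base_url=API_URL):
    """Encode the text and action as query parameters, as $.getJSON does."""
    return base_url + "?" + urlencode({"text": text, "action": action})


def parse_response(body):
    """Return (order, result) from a JSON response body."""
    data = json.loads(body)
    return int(data.get("order", 0)), data["result"].strip()


def join_in_order(bodies):
    """Reassemble several responses by their order field."""
    parts = sorted(parse_response(body) for body in bodies)
    return "\n".join(text for _, text in parts)


def vocalize_remote(text, timeout=10):
    """Send one request to the service and return the vocalized text."""
    with urlopen(build_request_url(text), timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))[1]
```

`join_in_order` is useful when a long text is sent as several requests and the responses arrive out of order.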

How Mishkal works

Mishkal uses a rule-based method to detect relations between words and to choose diacritics. First, it analyzes all morphological cases: it generates all possible diacritized forms of each word by detecting its affixes and checking the result in a dictionary. Second, it attaches a word frequency to each form.

These two steps are performed by support/Qalsadi (an Arabic morphological analyzer). The dictionary it uses is a separate project named “Arramooz: Arabic dictionary for morphology”.
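
The idea behind these two steps can be illustrated with a toy example. This is not Qalsadi's code or API: the affix lists, the tiny lexicon, and the frequency values below are invented for illustration, and a real analyzer handles far more cases.

```python
# Toy affix lists and lexicon, invented for this illustration.
PREFIXES = ["", "ال", "و", "وال", "ب", "بال"]
SUFFIXES = ["", "ها", "ون", "ات"]

# unvocalized stem -> list of (vocalized form, made-up frequency)
TOY_LEXICON = {
    "شمس": [("شَمْس", 120)],
    "علم": [("عِلْم", 300), ("عَلَم", 80), ("عَلِمَ", 150)],
}


def candidates(word, lexicon=TOY_LEXICON):
    """Try every prefix/suffix split and keep those whose stem is in the lexicon."""
    found = []
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if len(prefix) + len(suffix) > len(word):
                continue  # the affixes would overlap
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            stem = word[len(prefix):len(word) - len(suffix)]
            for vocalized, frequency in lexicon.get(stem, []):
                found.append({
                    "prefix": prefix,
                    "stem": stem,
                    "suffix": suffix,
                    "vocalized": vocalized,
                    "frequency": frequency,
                })
    return found
```

Here `candidates("علم")` returns three analyses with different vocalizations and frequencies, which is the kind of ambiguity the later stages have to resolve.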

Third, a syntax analyzer detects all possible relations between words. The syntax library is named support/ArAnaSyn. This analyzer is basic for the moment: it uses only linear relations between adjacent words.

Fourth, the generated data and relations are analyzed semantically to detect semantic relations and so reduce ambiguity. The library used is support/asmai (Arabic semantic analysis). Semantic relation extraction is corpus-based; the corpus used is named “Tashkeela: Arabic vocalized texts corpus”.

In the final stage, the module mishkal/tashkeel tries to select the suitable word form for the context. It looks first for evident cases or for the most related cases; otherwise it selects the most probable case, using rules such as selecting a stop word by default or selecting the Mansoub case by default.
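
A simplified view of this selection step is sketched below. It is not the code of mishkal/tashkeel: the `Candidate` fields and the ranking order (related first, then stop words, then frequency) are assumptions made for illustration, and the Mansoub default is left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    vocalized: str
    frequency: int = 0
    is_stop_word: bool = False
    has_relation: bool = False  # linked to a neighbour by a syntactic or semantic relation


def choose(candidates):
    """Pick one candidate: related first, then stop words, then highest frequency."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # evident case: nothing to disambiguate
    return max(
        candidates,
        key=lambda c: (c.has_relation, c.is_stop_word, c.frequency),
    )
```

Ranking by a tuple keeps the fallback chain explicit: frequency only matters among candidates that tie on the earlier criteria.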

The rest of the program provides functions that handle the interfaces and the API for web, desktop, and command-line use.
