Xina processing library

Xina AI

Xina Processor is an open-source library for cleaning Arabic text. It includes a range of cleaning functions, as well as modules for streaming file cleaning and folder cleaning.

Installation

PIP

If you use pip, you can install xinaprocessor with:

pip install xinaprocessor

From source

You can also clone this repo and install the library from source. First, clone the repo with:

git clone https://github.com/xina-ai/xinaprocessor.git

Then cd into the cloned directory and install the library in editable mode with:

pip install -e .
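
To confirm that either installation method worked, a quick check along these lines can help. This is a minimal sketch that uses only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical and is not part of xinaprocessor.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "xinaprocessor") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"xinaprocessor {found}" if found else "xinaprocessor is not installed")
```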

Documentation

The documentation is still in progress and is available here: https://xina-ai.github.io/xinaprocessor/

Getting Started

from xinaprocessor import cleaners

To clean text

text = "نص عربي!"
cleaner = cleaners.TextCleaner(text=text)
# Keep only the Arabic characters, dropping non-Arabic ones such as "!"
cleaner.keep_arabic_only()
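
To make the idea of "keeping Arabic only" concrete, here is a minimal standard-library sketch of that kind of filter. It is not the library's implementation: the function name keep_arabic_only_sketch and the choice of Unicode ranges (letters U+0621 to U+064A and diacritics U+064B to U+0652) are assumptions made for illustration, and xinaprocessor may treat digits, punctuation, or whitespace differently.

```python
import re

# Assumption: "Arabic" here means the letters U+0621..U+064A plus the
# diacritics U+064B..U+0652; the library may use a different character set.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\u064B-\u0652\s]")


def keep_arabic_only_sketch(text: str) -> str:
    """Drop every character that is not an Arabic letter, diacritic, or whitespace."""
    stripped = NON_ARABIC.sub("", text)
    # Collapse runs of spaces/tabs left behind by removed characters; newlines are kept
    return re.sub(r"[ \t]+", " ", stripped).strip()


print(keep_arabic_only_sketch("نص عربي!"))
```

With this sketch the trailing "!" is removed and the two Arabic words are left unchanged.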

To clean a text file

# Create the file MyData.txt (written as UTF-8 so the Arabic text is stored correctly on any platform)
file_path = "MyData.txt"
with open(file_path, "w", encoding="utf-8") as f:
    f.write("Aالسطر الأول\nالسطر الثاني!")
# Create a FileCleaner object
cleaner = cleaners.FileCleaner(filepath=file_path)
cleaner.remove_english_text().remove_arabic_numbers().remove_punctuations()
# Access the resulting data
cleaned_data = cleaner.lines  # the result will look like ['السطر الأول', 'السطر الثاني']
cleaned_text = cleaner.text  # the result will look like 'السطر الأول\nالسطر الثاني'
# Save the processed/cleaned text to a file
cleaner.save2file('CleanedData.txt', encoding='utf-8')
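
The chained calls above work because each cleaning method hands back the cleaner itself. The following is a minimal sketch of that fluent style using only the standard library. ChainCleanerSketch is a hypothetical stand-in, not the library's code, and it rests on two assumptions: that "Arabic numbers" means the Arabic-Indic digits U+0660 to U+0669, and that punctuation covers ASCII punctuation plus a few common Arabic marks.

```python
import string

# Assumption: "Arabic numbers" means the Arabic-Indic digits U+0660..U+0669.
ARABIC_INDIC_DIGITS = "".join(chr(c) for c in range(0x0660, 0x066A))
# ASCII punctuation plus a few common Arabic marks (comma, semicolon, question mark).
PUNCTUATION = string.punctuation + "،؛؟"


class ChainCleanerSketch:
    """Hypothetical stand-in illustrating the chaining style; not the library's code."""

    def __init__(self, text: str):
        self.lines = text.split("\n")

    def _delete(self, chars: str) -> "ChainCleanerSketch":
        # Remove every character in `chars` from every line, then trim the line
        table = str.maketrans("", "", chars)
        self.lines = [line.translate(table).strip() for line in self.lines]
        return self  # returning self is what allows a.b().c()

    def remove_english_text(self) -> "ChainCleanerSketch":
        return self._delete(string.ascii_letters)

    def remove_arabic_numbers(self) -> "ChainCleanerSketch":
        return self._delete(ARABIC_INDIC_DIGITS)

    def remove_punctuations(self) -> "ChainCleanerSketch":
        return self._delete(PUNCTUATION)

    @property
    def text(self) -> str:
        return "\n".join(self.lines)


cleaner = ChainCleanerSketch("Aالسطر الأول\nالسطر الثاني!")
cleaner.remove_english_text().remove_arabic_numbers().remove_punctuations()
print(cleaner.lines)
```

Run on the same two lines as the example file, the sketch drops the leading "A" and the trailing "!", which matches the result described for FileCleaner.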

To clean a large text file

# This cleaner is meant for large text files; the cleaned text is saved to CleanedFile.txt
file_path = "MyData.txt"
cleaned_path = "CleanedFile.txt"
cleaner = cleaners.FileStreamCleaner(filepath=file_path, savepath=cleaned_path)
cleaner.remove_hashtags().remove_honorific_signs().drop_empty_lines().clean()
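
For readers wondering why a streaming cleaner suits large files, the sketch below shows the general technique with the standard library only: read one line at a time, apply the cleaning steps, and write the result straight to the output file, so the whole file never has to sit in memory. The names stream_clean and remove_hashtags and the hashtag pattern are assumptions made for this illustration, honorific signs are left out, and the code does not claim to reflect how FileStreamCleaner is implemented.

```python
import os
import re
import tempfile
from typing import Callable, Iterable

HASHTAG = re.compile(r"#\w+")  # assumption: a hashtag is "#" followed by word characters


def remove_hashtags(line: str) -> str:
    return HASHTAG.sub("", line)


def stream_clean(src: str, dst: str, steps: Iterable[Callable[[str], str]],
                 drop_empty: bool = True, encoding: str = "utf-8") -> int:
    """Clean src line by line into dst and return the number of lines written."""
    steps = list(steps)
    written = 0
    with open(src, encoding=encoding) as fin, open(dst, "w", encoding=encoding) as fout:
        for line in fin:  # the file object yields one line at a time
            line = line.rstrip("\n")
            for step in steps:
                line = step(line)
            line = line.strip()
            if drop_empty and not line:
                continue  # skip lines that are empty after cleaning
            fout.write(line + "\n")
            written += 1
    return written


# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "MyData.txt")
    dst = os.path.join(tmp, "CleanedFile.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("مرحبا #تحية\n\n#وسم\nالسطر الثاني\n")
    count = stream_clean(src, dst, [remove_hashtags])
    with open(dst, encoding="utf-8") as f:
        result = f.read()

print(count)
```

Because each line is written as soon as it is cleaned, memory use depends on the longest line rather than on the size of the file.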
