
Xina processing library


Xina AI

Xina Processor is an open-source Python library for cleaning Arabic text. It provides a range of cleaning functions, as well as modules for cleaning files and folders in a streaming fashion.

Installation

PIP

If you use pip, you can install xinaprocessor with:

pip install xinaprocessor

From source

You can also install the library directly from the repository. First, clone the repo:

git clone https://github.com/xina-ai/xinaprocessor.git

Then change into the cloned directory and install the library in editable mode:

cd xinaprocessor
pip install -e .

Documentation

Documentation is still a work in progress and is available at https://xina-ai.github.io/xinaprocessor/

Getting Started

from xinaprocessor import cleaners

To clean text

text = "نص عربي!"  # "Arabic text!"
cleaner = cleaners.TextCleaner(text=text)
# Keep only the Arabic characters
cleaner.keep_arabic_only()
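As a rough illustration of what a filter like `keep_arabic_only()` involves, the plain-Python sketch below keeps characters from the basic Arabic Unicode block (U+0600 to U+06FF) plus whitespace. `keep_arabic_only` here is a hypothetical stand-alone helper, not the library's implementation. Note that this range also contains Arabic punctuation such as "،" and the Arabic-Indic digits, so the library's own rule may differ.

```python
import re

# Hypothetical helper, NOT xinaprocessor's code: anything outside the basic
# Arabic block (U+0600 to U+06FF) and outside whitespace is removed.
_NON_ARABIC = re.compile(r"[^\u0600-\u06FF\s]")


def keep_arabic_only(text: str) -> str:
    """Drop every non-Arabic character, then tidy the leftover whitespace."""
    stripped = _NON_ARABIC.sub("", text)
    # Collapse all whitespace runs (including newlines) into single spaces
    return " ".join(stripped.split())


print(keep_arabic_only("نص عربي!"))  # نص عربي
```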

To clean a text file

# Create a sample file, MyData.txt ("the first line" / "the second line!")
file_path = "MyData.txt"
with open(file_path, "w", encoding="utf-8") as f:
    f.write("Aالسطر الأول\nالسطر الثاني!")
# Create a FileCleaner object
cleaner = cleaners.FileCleaner(filepath=file_path)
# Each method returns the cleaner, so the steps can be chained
cleaner.remove_english_text().remove_arabic_numbers().remove_punctuations()
# Access the resulting data
cleaned_lines = cleaner.lines  # the result will look like ['السطر الأول', 'السطر الثاني']
cleaned_text = cleaner.text  # the result will look like 'السطر الأول\nالسطر الثاني'
# Save the processed/cleaned text to a file
cleaner.save2file("CleanedData.txt", encoding="utf-8")
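The chained calls above work because each cleaning method hands back the cleaner object. The sketch below imitates that fluent style with a hypothetical `ChainedCleaner` class; it is not xinaprocessor's code, and its rules are assumptions: "English text" is approximated as Latin letters, "Arabic numbers" as the Arabic-Indic digits U+0660 to U+0669, and punctuation as ASCII punctuation plus the Arabic comma, semicolon and question mark.

```python
import re
import string

# Hypothetical patterns, chosen for illustration only
ARABIC_INDIC_DIGITS = re.compile(r"[\u0660-\u0669]")
LATIN_LETTERS = re.compile(r"[A-Za-z]")
# ASCII punctuation plus the Arabic comma, semicolon and question mark
PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "\u060C\u061B\u061F]")


class ChainedCleaner:
    """Hypothetical fluent cleaner: every step returns self so calls can chain."""

    def __init__(self, lines):
        self.lines = list(lines)

    def _apply(self, pattern):
        # Remove the pattern from every line and trim the edges
        self.lines = [pattern.sub("", line).strip() for line in self.lines]
        return self

    def remove_english_text(self):
        return self._apply(LATIN_LETTERS)

    def remove_arabic_numbers(self):
        return self._apply(ARABIC_INDIC_DIGITS)

    def remove_punctuations(self):
        return self._apply(PUNCTUATION)

    @property
    def text(self):
        # Join the cleaned lines back into a single string
        return "\n".join(self.lines)


cleaner = ChainedCleaner(["Aالسطر الأول", "السطر الثاني!"])
cleaner.remove_english_text().remove_arabic_numbers().remove_punctuations()
print(cleaner.lines)  # ['السطر الأول', 'السطر الثاني']
```

Returning `self` from every step is what lets a single expression apply several cleaning passes in order.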

To clean a large text file

# FileStreamCleaner is meant for large text files; the cleaned text is written to CleanedFile.txt
file_path = "MyData.txt"
cleaned_path = "CleanedFile.txt"
cleaner = cleaners.FileStreamCleaner(filepath=file_path, savepath=cleaned_path)
# Chain the cleaning steps, then call clean() at the end of the chain
cleaner.remove_hashtags().remove_honorific_signs().drop_empty_lines().clean()
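`FileStreamCleaner` is aimed at files too large to load into memory at once. Assuming it works line by line, the idea can be sketched in plain Python as follows. `stream_clean` and `remove_hashtags` are hypothetical names, the hashtag rule (drop `#` together with the following non-space characters) is an assumption, and honorific-sign removal is left out.

```python
import os
import re
import tempfile
from typing import Callable, List

Step = Callable[[str], str]

# Assumption: a hashtag is "#" followed by non-space characters, removed whole
HASHTAG = re.compile(r"#\S+")


def remove_hashtags(line: str) -> str:
    # Drop hashtag tokens, then collapse the leftover whitespace
    return " ".join(HASHTAG.sub("", line).split())


def stream_clean(src_path: str, dst_path: str, steps: List[Step],
                 drop_empty: bool = True) -> int:
    """Hypothetical: clean src_path into dst_path one line at a time; return lines written."""
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:  # iterating a file object reads it lazily, line by line
            line = raw.rstrip("\n")
            for step in steps:
                line = step(line)
            if drop_empty and not line.strip():
                continue  # skip lines that are (or became) empty
            dst.write(line + "\n")
            written += 1
    return written


# Demo inside a temporary directory so no existing files are touched
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "MyData.txt")
    dst = os.path.join(tmp, "CleanedFile.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("مرحبا #تجربة\n#وسم\n\nنص عادي\n")
    count = stream_clean(src, dst, [remove_hashtags])
    with open(dst, encoding="utf-8") as f:
        cleaned = f.read().splitlines()

print(count, cleaned)  # 2 ['مرحبا', 'نص عادي']
```

Because only one line is held in memory at a time, memory use stays roughly constant regardless of file size.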


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

xinaprocessor-0.41-py3-none-any.whl (17.2 kB, Python 3)
