A utility library for processing Vietnamese text
Project description
chiecthuyenngoaixa
chiecthuyenngoaixa is a Python library that provides functions and classes for common tasks in processing Vietnamese text, such as removing diacritics, converting numbers to words, sorting strings, validation and more.
This library is written in pure Python with no dependencies. Python 3.8 and above are supported.
Installation
Chiecthuyenngoaixa is available on PyPI. Open a terminal or Command Prompt (on Windows) and run the following command:
pip install chiecthuyenngoaixa
If you are using Poetry, use this instead:
poetry add chiecthuyenngoaixa
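To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name introduced here for illustration, not part of chiecthuyenngoaixa.

```python
from importlib import metadata  # available in the standard library since Python 3.8


def installed_version(dist_name="chiecthuyenngoaixa"):
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it is not part of the library.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("chiecthuyenngoaixa is not installed in this environment")
    else:
        print("chiecthuyenngoaixa", version, "is installed")
```

If this reports that the package is missing, check that pip or Poetry installed it into the same environment as the interpreter you are running.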
Basic usage
The library will now be available as the ctnx module (an abbreviation of chiecthuyenngoaixa).
Some commonly used functions and classes can be imported directly. For example:
- To convert Vietnamese text to ASCII-only text:
>>> from ctnx import remove_diacritics
>>> remove_diacritics("Đàn ong thấy cái lon thì bu vào.")
'Dan ong thay cai lon thi bu vao.'
- To convert a number to Vietnamese text:
>>> from ctnx import num_to_words
>>> num_to_words(123456789021003.45)
'một trăm hai mươi ba nghìn bốn trăm năm mươi sáu tỉ bảy trăm tám mươi chín triệu không trăm hai mươi mốt nghìn không trăm linh ba phẩy bốn mươi lăm'
- To sort Vietnamese texts:
>>> from ctnx import ViSortKey
>>> lines = ['Hà Nam', 'Hải Dương', 'Hà Nội', 'Hà Tĩnh', 'Hải Phòng', 'Hậu Giang', 'Hoà Bình', 'Hưng Yên', 'Hạ Long', 'Hà Giang', 'Điện Biên']
>>> sorted(lines, key=ViSortKey)
['Điện Biên', 'Hà Giang', 'Hà Nam', 'Hà Nội', 'Hà Tĩnh', 'Hải Dương', 'Hải Phòng', 'Hạ Long', 'Hậu Giang', 'Hoà Bình', 'Hưng Yên']
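For readers curious about what the diacritics and sorting examples above involve, the following is a rough standard-library sketch of both ideas. It is not the library's implementation. The names `strip_diacritics` and `vi_sort_key`, the alphabet string, and the tone order (no tone, huyền, hỏi, ngã, sắc, nặng) are assumptions made for illustration. The sketch relies on Unicode decomposition: a letter such as "ậ" splits into a base letter plus combining marks, which can then be dropped or ranked.

```python
import unicodedata

# Combining tone marks in the assumed order: huyền, hỏi, ngã, sắc, nặng.
# A letter with no tone mark gets tone rank 0.
TONE_MARKS = ["\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]

# Assumed alphabet order; f, j, w and z are slotted in at their Latin positions.
ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêfghijklmnoôơpqrstuưvwxyz")
LETTER_RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}


def strip_diacritics(text):
    """Return text with Vietnamese diacritics removed (illustrative sketch)."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # U+0110/U+0111 (D with stroke) have no canonical decomposition,
    # so they are mapped by hand.
    return no_marks.replace("\u0110", "D").replace("\u0111", "d")


def vi_sort_key(text):
    """Build a sort key of (letter rank, tone rank) pairs (illustrative sketch)."""
    key = []
    for ch in unicodedata.normalize("NFC", text.lower()):
        tone = 0
        rest = []
        for part in unicodedata.normalize("NFD", ch):
            if part in TONE_MARKS:
                tone = TONE_MARKS.index(part) + 1
            else:
                rest.append(part)
        # Recompose what is left, so that "a" + circumflex becomes "â" again.
        base = unicodedata.normalize("NFC", "".join(rest))
        rank = LETTER_RANK.get(base, -1)
        # Characters outside the alphabet (spaces, digits, punctuation)
        # sort before all letters, ordered by code point.
        key.append((rank, tone) if rank >= 0 else (-1, ord(ch)))
    return key
```

Under these assumptions, `strip_diacritics` gives the same result as the `remove_diacritics` example, and `sorted(lines, key=vi_sort_key)` reproduces the ordering shown for `ViSortKey` on that particular list. The library itself may handle more cases and may order other inputs differently.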
For further usage, see the documentation, which is hosted on chiecthuyenngoaixa.readthedocs.io.
Hashes for chiecthuyenngoaixa-0.2.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3502e3562377b7735dddbe929ab035263b29b57203674e1d0f4f1c5d12dd510e |
MD5 | 76a06a06a582134333211fe1efdef7e5 |
BLAKE2b-256 | 09eac1a652c935c99b2cd6ff55805f99ba06c7f17969f8843c62202bbd0152fc |