underthesea

Vietnamese NLP Toolkit

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Natural Language
- English
Programming Language
- Python :: 3.6

Project description

Underthesea - Vietnamese NLP Toolkit

underthesea is a suite of open source Python modules, data sets and tutorials supporting research and development in Vietnamese Natural Language Processing.

Free software: GNU General Public License v3
Documentation: https://underthesea.readthedocs.io
Live demo: undertheseanlp.com
Facebook Page: https://www.facebook.com/undertheseanlp/
YouTube: Underthesea NLP Channel

Installation

To install underthesea, simply:

$ pip install underthesea
✨🍰✨

Satisfaction, guaranteed.

1. Sentence Segmentation

F1: 98% | custom models

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import sent_tokenize
>>> text = 'Taylor cho biết lúc đầu cô cảm thấy ngại với cô bạn thân Amanda nhưng rồi mọi thứ trôi qua nhanh chóng. Amanda cũng thoải mái với mối quan hệ này.'

>>> sent_tokenize(text)
[
    "Taylor cho biết lúc đầu cô cảm thấy ngại với cô bạn thân Amanda nhưng rồi mọi thứ trôi qua nhanh chóng.",
    "Amanda cũng thoải mái với mối quan hệ này."
]

2. Word Segmentation

F1: 94%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import word_tokenize
>>> sentence = 'Chàng trai 9X Quảng Trị khởi nghiệp từ nấm sò'

>>> word_tokenize(sentence)
['Chàng trai', '9X', 'Quảng Trị', 'khởi nghiệp', 'từ', 'nấm', 'sò']

>>> word_tokenize(sentence, format="text")
'Chàng_trai 9X Quảng_Trị khởi_nghiệp từ nấm sò'
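The two output formats above are related in a simple way: in the text format, the syllables of each word are joined by underscores and the words are separated by spaces. A minimal sketch of that conversion in plain Python follows; `to_text_format` and `from_text_format` are hypothetical helpers for illustration, not part of underthesea.

```python
def to_text_format(tokens):
    """Join the syllables of each word with '_' and separate words with spaces."""
    return " ".join(token.replace(" ", "_") for token in tokens)


def from_text_format(text):
    """Inverse conversion; assumes no original token contained a literal '_'."""
    return [token.replace("_", " ") for token in text.split()]


tokens = ['Chàng trai', '9X', 'Quảng Trị', 'khởi nghiệp', 'từ', 'nấm', 'sò']
text = to_text_format(tokens)
print(text)  # Chàng_trai 9X Quảng_Trị khởi_nghiệp từ nấm sò
```

The round trip is lossless only when no input token already contains an underscore.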

3. POS Tagging

Accuracy: 92.3%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import pos_tag
>>> pos_tag('Chợ thịt chó nổi tiếng ở Sài Gòn bị truy quét')
[('Chợ', 'N'),
 ('thịt', 'N'),
 ('chó', 'N'),
 ('nổi tiếng', 'A'),
 ('ở', 'E'),
 ('Sài Gòn', 'Np'),
 ('bị', 'V'),
 ('truy quét', 'V')]
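Because `pos_tag` returns a list of `(word, tag)` pairs, its output can be processed with ordinary Python tools. The sketch below works on the example output above, copied in as a literal so that it runs without the library; it assumes `N` and `Np` mark common and proper nouns, as the example suggests.

```python
from collections import Counter

# Example output of pos_tag, copied from the usage above.
tagged = [('Chợ', 'N'), ('thịt', 'N'), ('chó', 'N'), ('nổi tiếng', 'A'),
          ('ở', 'E'), ('Sài Gòn', 'Np'), ('bị', 'V'), ('truy quét', 'V')]

# Count how often each tag occurs.
tag_counts = Counter(tag for _, tag in tagged)

# Keep only the words tagged as nouns (assumed tags: 'N' and 'Np').
nouns = [word for word, tag in tagged if tag in ('N', 'Np')]

print(tag_counts['N'], tag_counts['V'])  # 3 2
print(nouns)  # ['Chợ', 'thịt', 'chó', 'Sài Gòn']
```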

4. Chunking

F1: 77%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import chunk
>>> text = 'Bác sĩ bây giờ có thể thản nhiên báo tin bệnh nhân bị ung thư?'
>>> chunk(text)
[('Bác sĩ', 'N', 'B-NP'),
 ('bây giờ', 'P', 'I-NP'),
 ('có thể', 'R', 'B-VP'),
 ('thản nhiên', 'V', 'I-VP'),
 ('báo tin', 'N', 'B-NP'),
 ('bệnh nhân', 'N', 'I-NP'),
 ('bị', 'V', 'B-VP'),
 ('ung thư', 'N', 'I-VP'),
 ('?', 'CH', 'O')]
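The third field of each tuple is a B/I/O tag: `B-` opens a phrase, `I-` continues it, and `O` is outside any phrase. A minimal sketch of grouping those tags into phrases is shown below; `group_chunks` is a hypothetical helper, not an underthesea function, and it runs on the example output copied in as a literal.

```python
def group_chunks(tagged):
    """Group (word, pos, bio_tag) triples into (label, phrase) pairs."""
    phrases = []
    current_label, current_words = None, []
    for word, _pos, tag in tagged:
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, _, label = tag.partition("-")
        # A phrase starts at B-, or at an I- whose label differs from the open one.
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)
        if prefix == "O" or starts_new:
            if current_words:
                phrases.append((current_label, " ".join(current_words)))
            current_label, current_words = label, []
        if prefix != "O":
            current_words.append(word)
    if current_words:
        phrases.append((current_label, " ".join(current_words)))
    return phrases


chunked = [('Bác sĩ', 'N', 'B-NP'), ('bây giờ', 'P', 'I-NP'),
           ('có thể', 'R', 'B-VP'), ('thản nhiên', 'V', 'I-VP'),
           ('báo tin', 'N', 'B-NP'), ('bệnh nhân', 'N', 'I-NP'),
           ('bị', 'V', 'B-VP'), ('ung thư', 'N', 'I-VP'), ('?', 'CH', 'O')]
phrases = group_chunks(chunked)
# [('NP', 'Bác sĩ bây giờ'), ('VP', 'có thể thản nhiên'),
#  ('NP', 'báo tin bệnh nhân'), ('VP', 'bị ung thư')]
```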

5. Named Entity Recognition

F1: 86.6%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import ner
>>> text = 'Chưa tiết lộ lịch trình tới Việt Nam của Tổng thống Mỹ Donald Trump'
>>> ner(text)
[('Chưa', 'R', 'O', 'O'),
 ('tiết lộ', 'V', 'B-VP', 'O'),
 ('lịch trình', 'V', 'B-VP', 'O'),
 ('tới', 'E', 'B-PP', 'O'),
 ('Việt Nam', 'Np', 'B-NP', 'B-LOC'),
 ('của', 'E', 'B-PP', 'O'),
 ('Tổng thống', 'N', 'B-NP', 'O'),
 ('Mỹ', 'Np', 'B-NP', 'B-LOC'),
 ('Donald', 'Np', 'B-NP', 'B-PER'),
 ('Trump', 'Np', 'B-NP', 'I-PER')]
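In the output above, the last field of each tuple carries the entity tag, so multi-word entities such as a person name span several rows. The sketch below collects the entities by type from the example output, copied in as a literal; `entities_by_type` is a hypothetical helper rather than part of underthesea, and it silently drops an `I-` tag that does not continue an open entity.

```python
from collections import defaultdict


def entities_by_type(tagged):
    """Map each entity label to the list of entity strings found in `tagged`."""
    spans = defaultdict(list)
    current = None  # (label, words) of the entity currently being built
    for word, _pos, _chunk, tag in tagged:
        if tag.startswith("B-"):
            current = (tag[2:], [word])
            spans[current[0]].append(current[1])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(word)
        else:
            current = None
    return {label: [" ".join(words) for words in found]
            for label, found in spans.items()}


ner_output = [('Chưa', 'R', 'O', 'O'), ('tiết lộ', 'V', 'B-VP', 'O'),
              ('lịch trình', 'V', 'B-VP', 'O'), ('tới', 'E', 'B-PP', 'O'),
              ('Việt Nam', 'Np', 'B-NP', 'B-LOC'), ('của', 'E', 'B-PP', 'O'),
              ('Tổng thống', 'N', 'B-NP', 'O'), ('Mỹ', 'Np', 'B-NP', 'B-LOC'),
              ('Donald', 'Np', 'B-NP', 'B-PER'), ('Trump', 'Np', 'B-NP', 'I-PER')]
entities = entities_by_type(ner_output)
# {'LOC': ['Việt Nam', 'Mỹ'], 'PER': ['Donald Trump']}
```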

6. Text Classification

Accuracy: 86.7%

Install dependencies and download default model

$ pip install Cython
$ pip install joblib future scipy numpy scikit-learn
$ pip install -U fasttext --no-cache-dir --no-deps --force-reinstall
$ underthesea data

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import classify
>>> classify('HLV đầu tiên ở Premier League bị sa thải sau 4 vòng đấu')
['The thao']
>>> classify('Hội đồng tư vấn kinh doanh Asean vinh danh giải thưởng quốc tế')
['Kinh doanh']
>>> classify('Đánh giá “rạp hát tại gia” Samsung Soundbar Sound+ MS750')
['Vi tinh']

7. Sentiment Analysis

F1: 59.5%

Install dependencies

$ pip install future scipy numpy scikit-learn==0.19.2 joblib

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import sentiment
>>> sentiment('Gọi mấy lần mà lúc nào cũng là các chuyên viên đang bận hết ạ', domain='bank')
('CUSTOMER SUPPORT#NEGATIVE',)
>>> sentiment('bidv cho vay hay ko phu thuoc y thich cua thang tham dinh, ko co quy dinh ro rang', domain='bank')
('LOAN#NEGATIVE',)
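Each label in the returned tuple appears to combine an aspect and a polarity, separated by `#`. Assuming that format holds for every label, the pairs can be split apart as sketched below; `split_labels` is a hypothetical helper, not an underthesea function.

```python
def split_labels(labels):
    """Split 'ASPECT#POLARITY' labels into (aspect, polarity) pairs.

    A label without '#' yields an empty aspect and the whole label as polarity.
    """
    pairs = []
    for label in labels:
        aspect, _, polarity = label.rpartition("#")
        pairs.append((aspect, polarity))
    return pairs


pairs = split_labels(('CUSTOMER SUPPORT#NEGATIVE', 'LOAN#NEGATIVE'))
print(pairs)  # [('CUSTOMER SUPPORT', 'NEGATIVE'), ('LOAN', 'NEGATIVE')]
```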

Upcoming Features

Text to Speech
Automatic Speech Recognition
Machine Translation
Dependency Parsing

Contributing

Do you want to contribute to underthesea development? Great! Please read more details in CONTRIBUTING.rst.

History

1.1.12 (2019-03-13)

Add sentence segmentation feature

1.1.9 (2019-01-01)

Improve speed of word_tokenize function
Support Python 3.6+ only
Use flake8 for style guide enforcement

1.1.8 (2018-06-20)

Fix word_tokenize error when text contains tab (\t) character
Fix regex_tokenize with url

1.1.7 (2018-04-12)

Rename word_sent function to word_tokenize
Refactor version control in setup.py file and __init__.py file
Update documentation badge url

1.1.6 (2017-12-26)

New feature: aspect sentiment analysis
Integrate with languageflow 1.1.6
Fix bug when tokenizing a string containing ‘=’ (#159)

1.1.5 (2017-10-12)

New feature: named entity recognition
Refactor and update model for word_sent, pos_tag, chunking

1.1.4 (2017-09-12)

New feature: text classification
[bug] Fix Text error
[doc] Add facebook link

1.1.3 (2017-08-30)

Add live demo: https://underthesea.herokuapp.com/

1.1.2 (2017-08-22)

Add dictionary

1.1.1 (2017-07-05)

Support Python 3
Refactor feature_engineering code

1.1.0 (2017-05-30)

Add chunking feature
Add pos_tag feature
Add word_sent feature, fix performance
Add Corpus class
Add Transformer classes
Integrated with dictionary of Ho Ngoc Duc
Add travis-CI, auto build with PyPI

1.0.0 (2017-03-01)

First release on PyPI.
First release on Readthedocs

Download files

Source Distribution

underthesea-1.1.12.tar.gz (11.9 MB view details)

Uploaded Mar 13, 2019 Source

Built Distribution

underthesea-1.1.12-py3-none-any.whl (11.3 MB view details)

Uploaded Mar 13, 2019 Python 3

File details

Details for the file underthesea-1.1.12.tar.gz.

File metadata

Download URL: underthesea-1.1.12.tar.gz
Upload date: Mar 13, 2019
Size: 11.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.3

File hashes

Hashes for underthesea-1.1.12.tar.gz
Algorithm	Hash digest
SHA256	`305cd758bcad1123c5c832302bd81a8fb712ecdf9b3299a916e7c96b93b571ff`
MD5	`700179b602e7acc54c6174f24d6ef06b`
BLAKE2b-256	`0a4e6846e5a70d06ed967fc0969c0add475f06e10f2ec6ac5a7983d5d687b7b4`
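
A downloaded file can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `matches_published_digest` are assumptions for illustration and are not part of underthesea or PyPI tooling.

```python
import hashlib

# SHA256 digest listed above for underthesea-1.1.12.tar.gz.
EXPECTED = "305cd758bcad1123c5c832302bd81a8fb712ecdf9b3299a916e7c96b93b571ff"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED):
    """True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_published_digest` with the path of the downloaded archive returns `True` only when the file's digest equals the listed value.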

File details

Details for the file underthesea-1.1.12-py3-none-any.whl.

File metadata

Download URL: underthesea-1.1.12-py3-none-any.whl
Upload date: Mar 13, 2019
Size: 11.3 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.3

File hashes

Hashes for underthesea-1.1.12-py3-none-any.whl
Algorithm	Hash digest
SHA256	`256aab7c86fe269db1a3e9be42ae92ac3e26546cdfda02e5c00478b49fc042c4`
MD5	`043a0d83a978af532b2a109e86eb044f`
BLAKE2b-256	`4ff0822b9a905ec1003092b100557790b1eedf882ea4fc25890fff073b2a0510`
