Text Classifier, Text Classification


pytextclassifier

pytextclassifier is a Python text classifier. It provides multiple text classification and clustering algorithms and can be applied to tasks such as sentiment polarity analysis and text risk classification. It is compatible with Python 2.7 and Python 3.

Guide

Feature
Install
Usage
Contact
Cite
License
Contribute
Reference

Feature

pytextclassifier is an open-source Python toolkit for text classification. Its goal is to implement text analysis algorithms that are usable in production environments.

pytextclassifier features clearly implemented algorithms, high performance, and support for custom corpora.

Functions:

Classifier

Logistic Regression
Random Forest
Decision Tree
K-Nearest Neighbours
Naive Bayes
XGBoost
Support Vector Machine (SVM)
Xgboost_lr
MLP
Ensemble
Stack
TextCNN
TextRNN
FastText
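The classifiers listed above are general algorithms. To illustrate what a classifier's train and predict steps do, here is a from-scratch multinomial Naive Bayes sketch in plain Python. It is not pytextclassifier's implementation; the function names `train_nb` and `predict_nb` and the lowercase whitespace tokenizer are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_nb(data):
    """Count labels and per-label word frequencies from (label, text) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> Counter of word frequencies
    vocab = set()
    for label, text in data:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior, using add-one (Laplace) smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        n_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # log P(token | label), smoothed so unseen words do not zero out the score
            score += math.log((word_counts[label][token] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


data = [
    ('education', 'Student debt to cost Britain billions within decades'),
    ('education', 'Chinese education for TV experiment'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
    ('sports', 'Summit Series look launches HBO Canada sports doc series: Mudhar'),
]
model = train_nb(data)
print(predict_nb(model, 'Middle East and Asia boost investment in top level sports'))  # sports
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.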

Evaluate

Precision
Recall
F1
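These metrics are computed per label from true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). A plain-Python sketch follows; it is not the library's own evaluation code, and the function name is illustrative.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted as label
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
    fn = sum(1 for t, p in pairs if t == label and p != label)  # instances of label that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ['education', 'education', 'sports', 'sports']
y_pred = ['education', 'sports', 'sports', 'sports']
print(precision_recall_f1(y_true, y_pred, 'sports'))     # precision 2/3, recall 1.0, F1 0.8
print(precision_recall_f1(y_true, y_pred, 'education'))  # precision 1.0, recall 0.5, F1 2/3
```

Averaging the per-label F1 scores gives the macro-averaged F1, which weights every label equally regardless of how many examples it has.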

Test

Chi-square test
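In text classification the chi-square test is commonly used to score how strongly a term is associated with a class, for example when selecting features. This README does not state which formulation pytextclassifier uses; the sketch below shows the standard 2x2 contingency-table statistic, N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)). The helper names and the whitespace tokenizer are assumptions for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # an empty row or column gives no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def term_class_chi2(data, term, label):
    """Build the contingency table for `term` vs `label` from (label, text) pairs."""
    a = b = c = d = 0
    for doc_label, text in data:
        has_term = term in text.lower().split()
        in_class = doc_label == label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return chi_square(a, b, c, d)


data = [
    ('education', 'Student debt to cost Britain billions within decades'),
    ('education', 'Chinese education for TV experiment'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
    ('sports', 'Summit Series look launches HBO Canada sports doc series: Mudhar'),
]
print(term_class_chi2(data, 'sports', 'sports'))  # 4.0
```

A higher score means the term's presence depends more strongly on the class, so keeping the top-scoring terms is a simple way to shrink the feature space.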

Cluster

MiniBatchKMeans
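MiniBatchKMeans is the mini-batch variant of k-means: instead of reassigning every point on each iteration, it samples a small batch, assigns each sampled point to its nearest centre, and moves that centre toward the point with a learning rate of 1 / (number of points the centre has absorbed so far). The plain-Python sketch below illustrates that update on 2-D points. It is not pytextclassifier's code, and the farthest-point initialisation is a simplification chosen to keep the example deterministic.

```python
import random


def _sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _nearest(centers, point):
    """Index of the centre closest to `point`."""
    return min(range(len(centers)), key=lambda i: _sq_dist(centers[i], point))


def mini_batch_kmeans(points, k, batch_size=4, n_iter=50, seed=0):
    """Cluster `points` (tuples of numbers) into k groups with mini-batch updates."""
    rng = random.Random(seed)
    # Farthest-point initialisation: start from the first point, then repeatedly
    # add the point farthest from all centres chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_sq_dist(c, p) for c in centers))
        centers.append(list(far))
    counts = [0] * k  # how many samples each centre has absorbed so far
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        assigned = [_nearest(centers, p) for p in batch]  # assign the whole batch first
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate shrinks over time
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    labels = [_nearest(centers, p) for p in points]
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = mini_batch_kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because each centre is a running mean of the points assigned to it, the cost per iteration depends on the batch size rather than the corpus size, which is what makes the method attractive for large text collections.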

While providing rich functionality, pytextclassifier keeps its internal modules loosely coupled, loads models lazily, exposes its dictionaries explicitly, and aims to be easy to use.
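Lazy loading means a model is not read from disk until it is first needed. How pytextclassifier implements this is not shown here; the sketch below illustrates the general pattern with a hypothetical `LazyModel` wrapper and a stand-in loader (both are assumptions, not library APIs).

```python
class LazyModel:
    """Hypothetical wrapper that defers an expensive model load until first use."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that builds or loads the model
        self._model = None
        self.load_count = 0

    @property
    def model(self):
        # Load on first access only; later accesses reuse the cached object.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def predict(self, texts):
        return [self.model(text) for text in texts]


def slow_loader():
    # Stand-in for unpickling a trained model: a trivial keyword rule.
    return lambda text: 'sports' if 'sports' in text.lower() else 'education'


lazy = LazyModel(slow_loader)
print(lazy.load_count)  # 0: nothing loaded yet
print(lazy.predict(['Top level sports', 'Student debt']))  # ['sports', 'education']
print(lazy.load_count)  # 1: loaded once, on the first predict call
```

The benefit is that importing or constructing the object stays cheap, and programs that never call predict never pay the loading cost.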

Install

Requirements and Installation

Install from PyPI:

pip3 install pytextclassifier

Or install from source:

git clone https://github.com/shibing624/pytextclassifier.git
cd pytextclassifier
python3 setup.py install

Usage

Text Classifier

English Text Classifier

This example covers model training, saving, prediction, and testing (base_demo.py):

from pytextclassifier import TextClassifier

m = TextClassifier(model_name='lr')
# model_name selects the classifier; supported values: lr, random_forest, xgboost, svm, mlp, ensemble, stack
data = [
    ('education', 'Student debt to cost Britain billions within decades'),
    ('education', 'Chinese education for TV experiment'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
    ('sports', 'Summit Series look launches HBO Canada sports doc series: Mudhar')
]
m.train(data)
r = m.predict(['Abbott government spends $8 million on higher education media blitz',
               'Middle East and Asia boost investment in top level sports'])
print(r)  # ['education' 'sports']
m.save()
del m

new_m = TextClassifier()
new_m.load()
predict_label = new_m.predict(['Abbott government spends $8 million on higher education media blitz'])
print(predict_label)  # ['education']

predict_label = new_m.predict(['Abbott government spends $8 million on higher education media blitz',
                               'Middle East and Asia boost investment in top level sports'])
print(predict_label)  # ['education' 'sports']

test_data = [
    ('education', 'Abbott government spends $8 million on higher education media blitz'),
    ('sports', 'Middle East and Asia boost investment in top level sports'),
]
acc_score = new_m.test(test_data)
print(acc_score)  # 1.0

output:

['education' 'sports']
save output/vectorizer.pkl ok.
save output/model.pkl ok.
['education']
['education' 'sports']
1.0
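Both demos pass training data as a list of (label, text) tuples. If your corpus lives in a file, one way to build that list is sketched below, assuming a hypothetical tab-separated format with one `label<TAB>text` pair per line; pytextclassifier's own file format is not documented here, and `load_labeled_pairs` is an illustrative name.

```python
import io


def load_labeled_pairs(lines):
    """Parse `label<TAB>text` lines into (label, text) tuples, skipping blank or malformed rows."""
    pairs = []
    for line in lines:
        parts = line.rstrip('\n').split('\t', 1)  # split on the first tab only
        if len(parts) == 2 and parts[0].strip() and parts[1].strip():
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs


# An in-memory file keeps the example self-contained; with a real file use:
#   with open('train.tsv', encoding='utf-8') as f:
#       data = load_labeled_pairs(f)
sample = io.StringIO(
    'education\tStudent debt to cost Britain billions within decades\n'
    'sports\tMiddle East and Asia boost investment in top level sports\n'
    '\n'
)
data = load_labeled_pairs(sample)
print(data)
```

Splitting on the first tab only means a text that itself contains tabs is kept intact.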

Chinese Text Classifier

The classifier works with both Chinese and English corpora, for example chinese_text_demo.py:

from pytextclassifier import TextClassifier

m = TextClassifier(model_name='lr')
# model_name selects the classifier; supported values: lr, random_forest, xgboost, svm, mlp, ensemble, stack
data = [
    ('education', '名师指导托福语法技巧：名词的复数形式'),
    ('education', '中国高考成绩海外认可 是“狼来了”吗？'),
    ('sports', '图文：法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),
    ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与'),
    ('sports', '米兰客场8战不败国米10年连胜')
]
m.train(data)

r = m.predict(['福建春季公务员考试报名18日截止 2月6日考试',
               '意甲首轮补赛交战记录:米兰客场8战不败国米10年连胜'])
print(r)  # ['education' 'sports']
m.save()
del m

new_m = TextClassifier()
new_m.load()
predict_label = new_m.predict(['福建春季公务员考试报名18日截止 2月6日考试'])
print(predict_label)  # ['education']

predict_label = new_m.predict(['福建春季公务员考试报名18日截止 2月6日考试',
                               '意甲首轮补赛交战记录:米兰客场8战不败国米10年连胜'])
print(predict_label)  # ['education' 'sports']

test_data = [
    ('education', '福建春季公务员考试报名18日截止 2月6日考试'),
    ('sports', '意甲首轮补赛交战记录:米兰客场8战不败国米10年连胜'),
]
acc_score = new_m.test(test_data)
print(acc_score)  # 1.0

output:

['education' 'sports']
save vectorizer.pkl ok.
save model.pkl ok.
['education']
['education' 'sports']
1.0
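The output shows that save() writes vectorizer.pkl and model.pkl. The .pkl extension suggests Python pickle files; assuming that, a generic save/load round trip looks like the sketch below. The helper names and the stand-in dictionaries are hypothetical and do not reflect pytextclassifier's internals.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize `obj` to `path`, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    with open(path, 'wb') as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load an object previously written by save_pickle. Only use on trusted files."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Stand-ins for a fitted vectorizer and model (hypothetical, for illustration only).
vectorizer_state = {'vocabulary': {'education': 0, 'sports': 1}}
model_state = {'labels': ['education', 'sports']}

with tempfile.TemporaryDirectory() as tmp:
    save_pickle(vectorizer_state, os.path.join(tmp, 'output', 'vectorizer.pkl'))
    save_pickle(model_state, os.path.join(tmp, 'output', 'model.pkl'))
    restored_vectorizer = load_pickle(os.path.join(tmp, 'output', 'vectorizer.pkl'))
    restored_model = load_pickle(os.path.join(tmp, 'output', 'model.pkl'))
```

Unpickling can execute arbitrary code, so model files should only be loaded from sources you trust.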

Text Cluster

Text clustering example (cluster_demo.py):

import sys

sys.path.append('..')  # only needed when running the demo from a subdirectory of a source checkout
from pytextclassifier.textcluster import TextCluster

m = TextCluster()
data = [
    'Student debt to cost Britain billions within decades',
    'Chinese education for TV experiment',
    'Abbott government spends $8 million on higher education',
    'Middle East and Asia boost investment in top level sports',
    'Summit Series look launches HBO Canada sports doc series: Mudhar'
]
model, X_vec, labels = m.train(data, n_clusters=2)
r = m.predict(['Abbott government spends $8 million on higher education media blitz',
               'Middle East and Asia boost investment in top level sports'])
print(r)
m.show_clusters(X_vec, labels, image_file='cluster.png')
m.save()
del m

new_m = TextCluster()
new_m.load()
r = new_m.predict(['Abbott government spends $8 million on higher education media blitz',
                   'Middle East and Asia boost investment in top level sports'])
print(r)

# load train data from file
tc = TextCluster()
data = tc.load_file_data('train_seg_sample.txt')
_, X_vec, labels = tc.train(data, n_clusters=3)
tc.show_clusters(X_vec, labels, 'cluster_train_seg_samples.png')
r = tc.predict(data[:5])
print(r)

output:

[1 1]
[1 1]
[2 2 2 2 1]

[Image: clustering plot]
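As the output shows, predict() returns cluster ids rather than topic names, and the ids themselves are arbitrary. To see what each cluster contains, it helps to list its members. A plain-Python sketch, independent of pytextclassifier (the helper name and example ids are illustrative; real ids would come from a clustering run such as the `[2 2 2 2 1]` result above):

```python
from collections import defaultdict


def group_by_cluster(texts, labels):
    """Map each cluster id to the texts assigned to it, in input order."""
    clusters = defaultdict(list)
    for text, label in zip(texts, labels):
        clusters[int(label)].append(text)  # int() also accepts NumPy integer ids
    return dict(clusters)


texts = [
    'Student debt to cost Britain billions within decades',
    'Middle East and Asia boost investment in top level sports',
    'Chinese education for TV experiment',
]
labels = [1, 0, 1]  # example ids only
for cluster_id, members in sorted(group_by_cluster(texts, labels).items()):
    print(cluster_id, members)
```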

Train Your Own Deep Text Classification Model

Preprocess with word segmentation (optional)

cd pytextclassifier
python3 preprocess.py
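Chinese text has no spaces between words, so it is usually segmented into words before vectorization; this README does not describe how preprocess.py does this. Purely as an illustration of the idea, the sketch below implements forward maximum matching, a classic dictionary-based segmentation method, with a tiny hypothetical dictionary. Production segmenters use much larger dictionaries and statistical models.

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


vocab = {'中国', '高考', '成绩', '海外', '认可'}  # toy dictionary (hypothetical)
print(' '.join(fmm_segment('中国高考成绩海外认可', vocab)))  # 中国 高考 成绩 海外 认可
```

Joining the words with spaces, as in the last line, produces text that whitespace-based vectorizers can tokenize directly.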

Train model

You can change the model by editing config.py, then train it:

python3 train.py

Predict with test data

python3 infer.py

Contact

Issues (suggestions):
Email me: xuming: xuming624@qq.com
WeChat me: add WeChat ID xuming624 with the note "your name-NLP" to join the NLP discussion group.

Cite

If you use pytextclassifier in your research, please cite it in the following format:

@software{pytextclassifier,
  author = {Xu Ming},
  title = {pytextclassifier: A Tool for Text Classifier},
  year = {2021},
  url = {https://github.com/shibing624/pytextclassifier},
}

License

The license is the Apache License 2.0, which allows free commercial use. Please include a link to pytextclassifier and its license in your product documentation.

Contribute

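The project code is still rough. If you improve it, you are welcome to contribute the changes back to this project. Before submitting, please note the following two points:

Add corresponding unit tests under tests
Run all unit tests with python setup.py test and make sure they all pass

Then submit a PR.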

Reference

SentimentPolarityAnalysis

Release history

1.4.0 (Jul 31, 2024)
1.3.9 (May 22, 2024)
1.3.8 (May 9, 2024)
1.3.7 (Oct 18, 2023)
1.3.6 (May 10, 2023)
1.3.5 (Apr 3, 2023)
1.3.4 (Jan 12, 2023)
1.3.3 (Dec 13, 2022)
1.3.2 (Oct 21, 2022)
1.3.1 (Sep 16, 2022)
1.3.0 (Sep 16, 2022)
1.2.0 (Apr 12, 2022)
1.1.6 (Mar 29, 2022)
1.1.5 (Mar 29, 2022)
1.1.4 (Feb 10, 2022)
1.1.3 (Oct 28, 2021)
1.1.2 (Oct 26, 2021)
1.0.4 (Oct 9, 2021)
1.0.3 (Oct 9, 2021)
1.0.2 (Oct 8, 2021)
1.0.1 (Oct 6, 2021)
1.0.0 (Oct 1, 2021)
0.1.5 (Sep 4, 2021)
0.1.4 (Aug 26, 2021)
0.1.3 (Aug 23, 2021)
0.1.2 (Aug 23, 2021)
0.1.1 (Aug 23, 2021), this version
0.0.3 (Jun 17, 2021)
0.0.2 (Jun 16, 2021)
0.0.1 (Jun 16, 2021)

Download files

Source Distribution

pytextclassifier-0.1.1.tar.gz (34.0 kB)

Uploaded Aug 23, 2021 Source

File details

Details for the file pytextclassifier-0.1.1.tar.gz.

File metadata

Download URL: pytextclassifier-0.1.1.tar.gz
Upload date: Aug 23, 2021
Size: 34.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for pytextclassifier-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`3d9697048d8e316b8aaeabadfa430a1942c475fa5e65eea74af5618f7a2e5b7f`
MD5	`ba74b984c004bb60c8dc9b926f9956d6`
BLAKE2b-256	`a780ed462f68e5a686787cfc16136ef99fffe08829a4db38e04717277c377cb1`
