Phrase Tokenizer

Tokenize an English sentence into phrases using benepar.

Installation

pip install phrase-tokenizer
# pip install phrase-tokenizer -U to update
# or to install the latest from github:
# pip install git+https://github.com/ffreemt/phrase-tokenizer.git

Or clone the repo https://github.com/ffreemt/phrase-tokenizer.git:

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install logzero benepar tensorflow

Or use poetry, e.g.

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
poetry install

Usage

from phrase_tokenizer import phrase_tok

res = phrase_tok("Short cuts make long delays.")
print(res)
# ['Short cuts', 'make long delays']

# pass verbose=True to see the tokenizing process
res = phrase_tok("Short cuts make long delays", verbose=True)
# ',..Short.cuts,.make..long.delays..'

Consult the source code for details.
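
To tokenize several sentences in one go, a thin wrapper around the call shown above may be handy. The following is a minimal sketch and not part of the package: tokenize_many is a hypothetical helper, and it takes the tokenizer as an argument so that phrase_tok (or any stand-in with the same call shape) can be passed in.

```python
from typing import Callable, Iterable, List


def tokenize_many(
    sentences: Iterable[str],
    tokenizer: Callable[[str], List[str]],
) -> List[List[str]]:
    """Apply a phrase tokenizer to each non-blank sentence, in order.

    `tokenizer` is assumed to behave like phrase_tok: it takes one
    sentence and returns a list of phrases.
    """
    results = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:  # skip empty or whitespace-only input
            continue
        results.append(tokenizer(sentence))
    return results
```

With the package installed, tokenize_many(["Short cuts make long delays."], phrase_tok) should return one list of phrases per sentence. Passing the tokenizer in explicitly also means the helper can be tried with a simple stand-in function, without loading the benepar model.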

For Developers

git clone https://github.com/ffreemt/phrase-tokenizer.git
cd phrase-tokenizer
pip install -r requirements-dev.txt

In IPython, plot_tree draws a tree of the sentence to aid development, e.g.:

from phrase_tokenizer.phrase_tok import plot_tree

plot_tree("Short cuts make long delays.")

