A package to make NLP easy, fast, and fun

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Subtext 0.0.2

A package to make NLP fast and easy for beginners.

Efficient text prediction
Text pairing, equivalent to that of NLTK's n-gram.
Syllable Identification
Find frequencies of words in given text
Find matching words in two arrays

I still have a lot of plans for this package, for that reason, there would be a lot of frequent updates in the near future. The updates would include optimizations & more functions, so stay tuned.

Install

pip install subtext

Usage

First import the program using:

import subtext

Predict

A function that predicts the next x number of words based on the given string and phrase

Parameters

The function's parameters are:

subtext.predict(string, phrase, n=0, case_insensitive=False)

String: Main text
Phrase: The key phrase (prompt). The function would try to predict what would come after the given phrase.
n: The number of words it would return. It's automomatically set to 0, which would return all predictions regardless of their corresponding word counts.
case_insensitive: Set this to True if you want to.

Actual usage

So, let's try to use this.

string="I am a string. I am also a human being, but most importantly, I am a string."
print(predict(string, "I am", n=1))

This would output

{'a': 2, 'also': 1}

But, if you change the n value,

print(predict(string, "I am", n=2))

It would output

{'a string.': 2, 'also a': 1}

Pair

This function splits a string into pairs of strings.

Parameters

subtext.pair(string, n)

string is the string you're trying to split into pairs
n stands for the number of strings in each pair. (Equivalent to that of the n value in n-gram)

Usage

Let's set our string to:

string="Sometimes, I just go out and eat sand. I don't know why"

Don't ask. Let's turn this into pairs of 2:

print(pair(string, 2))

Which outputs

[['Sometimes,', 'I'], ['I', 'just'], ['just', 'go'], ['go', 'out'], ['out', 'and'], ['and', 'eat'], ['eat', 'sand.'], ['sand.', 'I'], ['I', "don't"], ["don't", 'know'], ['know', 'why']]

Identify Syllables

subtext.syllables("carbonmonoxide")

This outputs:

car-bon-mon-ox-ide

But take note that this only works with lowercase strings.

Countwords

Parameters

The function's parameters are:

subtext.countwords(string, case_insensitive=False)

Change that to True if you want it to be case-insensitive.

Actual usage

Get yourself a nice string

string = "Sometimes I wonder, 'Am I stupid?' then I realize, yeah. yeah, I am stupid."

Then put it in the function:

x = subtext.countwords(string)
print(x)

It should print:

{'I': 4, 'Sometimes': 1, 'wonder,': 1, "'Am": 1, "stupid?'": 1, 'then': 1, 'realize,': 1, 'yeah.': 1, 'yeah,': 1, 'am': 1, 'stupid.': 1}

Matchingwords

A function that finds & counts matching words in two strings

Actual usage

So in this case, our strings are:

string1, string2 = "God, I love drawing, drawing is my favourite thing to do", "God, I hate drawing, drawing is my least favourite thing to do"

If we run this through matchingwords, we would get:

{'God,': 1, 'I': 1, 'drawing,': 1, 'drawing': 1, 'is': 1, 'my': 1, 'favourite': 1, 'thing': 1, 'to': 1, 'do': 1}

Project details

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

2.0.3

Mar 27, 2023

2.0.2

Mar 27, 2023

2.0.1

Mar 27, 2023

2.0

Mar 27, 2023

This version

0.0.2

Jan 21, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Subtext-0.0.2.tar.gz (4.8 kB view hashes)

Uploaded Jan 21, 2022 Source

Built Distribution

Subtext-0.0.2-py3-none-any.whl (4.8 kB view hashes)

Uploaded Jan 21, 2022 Python 3

Hashes for Subtext-0.0.2.tar.gz

Hashes for Subtext-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`e947c5263fba60220e31102ecea65b81f48c5f6946b444d99f67909d79334a75`
MD5	`0bc262f256cd0f58170ab66af7d50139`
BLAKE2b-256	`ce8401c064e6433f8c630d2e65db5e485e31a49045c36d028881acd316888b05`

Hashes for Subtext-0.0.2-py3-none-any.whl

Hashes for Subtext-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`013d39a660374b443940c2778d7e3cb44e1489858ee2b9493f7f6368ffef480f`
MD5	`ebe6d41ac659c4ba71d99d9de9e1ceaf`
BLAKE2b-256	`22107875b194fc491ec06d02e276062e61955c4135e7698467d516e382e93247`