prosecco

Description

A slim, flexible, and extensible NLP engine that produces a list of features from text based on the conditions you provide.

Features

  • word categorisation
  • feature extraction

Install

pip install prosecco

Usage

python example.py
python example_basic.py

Examples

Basic

from prosecco import Prosecco, Condition

# Read the saved text of the Wikipedia article https://en.wikipedia.org/wiki/Superhero
with open('sample/superhero.txt') as f:
    text = f.read()

# 1. Create conditions based on superhero names
superheroes = ["batman", "spiderman", "superman", "captain marvel", "black panther"]
conditions = [Condition(lemma_type="hero", compare=hero, lower=True) for hero in superheroes]
# 2. Create prosecco
p = Prosecco(conditions=conditions)
# 3. Let's drink and print output
p.drink(text, progress=True)
lemmas = set(p.get_lemmas(type='hero'))
print(" ".join(map(str, lemmas)))

Output

Batman[hero] Black Panther[hero] Superman[hero] Captain Marvel[hero]
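
To make the idea behind the conditions above more concrete, here is a minimal, self-contained sketch of case-insensitive phrase matching that also handles multi-word names. It does not use prosecco and is not its implementation; the helper name tag_phrases and the sample sentence are made up for illustration, and only the "Name[hero]" output format mirrors the output shown above.

```python
import re

def tag_phrases(text, phrases, label):
    """Find each phrase case-insensitively and tag matches as 'Match[label]'.

    Hypothetical helper for illustration only; not part of prosecco.
    """
    # Try longer phrases first so multi-word names win over shorter overlaps.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, ordered)) + r")\b",
        re.IGNORECASE,
    )
    # Keep the original casing of the matched text, as in the output above.
    return [f"{m.group(1)}[{label}]" for m in pattern.finditer(text)]

sample = "Characters such as Batman, Superman, Black Panther and Captain Marvel are superheroes."
heroes = ["batman", "superman", "captain marvel", "black panther"]
tags = tag_phrases(sample, heroes, "hero")
print(" ".join(tags))
```

Matches are returned in the order they appear in the text, so wrapping the result in set(), as the Basic example does, removes duplicates but also drops that ordering.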

Advanced

from prosecco import (Charset, CharsetNormalizer, Condition, LanguageTokenizer,
                      Lexer, Visitor, WordStemmer)

text = """Chrząszcz brzmi w trzcinie w Szczebrzeszynie.
Ząb zupa zębowa, dąb zupa dębowa.
Gdzie Rzym, gdzie Krym. W Pacanowie kozy kują.
Tak, jeśli mam szczęśliwy być, to w Gdańsku muszę żyć! 
"""

# 1. Create conditions based on city names
cities = ["szczebrzeszyn", "pacanow", "gdansk", "rzym", "krym"]
conditions = []
for city in cities:
    conditions.append(Condition(lemma_type="city",
                                compare=city,
                                normalizer=CharsetNormalizer(Charset.PL_EN),
                                stemmer=WordStemmer(language="pl"),
                                lower=True))
# 2. Create a tokenizer for the Polish charset
tokenizer = LanguageTokenizer(Charset.PL)
# 3. Get list of tokens
tokens = tokenizer.tokenize(text)
# 4. Create visitor with conditions provided in step 1
visitor = Visitor(conditions=conditions)
# 5. Parse tokens based on visitor conditions
lexer = Lexer(tokens=tokens, visitor=visitor)
# 6. Get list of lemmas
lemmas = lexer.lex()
# 7. Keep only the lemmas tagged as cities
found_cities = [lemma for lemma in lemmas if lemma.type == "city"]
# 8. Print output
print(" ".join(map(str, found_cities)))

Output

Szczebrzeszynie[city] Rzym[city] Krym[city] Pacanowie[city] Gdańsku[city]
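
The Advanced example relies on charset normalisation and stemming so that an inflected, accented form such as "Gdańsku" can match the plain condition "gdansk". The sketch below shows one way this kind of matching could work using only the standard library. It is an assumption-laden illustration, not prosecco's implementation: the helpers normalize and find_cities are hypothetical, diacritics are stripped with unicodedata, and a simple prefix check stands in for a real stemmer.

```python
import re
import unicodedata

def normalize(word):
    """Lower-case a word and strip diacritics, e.g. 'Gdańsku' -> 'gdansku'.

    Hypothetical helper for illustration only; not part of prosecco.
    """
    # 'ł' has no Unicode decomposition, so map it to 'l' by hand.
    word = word.lower().replace("ł", "l")
    decomposed = unicodedata.normalize("NFKD", word)
    # Drop the combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def find_cities(text, cities):
    """Return tokens whose normalized form starts with a known city name.

    The prefix check is a crude stand-in for stemming (assumption).
    """
    # Compose characters first so accented letters stay inside one token.
    tokens = re.findall(r"\w+", unicodedata.normalize("NFC", text))
    return [tok for tok in tokens
            if any(normalize(tok).startswith(city) for city in cities)]

text = "Chrząszcz brzmi w trzcinie w Szczebrzeszynie. W Pacanowie kozy kują. W Gdańsku muszę żyć!"
found = find_cities(text, ["szczebrzeszyn", "pacanow", "gdansk"])
print(" ".join(f"{city}[city]" for city in found))
```

A prefix check is cheap but will over-match when one city name is the beginning of an unrelated word, which is one reason a proper stemmer is the better choice in practice.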
