fastent

Automated Custom NER tool


# fastent

Fastent is a tool for building end-to-end custom Named Entity Recognition (NER) models. Entities **ARE NOT** limited to the usual predefined classes such as Person (PER), Location (LOC), or Companies/agencies/institutions (ORG): any custom entity that can be described by a list of words can be created.

The package consists of several modules that can be used either separately for their designated tasks (e.g. annotation, contextualization) or together in a combined workflow. Most modules offer multilingual support, so datasets and text do not have to be in English.

Table of contents
=================


* [Installation](#installation)
* [Usage](#usage)
* [Dataset generation](#Dataset-Generation)
* [Contextualization](#Contextualization)
* [API for model download](#Api)
* [Annotation](#Annotation)
* [Text utilities](#Text-utilities)
* [WordNet utilities](#Wordnet)
* [Poincaré embeddings wrapper](#Poincare)
* [Combining everything](#combo)
* [Baselines](#tests)
* [Dependency](#dependency)


Installation
============

This section shows how to install the package using different methods.

### From source

1) Clone the repository and change into its directory:

```
git clone https://github.com/fastent/fastent.git
cd fastent
```
2) Install the required Python packages:

```
pip install -r requirements.txt
```

3) Install CouchDB (the commands below are for Ubuntu)

Update the package index:
```
sudo apt-get update
```

Add the PPA repository:
```
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:couchdb/stable
sudo apt-get update
```

Install CouchDB:
```
sudo apt-get install couchdb
```

Change ownership of the CouchDB files to the `couchdb` user (recommended, to avoid permission problems):

```
sudo chown -R couchdb:couchdb /usr/bin/couchdb /etc/couchdb /usr/share/couchdb
```

Then set the permissions on the same paths:

```
sudo chmod -R 0770 /usr/bin/couchdb /etc/couchdb /usr/share/couchdb
```
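To confirm that the ownership and permission changes took effect, here is a minimal standard-library Python sketch (not part of fastent) that reads a path's mode and owner. Mode `0770` grants read, write and execute to the owner and group and nothing to everyone else, which `stat.filemode` renders as `rwxrwx---`. The helper names `describe_mode` and `check_couchdb_path` are hypothetical, and the owner lookup uses the Unix-only `pwd` module.

```python
import os
import pwd
import stat


def describe_mode(mode: int) -> str:
    """Render permission bits ls-style, e.g. 0o770 -> 'rwxrwx---'."""
    # filemode() returns 10 characters; drop the leading file-type character
    return stat.filemode(mode)[1:]


def check_couchdb_path(path: str, expected_mode: int = 0o770,
                       expected_owner: str = "couchdb") -> dict:
    """Report whether `path` has the mode and owner set by chmod/chown above."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid without a passwd entry
        owner = str(st.st_uid)
    return {
        "path": path,
        "mode": describe_mode(mode),
        "mode_ok": mode == expected_mode,
        "owner": owner,
        "owner_ok": owner == expected_owner,
    }


if __name__ == "__main__":
    for p in ("/usr/bin/couchdb", "/etc/couchdb", "/usr/share/couchdb"):
        if os.path.exists(p):
            print(check_couchdb_path(p))
```

Because `chmod -R` and `chown -R` are recursive, you may want to spot-check a few files inside these directories as well, not only the top-level paths.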

Restart CouchDB:

```
sudo systemctl restart couchdb
```

CouchDB can now be accessed at http://127.0.0.1:5984/_utils/
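To check from Python that the server is running, the sketch below (not part of fastent) requests the server root, which in CouchDB returns a small JSON document whose `couchdb` field is `"Welcome"`. It uses only the standard library. The function names are hypothetical, and it assumes the root URL can be read without authentication, which may depend on your CouchDB configuration.

```python
import json
import urllib.request

COUCHDB_URL = "http://127.0.0.1:5984/"  # default address used in the steps above


def is_couchdb_welcome(payload: bytes) -> bool:
    """True if `payload` looks like CouchDB's root response {"couchdb": "Welcome", ...}."""
    try:
        data = json.loads(payload)
    except ValueError:  # not JSON (JSONDecodeError and UnicodeDecodeError are ValueErrors)
        return False
    return isinstance(data, dict) and data.get("couchdb") == "Welcome"


def couchdb_is_up(url: str = COUCHDB_URL, timeout: float = 2.0) -> bool:
    """Request the server root and report whether a CouchDB server answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_couchdb_welcome(response.read())
    except OSError:  # refused connection, timeout, HTTP error (URLError is an OSError)
        return False


if __name__ == "__main__":
    print("CouchDB reachable:", couchdb_is_up())
```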

4) Install the NLTK data dependencies.

```
>>> import nltk
>>> nltk.download('stopwords')
```
At minimum, the *stopwords* corpus is required. Running `nltk.download()` without arguments opens the interactive downloader, where you can add more corpora if needed.
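The following minimal sketch (not fastent code) checks whether the stopwords corpus is installed and shows what it provides: a per-language set of common function words such as "the" and "is". The helper names are hypothetical, and the token filtering is only a generic illustration of how stopword lists are typically used, not a description of how fastent uses them.

```python
from typing import List, Optional, Set


def load_stopwords(language: str = "english") -> Optional[Set[str]]:
    """Return NLTK's stopword set for `language`, or None if NLTK or the corpus is missing."""
    try:
        from nltk.corpus import stopwords  # requires the nltk package
        return set(stopwords.words(language))
    except (ImportError, LookupError):
        # ImportError: nltk is not installed; LookupError: corpus not downloaded yet
        return None


def remove_stopwords(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Drop tokens whose lowercase form is in `stop_words`."""
    return [t for t in tokens if t.lower() not in stop_words]


if __name__ == "__main__":
    words = load_stopwords()
    if words is None:
        print("stopwords corpus not found; run nltk.download('stopwords')")
    else:
        print(remove_stopwords("The seizure of cocaine was reported".split(), words))
```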

### From pip

Coming Soon

Usage
======

## Dataset generation

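The `dataset_pseudo_generator` module can generate a dataset from a list of raw entity words. The examples use the spaCy model `en_core_web_lg`, which can be installed with `python -m spacy download en_core_web_lg`.

When running from source, an example command is: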

```
python dataset_pseudo_generator.py -m en_core_web_lg -s cocaine,heroin
```
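In this command, `-m` appears to name the spaCy model and `-s` to pass the seed entity words as one comma-separated string; these meanings are inferred from the example. The hypothetical `argparse` sketch below is not the script's actual parser: it only shows how `-s cocaine,heroin` corresponds to the list `['cocaine', 'heroin']` passed to the Python API. The destination names `model_name` and `seed_words` are assumptions.

```python
import argparse
from typing import List, Optional


def split_words(value: str) -> List[str]:
    """Turn 'cocaine,heroin' into ['cocaine', 'heroin'], dropping empty items."""
    return [w.strip() for w in value.split(",") if w.strip()]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical re-creation of the two flags used in the example command
    parser = argparse.ArgumentParser(description="Sketch of the dataset_pseudo_generator.py options")
    parser.add_argument("-m", dest="model_name", required=True,
                        help="spaCy model name, e.g. en_core_web_lg")
    parser.add_argument("-s", dest="seed_words", type=split_words, required=True,
                        help="comma-separated entity words")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["-m", "en_core_web_lg", "-s", "cocaine,heroin"])
    print(args.model_name, args.seed_words)  # en_core_web_lg ['cocaine', 'heroin']
```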

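If the package is installed:

```
from fastent import dataset_pseudo_generator

model_name = 'en_core_web_lg'  # spaCy model, as in the command-line example

# Load the spaCy model, then generate a dataset for the seed entity words
model = dataset_pseudo_generator.spacy_initialize(model_name)
dataset_pseudo_generator.dataset_generate(model, ['cocaine', 'heroin'], 100)
```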
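As a rough illustration of how a list of entity words can be turned into NER training data (a sketch, not fastent's implementation), the code below finds whole-word, case-insensitive occurrences of the seed words in a set of sentences and emits character-offset annotations in the `(text, {"entities": [(start, end, label)]})` format used for spaCy v2 training data. The function name `annotate_with_seed_words`, the label `DRUG` and the example sentences are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Annotation = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def annotate_with_seed_words(sentences: Iterable[str], seed_words: Iterable[str],
                             label: str) -> List[Annotation]:
    """Mark every whole-word, case-insensitive occurrence of a seed word as `label`."""
    words = sorted({w for w in seed_words if w}, key=len, reverse=True)
    if not words:
        raise ValueError("seed_words must contain at least one non-empty word")
    # Longest words first, so a multi-word seed wins over a shorter seed it contains
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    examples = []
    for text in sentences:
        entities = [(m.start(), m.end(), label) for m in pattern.finditer(text)]
        if entities:  # keep only sentences containing at least one entity
            examples.append((text, {"entities": entities}))
    return examples


if __name__ == "__main__":
    corpus = ["Police seized cocaine and heroin.", "The weather was mild."]
    for example in annotate_with_seed_words(corpus, ["cocaine", "heroin"], "DRUG"):
        print(example)
    # ('Police seized cocaine and heroin.', {'entities': [(14, 21, 'DRUG'), (26, 32, 'DRUG')]})
```

Pure string matching like this marks every occurrence regardless of context, so such data is best treated as a noisy starting point rather than gold annotations.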


Release history

* 0.7.3 (May 23, 2018)
* 0.7.2 (May 23, 2018)
* 0.7.1 (May 23, 2018)
* 0.7 (May 23, 2018)
* 0.6 (May 23, 2018)
* 0.5.8.9 (May 23, 2018)
* 0.5.8.8 (May 22, 2018)
* 0.5.8.7 (May 22, 2018)
* 0.5.8.6 (May 22, 2018)
* 0.5.8.5 (May 22, 2018)
* 0.5.8.4 (May 22, 2018)
* 0.5.8.3 (May 22, 2018)
* 0.5.8.2 (May 22, 2018)
* 0.5.7 (May 22, 2018)
* 0.5.6 (May 22, 2018)
* 0.5.5 (May 22, 2018)
* 0.5.4 (May 22, 2018)
* 0.5.3 (May 22, 2018)
* 0.5.2 (May 22, 2018)
* 0.5.1 (May 22, 2018), this version

Source distribution for this version: `fastent-0.5.1.tar.gz` (11.2 kB), uploaded May 22, 2018.

| Algorithm | Hash digest |
|---|---|
| SHA256 | `bd856cb67cf5421697fc76be4f8c34ba56879ee058d5def849726dc0e0d660c4` |
| MD5 | `d849105ef860f3f16ac46200aeab11b3` |
| BLAKE2b-256 | `82b3ea000442743e0d1925a4e39799befaf492cfb55c588261b769da53c498de` |