Skip to main content

BIO and BEISO evaluation library

Project description

bioeval

CircleCI License: MIT

CoNLL-2000 style evaluation of data using BIO and BEISO representation for mutli-token entities (i.e. chunks).

Install

In the root folder execute:

pip install bioeval

Change Log

  • pypi release and automated CI releases
  • bioeval now supports pandas DataFame objects through bioeval.evaluate_df.

Usage

The library supports two ways of evaluating span annotation. The first is the native format way while the second uses a pandas DataFrame format.

Native input format

The native input format is a set of tuples, where each tuple signifies the group of tokens in a span. Tokens are also denoted by tuples that are supposed to be unique. The user can achieve that uniqueness through adding a unique identifier to each token as in the example bellow.

from bioeval import evaluate


# gold chunks
chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'B-MV'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP'),
        (7, 'square', 'N', 'I-NP')
    ),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((11, '.', '.', 'O'),)
}

# candidate chunks
guess_chunk = {
    ((1, 'Gold', 'N', 'B-NP'),),
    ((2, 'is', 'V', 'I-NP'),),
    ((3, 'green', 'J', 'B-AP'),),
    ((4, '.', '.', 'B-NP'),),
    (
        (5, 'The', 'D', 'B-NP'),
        (6, 'red', 'J', 'I-NP')
    ),
    ((7, 'square', 'N', 'O'),),
    ((8, 'is', 'V', 'B-MV'),),
    (
        (9, 'very', 'A', 'B-AP'),
        (10, 'boring', 'J', 'I-AP')
    ),
    ((8, '.', '.', 'O'),)
}

# evaluation
f1, pr, re = evaluate(gold_sequence=chunk, guess_sequence=guess_chunk, chunk_col=3)
print(f1)
# 71.43

Dataframe format

The library supports dataframes input through the use of the evaluate_df method, which needs the additional chunkcol and guesscol parameters to specify the gold and candidate spans.

import pandas as pd
from bioeval import evaluate_df

# input data as a JSON parsed to a DataFrame object
df = pd.DataFrame(
    [
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'I-foo'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-bar','guesstag': 'B-bar'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'I-foo'},
        {'chunktag': 'B-bar','guesstag': 'B-bar'},
        {'chunktag': 'I-bar','guesstag': 'I-bar'},
        {'chunktag': 'O','guesstag': 'O'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'B-bar','guesstag': 'I-foo'},
        {'chunktag': 'B-foo','guesstag': 'B-foo'},
        {'chunktag': 'I-foo','guesstag': 'B-foo'}
    ]
)

f1, pr, re = evaluate_df(df=df, chunkcol='chunktag', guesscol='guesstag')

print(f1)
>>> 62.5

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bioeval-1.1.2.dev0.tar.gz (5.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bioeval-1.1.2.dev0-py3-none-any.whl (7.1 kB view details)

Uploaded Python 3

File details

Details for the file bioeval-1.1.2.dev0.tar.gz.

File metadata

  • Download URL: bioeval-1.1.2.dev0.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.2 CPython/3.6.8

File hashes

Hashes for bioeval-1.1.2.dev0.tar.gz
Algorithm Hash digest
SHA256 e9ad5e86f4a53ab81df92156f67833225bf95e97c7d6a9628f8e1e7faca5dce2
MD5 db8de9ef826c82ddfeaf1ed11786ddcf
BLAKE2b-256 665a8a63e17ce580d5b1b379677aca6b4aff63e3e30d422ef1c4e5a97c7bb44a

See more details on using hashes here.

File details

Details for the file bioeval-1.1.2.dev0-py3-none-any.whl.

File metadata

  • Download URL: bioeval-1.1.2.dev0-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.32.2 CPython/3.6.8

File hashes

Hashes for bioeval-1.1.2.dev0-py3-none-any.whl
Algorithm Hash digest
SHA256 1b9cf071e4f529be885b3809f2fdc298cc9d9f787a9382ec9628649751fa44b2
MD5 694b8fcb019fa2b77af4142d66638b7e
BLAKE2b-256 0208a28c4d948a096714a0ee4e85ceadc825459a40271b0ce97920004cf7e4a3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page