Library for semantic annotation of tabular data with the Wikidata knowledge graph

bbw

  • Annotates tabular data with the entities, types and properties in Wikidata.
  • Easy to use: bbw.annotate().
  • Resolves even tricky spelling mistakes via meta-lookup through SearX.
  • Matches to the up-to-date values in Wikidata without the dump files.
  • Ranked in third place at SemTab2020.

How to use

Import library

from bbw import bbw

The easiest way to annotate the dataframe Y is:

[web_table, url_table, label_table, cpa, cea, cta] = bbw.annotate(Y)

It returns a list of six dataframes. The first three contain the annotations as HTML links, URLs and labels of the Wikidata entities, respectively. Each of them has two more rows than Y; these extra rows hold the annotations for types and properties. The last three contain the annotations in the format required by the SemTab2020 challenge.
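
As a minimal sketch of what one might do with the six returned dataframes, the helper below writes each of them to its own CSV file. The function name save_annotations, the OUTPUT_NAMES list and the file naming scheme are hypothetical and not part of bbw; the only assumption about the results is that each one offers a pandas-style to_csv method.

```python
from pathlib import Path

# Names follow the unpacking order shown above; the file layout is hypothetical.
OUTPUT_NAMES = ["web_table", "url_table", "label_table", "cpa", "cea", "cta"]


def save_annotations(results, out_dir, stem):
    """Write each of the six annotation tables to <out_dir>/<stem>_<name>.csv.

    `results` is the list returned by bbw.annotate(Y); every item is assumed
    to provide a pandas-style `to_csv(path)` method.
    """
    if len(results) != len(OUTPUT_NAMES):
        raise ValueError(f"expected {len(OUTPUT_NAMES)} tables, got {len(results)}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = {}
    for name, table in zip(OUTPUT_NAMES, results):
        path = out_dir / f"{stem}_{name}.csv"
        table.to_csv(path)  # default pandas settings, index included
        paths[name] = path
    return paths
```

With bbw installed and SearX running, a call could look like save_annotations(bbw.annotate(Y), "out", "table1"), where Y is, for example, a dataframe loaded with pandas.read_csv.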

GUI

If you need to annotate only one table, use the simple GUI:

streamlit run bbw_gui.py

Open the browser at http://localhost:8501 and choose a CSV file. The annotation process starts automatically and outputs the six tables returned by the annotate function.

CLI

If you need to annotate a few tables, use the CLI-tool:

python3 bbw_cli.py --amount 100 --offset 0
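
The sketch below generates a sequence of such commands for a larger collection. It assumes, without confirmation from the bbw documentation, that --amount is the number of tables to process in one run and --offset is the index of the first table; the helper name batch_commands is hypothetical.

```python
def batch_commands(total_tables, batch_size, script="bbw_cli.py"):
    """Return one CLI command per batch, covering `total_tables` tables.

    Assumption: --amount is the batch size and --offset the index of the
    first table in the batch. Check the script's own help before relying on it.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    commands = []
    for offset in range(0, total_tables, batch_size):
        # The last batch may be smaller than batch_size.
        amount = min(batch_size, total_tables - offset)
        commands.append(f"python3 {script} --amount {amount} --offset {offset}")
    return commands
```

Under that assumption, batch_commands(250, 100) yields three commands with offsets 0, 100 and 200, the last one with --amount 50.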

GNU parallel

If you need to annotate hundreds or thousands of tables, use the script with GNU parallel:

./bbw_parallel.py

Installation

You can use pip to install bbw:

pip install bbw

Also install SearX, because bbw performs its meta-lookup through it.

export PORT=80
docker pull searx/searx
docker run --rm -d -v ${PWD}/searx:/etc/searx -p $PORT:8080 -e BASE_URL=http://localhost:$PORT/ searx/searx

SearX is then running at http://localhost:80. bbw sends GET requests to it.
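
To check that the SearX instance answers GET requests, a lookup can be sketched with the standard library alone. This is not how bbw itself builds its requests, which is not documented here; the sketch only uses the general SearX search endpoint (/search with the q and format parameters) and assumes the instance allows JSON output. The function names build_searx_url and searx_lookup are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_searx_url(query, base_url="http://localhost:80"):
    """Build the GET URL for a SearX search that returns JSON."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def searx_lookup(query, base_url="http://localhost:80", timeout=10):
    """Send the GET request and return the list of result entries.

    Assumes the SearX instance is reachable and permits format=json.
    """
    with urlopen(build_searx_url(query, base_url), timeout=timeout) as response:
        return json.load(response).get("results", [])
```

A misspelled cell value such as "Barak Obama" could then be passed to searx_lookup to inspect which candidate pages the meta-search returns.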

Citing

If you find bbw useful in your work, a proper reference would be:

@inproceedings{2020_bbw,
  author    = {Renat Shigapov and Philipp Zumstein and Jan Kamlah and Lars Oberl{\"a}nder and J{\"o}rg Mechnich and Irene Schumm},
  title     = {bbw: {M}atching {CSV} to {W}ikidata via {M}eta-lookup},
  booktitle = {SemTab@ISWC 2020},
  year = {2020}
}

SemTab2020

The library was designed, implemented and tested during SemTab2020. It achieved its best scores in the fourth and last round, on the automatically generated dataset:

Task  F1-score  Precision  Rank
CPA   0.995     0.996      2
CTA   0.980     0.980      2
CEA   0.978     0.984      4
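
The table reports F1-score and precision but not recall. Assuming the F1-score is the usual harmonic mean of precision and recall, F1 = 2PR / (P + R), the recall can be recovered as R = F1 * P / (2P - F1). The sketch below applies this to the rounded values above, so the results are approximations only; the function name implied_recall is hypothetical.

```python
def implied_recall(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, given F1 and precision P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall for this F1/precision pair")
    return f1 * precision / denominator


# (F1-score, precision) pairs copied from the table above.
ROUND_4_SCORES = {
    "CPA": (0.995, 0.996),
    "CTA": (0.980, 0.980),
    "CEA": (0.978, 0.984),
}

for task, (f1, precision) in ROUND_4_SCORES.items():
    print(task, round(implied_recall(f1, precision), 3))
```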
