Interface to WormBase (www.wormbase.org) curation data, including literature management and NLP functions

WBtools

Interface to WormBase curation database and Text Mining functions

Access WormBase paper corpus information by loading PDF files (converted to plain text) and curation information from the WormBase database. The package also exposes text mining functions that operate on the full text of papers.

Installation

pip install wbtools

Usage example

Get sentences from a WormBase paper

from wbtools.literature.corpus import CorpusManager

paper_id = "00050564"
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         paper_ids=[paper_id], ssh_host="ssh_host", ssh_user="ssh_user", ssh_passwd="ssh_passwd")
sentences = cm.get_paper(paper_id).get_text_docs(split_sentences=True)
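
Once the sentences are loaded, simple text mining can be done with plain Python. The sketch below assumes that get_text_docs(split_sentences=True) returns a list of strings, which the example above suggests but does not state. The helper find_sentences_with_keywords and the sample sentences are hypothetical and not part of wbtools.

```python
import re
from typing import Iterable, List


def find_sentences_with_keywords(sentences: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the sentences that mention at least one keyword as a whole token (case-insensitive)."""
    keywords = [kw for kw in keywords if kw]
    if not keywords:
        return []
    # Lookarounds keep "daf-2" from matching inside a longer token such as "daf-21"
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(kw) for kw in keywords) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


# Hypothetical stand-in for the list returned by get_text_docs(split_sentences=True)
sample_sentences = [
    "The daf-16 mutant shows extended lifespan.",
    "Worms were grown at 20 degrees.",
    "Expression of DAF-2 was reduced in the intestine.",
]
matches = find_sentences_with_keywords(sample_sentences, ["daf-16", "daf-2"])
```

With a real paper, the sentences variable from the example above would take the place of sample_sentences.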

Get the latest papers (up to 50) added to or modified in WormBase in the last month

from wbtools.literature.corpus import CorpusManager
import datetime

cm = CorpusManager()
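# Assumption: from_date is the lower bound on the added/modified date, so it is set to 30 days ago
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         from_date=datetime.datetime.now() - datetime.timedelta(days=30), max_num_papers=50,
                         ssh_host="ssh_host", ssh_user="ssh_user", ssh_passwd="ssh_passwd")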
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
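
The examples pass database and SSH credentials as string literals. One way to keep them out of source code is to read them from environment variables, as in the minimal sketch below. The keyword names are taken from the load_from_wb_database calls above; the environment variable names and the helper wb_connection_kwargs are assumptions made for illustration and are not defined by wbtools.

```python
import os
from typing import Dict, Mapping, Optional

# Maps each connection keyword used in the examples above to a HYPOTHETICAL
# environment variable name; wbtools itself does not define these variables.
ENV_VARS = {
    "db_name": "WB_DB_NAME",
    "db_user": "WB_DB_USER",
    "db_password": "WB_DB_PASSWORD",
    "db_host": "WB_DB_HOST",
    "ssh_host": "WB_SSH_HOST",
    "ssh_user": "WB_SSH_USER",
    "ssh_passwd": "WB_SSH_PASSWD",
}


def wb_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the connection keyword arguments from environment variables."""
    env = os.environ if env is None else env
    # Fail early with the full list of unset variables instead of one at a time
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in ENV_VARS.items()}
```

With the variables set, a call could then look like cm.load_from_wb_database(paper_ids=[paper_id], **wb_connection_kwargs()).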

Get the latest 50 papers added to or modified in WormBase that have a final PDF version and have been flagged by the WB paper classification pipeline, excluding reviews and papers that only have temporary files (proofs)

from wbtools.literature.corpus import CorpusManager
import datetime

cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         from_date=datetime.datetime.now(), max_num_papers=50, must_be_autclass_flagged=True,
                         exclude_pap_types=['Review'], exclude_temp_pdf=True, ssh_host="ssh_host", ssh_user="ssh_user", 
                         ssh_passwd="ssh_passwd")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
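
The examples identify papers by a bare eight-digit string such as "00050564", while WormBase paper identifiers are usually written with a WBPaper prefix (for example WBPaper00050564). The sketch below converts between the two forms, which may help when reporting the collected paper_ids. Both helper names are hypothetical, and the zero-padding to eight digits is an assumption based on the ID shown in the first example.

```python
import re


def strip_wbpaper_prefix(identifier: str) -> str:
    """Return the bare, zero-padded eight-digit ID, accepting input with or without the WBPaper prefix."""
    match = re.fullmatch(r"(?:WBPaper)?(\d{1,8})", identifier.strip())
    if match is None:
        raise ValueError("Not a recognizable WormBase paper ID: " + repr(identifier))
    # Assumption: IDs are eight digits wide, as in "00050564"
    return match.group(1).zfill(8)


def to_wbpaper_id(paper_id: str) -> str:
    """Return the prefixed form, e.g. WBPaper00050564."""
    return "WBPaper" + strip_wbpaper_prefix(paper_id)


full_ids = [to_wbpaper_id(pid) for pid in ["00050564", "WBPaper00050564", "50564"]]
```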

Download files

Source Distribution

wbtools-1.1.1.tar.gz (38.0 kB)

Uploaded Source

Built Distribution

wbtools-1.1.1-py3-none-any.whl (51.2 kB)

Uploaded Python 3

File details

Details for the file wbtools-1.1.1.tar.gz.

File metadata

  • Download URL: wbtools-1.1.1.tar.gz
  • Size: 38.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.21.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.10

File hashes

Hashes for wbtools-1.1.1.tar.gz
Algorithm Hash digest
SHA256 6fd5161b548ad094fa22058e07bb913392de773ff628a04c56551a0352622d41
MD5 dcdba260444628ada7f7cf4a2c78b427
BLAKE2b-256 7d88d6fcb6003723afc4bdd784817387750e1859bb492becefb9cab50db23182

File details

Details for the file wbtools-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: wbtools-1.1.1-py3-none-any.whl
  • Size: 51.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.21.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.10

File hashes

Hashes for wbtools-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 55feac232c5802f85d8f6417dcf8fa98b7e235f5be8212e36b2ad8254c299af8
MD5 cbcaf5f899256bf92d64e355a52e9b14
BLAKE2b-256 67863aa46dee88b2c45e204fca66de72b932e2065a8a3407bec35714fd2dbc02
