BioProv - Provenance capture for bioinformatics workflows

BioProv - W3C-PROV provenance documents for bioinformatics

BioProv is a Python library for W3C-PROV representation of biological data. It enables you to quickly write workflows and to describe relationships between samples, files, users and processes.

Please see the tutorials for a more detailed introduction.

>>> import bioprov as bp

# Create samples and file objects
>>> sample = bp.Sample("mysample")
>>> genome = bp.SequenceFile("mysample.fasta", "genome")
>>> sample.add_files(genome)

# Create an output file and a program
>>> output = sample.files["blast_out"] = bp.File("mysample.blast.tsv", "blast_out")
>>> blast = bp.Program("blastn", params={"-query": sample.files["genome"], "-db": "mydb.fasta", "-out": output})

# Run programs
>>> blast.run(sample=sample)  # Or sample.run(program=blast)
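
To make the W3C-PROV idea concrete, the relationships implied by the example above can be pictured as PROV-N-style statements. The following is a minimal standard-library sketch, not BioProv's data model or API: the `ProvSketch` class, the `ex` prefix, and the user name are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProvSketch:
    """Toy collector of PROV-N-style statements (hypothetical, not BioProv)."""

    prefix: str = "ex"  # assumed namespace prefix
    statements: list = field(default_factory=list)

    def _qn(self, name):
        # Build a qualified name such as "ex:genome".
        return f"{self.prefix}:{name}"

    def _add(self, relation, *names):
        args = ", ".join(self._qn(n) for n in names)
        self.statements.append(f"{relation}({args})")

    def entity(self, name):
        self._add("entity", name)

    def activity(self, name):
        self._add("activity", name)

    def agent(self, name):
        self._add("agent", name)

    def used(self, activity, entity):
        self._add("used", activity, entity)

    def was_generated_by(self, entity, activity):
        self._add("wasGeneratedBy", entity, activity)

    def was_associated_with(self, activity, agent):
        self._add("wasAssociatedWith", activity, agent)


def describe_blast_run(user="some_user"):
    """Record the files, program, and user from the blastn example."""
    doc = ProvSketch()
    doc.entity("genome")        # input file tag from the example
    doc.entity("blast_out")     # output file tag from the example
    doc.activity("blastn")      # the program run
    doc.agent(user)             # hypothetical user name
    doc.used("blastn", "genome")
    doc.was_generated_by("blast_out", "blastn")
    doc.was_associated_with("blastn", user)
    return doc.statements


for line in describe_blast_run():
    print(line)
```

Files map to entities, the program run to an activity, and the user to an agent, which mirrors the samples, files, users and processes that BioProv describes.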

BioProv also has a command-line application to run preset workflows.

$ bioprov -h
usage: bioprov [-h] {genome_annotation,kaiju} ...

BioProv command-line application. Choose a workflow to begin.

optional arguments:
  -h, --help            show this help message and exit

workflows:
  {genome_annotation,kaiju}
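
A command-line layout like the one shown in the help text can be produced with `argparse` subparsers. This is only a sketch of that general pattern, assuming one subcommand per workflow; it is not BioProv's actual CLI code, and the `build_parser` function and `workflow` attribute name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per preset workflow (illustrative)."""
    parser = argparse.ArgumentParser(
        prog="bioprov",
        description="BioProv command-line application. Choose a workflow to begin.",
    )
    # `title` sets the heading of the subcommand group in the help text;
    # `dest` stores the chosen subcommand name on the parsed namespace.
    subparsers = parser.add_subparsers(title="workflows", dest="workflow")
    for name in ("genome_annotation", "kaiju"):
        subparsers.add_parser(name)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["kaiju"])
print(args.workflow)
```

Each workflow's own options would be added to the parser returned by `subparsers.add_parser(name)`.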

BioProv is built with the Biopython and Pandas libraries.

You can import data into BioProv using Pandas objects.

# Read csv straight into BioProv
>>> samples = bp.read_csv("my_dataframe.tsv", sep="\t", sequencefile_cols="assembly")

# Alternatively, use a pandas DataFrame
>>> import pandas as pd
>>> df = pd.read_csv("my_dataframe.tsv", sep="\t")

# [...] manipulate your df
>>> df["assembly"] = "assembly_directory/" + df["assembly"]

# Now load from your df
>>> samples = bp.from_df(df, sequencefile_cols="assembly", source_file="my_dataframe.tsv")

# `samples` is a dict-like Project object
>>> sample1 = samples['sample1']

BioProv SequenceFile objects contain records formatted as Biopython SeqRecords:

>>> type(sample1)
Bio.SeqRecord.SeqRecord

BioProv objects can be imported or exported as JSON objects.

>>> sample1.to_json(), samples.to_json()
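
As a rough picture of what a JSON round trip involves, the sketch below serializes a nested dictionary and reads it back with the standard `json` module. The dictionary layout is a hypothetical stand-in; the keys shown are assumptions for illustration, not BioProv's actual JSON schema.

```python
import json

# Hypothetical layout: these keys are illustrative, not BioProv's schema.
sample_record = {
    "name": "mysample",
    "files": {
        "genome": {"path": "mysample.fasta", "tag": "genome"},
        "blast_out": {"path": "mysample.blast.tsv", "tag": "blast_out"},
    },
}

# Export to a JSON string, then import it back into Python objects.
serialized = json.dumps(sample_record, indent=2, sort_keys=True)
restored = json.loads(serialized)

print(restored["files"]["genome"]["path"])
```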

Installation

# Install from pip
$ pip install bioprov

# Or install from source
$ git clone https://github.com/vinisalazar/bioprov  # download
$ cd bioprov; pip install .                         # install
$ pytest                                            # test

Important! The test suite requires Prodigal to be installed. Without it, the tests will fail.
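
Before running the tests, it may help to check that Prodigal is reachable on your `PATH`. The sketch below uses only the standard library and assumes the executable is named `prodigal`; the `missing_tools` helper is hypothetical and not part of BioProv.

```python
import shutil


def missing_tools(names=("prodigal",)):
    """Return the names of executables that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```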

Contributions are welcome!

BioProv is in active development and no warranties are provided (please see the License).
