BioProv - Provenance capture for bioinformatics workflows
BioProv - W3C-PROV provenance documents for bioinformatics
BioProv is a Python library for W3C-PROV representation of biological data. It enables you to quickly write workflows and to describe relationships between samples, files, users and processes.
Please see the tutorials for a more detailed introduction.
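To make the W3C-PROV vocabulary more concrete, here is a minimal sketch in plain Python of the three core PROV types (entities, activities and agents) and the standard relations that link them. The `Node` and `ProvGraph` classes and the `user:alice` agent are hypothetical illustrations, not BioProv's API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical, simplified PROV data model; not BioProv's actual classes.
@dataclass(frozen=True)
class Node:
    kind: str  # "entity", "activity" or "agent"
    id: str


@dataclass
class ProvGraph:
    nodes: List[Node] = field(default_factory=list)
    # Each relation is stored as (relation_name, subject_id, object_id).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, kind: str, node_id: str) -> Node:
        node = Node(kind, node_id)
        self.nodes.append(node)
        return node

    def relate(self, name: str, subject: Node, obj: Node) -> None:
        self.relations.append((name, subject.id, obj.id))


# Describe a BLAST run on one sample's genome.
graph = ProvGraph()
genome = graph.add("entity", "mysample.fasta")
blast_out = graph.add("entity", "mysample.blast.tsv")
blastn = graph.add("activity", "blastn")
user = graph.add("agent", "user:alice")  # hypothetical user

graph.relate("used", blastn, genome)               # the activity read the input file
graph.relate("wasGeneratedBy", blast_out, blastn)  # the output was produced by the activity
graph.relate("wasAssociatedWith", blastn, user)    # the user is responsible for the activity
graph.relate("wasDerivedFrom", blast_out, genome)  # the output derives from the input

for relation in graph.relations:
    print(relation)
```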
```python
>>> import bioprov as bp
# Create samples and file objects
>>> sample = bp.Sample("mysample")
>>> genome = bp.SequenceFile("mysample.fasta", "genome")
>>> sample.add_files(genome)
# Create programs
>>> output = sample.files["blast_out"] = bp.File("mysample.blast.tsv", "blast_out")
>>> blast = bp.Program("blastn", params={"-query": sample.files["genome"], "-db": "mydb.fasta", "-out": output})
# Run programs
>>> blast.run(sample=sample) # Or sample.run(program=blast)
```
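Conceptually, a program described by a `params` dictionary like the one above must be turned into a command line before it runs, and provenance capture means recording what was run and when. The sketch below shows one way to do both; `build_command` and `run_with_timing` are hypothetical helpers, and the assumption that parameters become flag/value pairs in dictionary order is ours, not a description of how `bp.Program` works internally.

```python
import shlex
import subprocess
import time
from typing import Dict, List, Optional, Union


# Hypothetical helper: turn {"-flag": value} pairs into an argv list.
def build_command(program: str, params: Dict[str, Optional[Union[str, int]]]) -> List[str]:
    argv = [program]
    for flag, value in params.items():  # dicts keep insertion order
        argv.append(flag)
        if value is not None:  # None marks a flag that takes no value
            argv.append(str(value))
    return argv


# Hypothetical helper: run the command and record basic provenance fields.
def run_with_timing(argv: List[str]) -> Dict[str, object]:
    start = time.time()
    completed = subprocess.run(argv, capture_output=True, text=True)
    end = time.time()
    return {
        "cmd": shlex.join(argv),
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "start_time": start,
        "end_time": end,
    }


params = {"-query": "mysample.fasta", "-db": "mydb.fasta", "-out": "mysample.blast.tsv"}
argv = build_command("blastn", params)
print(shlex.join(argv))
```

Only the command is built here; on a machine with BLAST+ installed, `run_with_timing(argv)` would execute it (and `subprocess.run` raises `FileNotFoundError` if the executable is missing).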
BioProv also has a command-line application to run preset workflows.
```
$ bioprov -h
usage: bioprov [-h] {genome_annotation,kaiju} ...

BioProv command-line application. Choose a workflow to begin.

optional arguments:
  -h, --help            show this help message and exit

workflows:
  {genome_annotation,kaiju}
```
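The help text above has the shape that Python's `argparse` produces for a parser with subcommands. As an illustration only (this is not BioProv's CLI source), a parser with the same two workflow choices could be declared as follows; the `dest="workflow"` name and the `build_parser` function are our assumptions.

```python
import argparse


# Hypothetical reconstruction of a parser whose help matches the output above.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="bioprov",
        description="BioProv command-line application. Choose a workflow to begin.",
    )
    # Each workflow is a subcommand; the chosen name is stored in args.workflow.
    subparsers = parser.add_subparsers(title="workflows", dest="workflow")
    subparsers.add_parser("genome_annotation")
    subparsers.add_parser("kaiju")
    return parser
```

For example, `build_parser().parse_args(["kaiju"]).workflow` evaluates to `"kaiju"`, and each subparser could then receive its own workflow-specific arguments.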
BioProv is built with the Biopython and Pandas libraries.
You can import data into BioProv from tabular files or from pandas DataFrames.
```python
>>> import pandas as pd
# Read csv straight into BioProv
>>> samples = bp.read_csv("my_dataframe.tsv", sep="\t", sequencefile_cols="assembly")
# Alternatively, use a pandas DataFrame
>>> df = pd.read_csv("my_dataframe.tsv", sep="\t")
# [...] manipulate your df
>>> df["assembly"] = "assembly_directory/" + df["assembly"]
# Now load from your df
>>> samples = bp.from_df(df, sequencefile_cols="assembly", source_file="my_dataframe.tsv")
# `samples` becomes a Project dict-like object
>>> sample1 = samples['sample1']
```
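The idea behind loading a sample table is to key each row by a sample identifier and to treat the named columns as file paths. The standard-library sketch below illustrates that idea only; `load_samples`, the `sample` ID column and the `directory` prefix are hypothetical and do not reproduce the behaviour of `bp.read_csv` or `bp.from_df`.

```python
import csv
import io
from typing import Dict, Sequence, TextIO


# Hypothetical loader: one dict per sample, keyed by the "sample" column.
def load_samples(handle: TextIO, file_cols: Sequence[str], directory: str = "") -> Dict[str, Dict[str, str]]:
    samples: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        name = row.pop("sample")  # hypothetical sample ID column
        for col in file_cols:
            if directory:
                # Mirrors the df["assembly"] = "assembly_directory/" + ... step.
                row[col] = f"{directory}/{row[col]}"
        samples[name] = dict(row)
    return samples


table = "sample\tassembly\nsample1\tsample1.fasta\nsample2\tsample2.fasta\n"
samples = load_samples(io.StringIO(table), file_cols=["assembly"], directory="assembly_directory")
print(samples["sample1"])
```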
BioProv `SequenceFile` objects contain records formatted as Biopython SeqRecords:
```python
>>> type(sample1)
Bio.SeqRecord.SeqRecord
```
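BioProv relies on Biopython for sequence records; with Biopython installed, `Bio.SeqIO.parse(path, "fasta")` yields one `SeqRecord` per sequence. To show what "one record per sequence" means without that dependency, here is a simplified standard-library stand-in; `parse_fasta` is hypothetical and skips the validation and description handling a real parser performs.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


# Hypothetical minimal FASTA reader: yields (identifier, sequence) pairs.
def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    identifier: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Like Biopython's record.id, keep only the first word of the header.
            header = line[1:].split()
            identifier, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)  # sequences may span several lines
    if identifier is not None:
        yield identifier, "".join(chunks)


fasta = ">contig_1 length=8\nACGT\nACGT\n>contig_2\nGGCC\n"
records = dict(parse_fasta(io.StringIO(fasta)))
print(records)
```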
BioProv objects can be imported or exported as JSON objects.
```python
>>> sample1.to_json(), samples.to_json()
```
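A JSON export serializes an object's attributes into a document that can be read back later. Here is a minimal round-trip sketch using the standard `json` module and a hypothetical `SampleRecord` dataclass; it does not reproduce BioProv's own JSON schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


# Hypothetical, simplified stand-in for a sample with tagged files.
@dataclass
class SampleRecord:
    name: str
    files: Dict[str, str] = field(default_factory=dict)  # tag -> path

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SampleRecord":
        return cls(**json.loads(text))


sample = SampleRecord("mysample", {"genome": "mysample.fasta", "blast_out": "mysample.blast.tsv"})
restored = SampleRecord.from_json(sample.to_json())
print(restored == sample)
```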
Installation
```bash
# Install from pip
$ pip install bioprov

# Or install from source
$ git clone https://github.com/vinisalazar/bioprov # download
$ cd bioprov; pip install . # install
$ pytest # test
```
Important! Running BioProv's test suite requires Prodigal to be installed; otherwise, the tests will fail.
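Because the tests depend on an external executable, it can help to check for it before running `pytest`. This sketch uses `shutil.which` to report tools missing from `PATH`; `missing_tools` is a hypothetical helper, and the list holds only Prodigal, the one requirement named above.

```python
import shutil
from typing import List, Sequence


# Return the names of executables that cannot be found on PATH.
def missing_tools(tools: Sequence[str]) -> List[str]:
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(["prodigal"])
if missing:
    print("Install these before running pytest:", ", ".join(missing))
else:
    print("All required tools found.")
```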
Contributions are welcome!
BioProv is in active development and no warranties are provided (please see the License).
Download files
File details
Details for the file bioprov-0.1.13.tar.gz.
File metadata
- Download URL: bioprov-0.1.13.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.1 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0e8c6d49d41effdc3327dc928002240b78ee4b437f419119ba7f6ad84ab9408 |
| MD5 | b7e23196834ba184b39c6cc11441ff97 |
| BLAKE2b-256 | b46e6e6e4b4887e266e24e248cbe2aa578c4482915d56f9df958d62b0f975e31 |
File details
Details for the file bioprov-0.1.13-py3-none-any.whl.
File metadata
- Download URL: bioprov-0.1.13-py3-none-any.whl
- Upload date:
- Size: 1.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.1 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e57ca5cc2d4f45ca792e63c5b2a2e2f2a62268bc13bd29afcb1b3542a1ed18c5 |
| MD5 | 9c74c78b8634a5814099dea5c1c79bfa |
| BLAKE2b-256 | f8b3d4842a747b7b7cfb854cdd1ed41338454d0b6bdb537761d7f04a83099e24 |
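The SHA256 digests listed above let you check that a downloaded file is intact. A minimal sketch with Python's `hashlib`: it hashes the file in chunks and compares the result with the published value for that file name. The helper names are hypothetical; from the command line, `pip hash <file>` prints a SHA256 digest for the same comparison.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator

# Published SHA256 digests for the 0.1.13 release files listed above.
EXPECTED_SHA256 = {
    "bioprov-0.1.13.tar.gz": "c0e8c6d49d41effdc3327dc928002240b78ee4b437f419119ba7f6ad84ab9408",
    "bioprov-0.1.13-py3-none-any.whl": "e57ca5cc2d4f45ca792e63c5b2a2e2f2a62268bc13bd29afcb1b3542a1ed18c5",
}


# Read a file in fixed-size chunks so large files need not fit in memory.
def read_chunks(path: Path, chunk_size: int = 1 << 16) -> Iterator[bytes]:
    with open(path, "rb") as handle:
        yield from iter(lambda: handle.read(chunk_size), b"")


def sha256_hex(chunks: Iterable[bytes]) -> str:
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        return False  # unknown file name: nothing to compare against
    return sha256_hex(read_chunks(path)) == expected
```

For example, `verify_download(Path("bioprov-0.1.13.tar.gz"))` returns `True` only if the local archive matches the published digest.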