A bioinformatics pipeline for analysing short-read Illumina data in public health microbiology.

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3.6
- Python :: 3.7
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Bohra

A pipeline for analysis of Illumina short reads for public health microbiology.

Motivation

Bohra was inspired by Nullarbor (https://github.com/tseemann/nullarbor) to be used in public health microbiology labs for analysis of short reads from microbiological samples.

Limitations

Bohra is restricted to Illumina read sets. It has been built to run in HPC environments, although the HPC configurations have not been included in this initial commit.

Pipeline

Bohra can be run in three modes

SNPs and Phylogeny

Clean reads
Call variants
Generate a phylogenetic tree

SNPs, Phylogeny, Typing, Annotation and Species Identification (DEFAULT)

Clean reads
Call variants
Generate a phylogenetic tree
Assemble
Species identification
MLST
Resistome
Annotate

SNPs, Phylogeny, Pan-Genome, Typing and Species Identification

Clean reads
Call variants
Generate a phylogenetic tree
Assemble
Species identification
MLST
Resistome
Annotate
Pan Genome
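
The three modes above can be summarised as a small lookup table. This is only an illustrative sketch: the keys `s`, `sa` and `all` are taken from the `--pipeline` option in the help text further down, while the names `PIPELINE_STEPS` and `steps_for` are hypothetical and not part of Bohra. The `a` (assemblies only) option from the help text is not one of the three modes listed here, so it is left out.

```python
# Steps per pipeline mode, transcribed from the lists in this README.
# PIPELINE_STEPS and steps_for are illustrative names, not Bohra's API.
SNP_STEPS = ["Clean reads", "Call variants", "Generate a phylogenetic tree"]
ASSEMBLY_STEPS = ["Assemble", "Species identification", "MLST", "Resistome", "Annotate"]

PIPELINE_STEPS = {
    "s": SNP_STEPS,                                     # SNPs and Phylogeny
    "sa": SNP_STEPS + ASSEMBLY_STEPS,                   # default mode
    "all": SNP_STEPS + ASSEMBLY_STEPS + ["Pan Genome"],
}


def steps_for(pipeline="sa"):
    """Return the ordered steps listed for a given --pipeline value."""
    if pipeline not in PIPELINE_STEPS:
        raise ValueError("unknown pipeline mode: %r" % pipeline)
    return list(PIPELINE_STEPS[pipeline])


for mode in ("s", "sa", "all"):
    print(mode, "->", ", ".join(steps_for(mode)))
```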

Installation

Dependencies

Bohra requires Python >= 3.6 and depends on Snakemake.

At the moment Bohra can only be installed via GitHub; other options will follow.

pip3 install snakemake
pip3 install git+https://github.com/MDU-PHL/bohra

If you are installing on a server into your local (user) directory, use

pip3 install git+https://github.com/MDU-PHL/bohra --user

Don't forget to add your local installation to your PATH. For example, this should work:

export PATH=~/.local/bin:$PATH
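
To confirm that the installation is reachable, something like the following sketch may help. It assumes the installed executable is called `bohra` (as in the usage text below) and that a `--user` install places it in `~/.local/bin`; the helper name `find_bohra` is hypothetical.

```python
import os
import shutil


def find_bohra(extra_dir="~/.local/bin"):
    """Return the path to a 'bohra' executable, or None if none is found.

    extra_dir is searched first, followed by the current PATH.
    """
    search_path = os.pathsep.join(
        [os.path.expanduser(extra_dir), os.environ.get("PATH", "")]
    )
    return shutil.which("bohra", path=search_path)


location = find_bohra()
if location is None:
    print("bohra not found; check that the install directory is on your PATH")
else:
    print("bohra found at", location)
```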

Initial run

usage: bohra run [-h] [--input_file INPUT_FILE] [--job_id JOB_ID]
                    [--reference REFERENCE] [--mask MASK]
                    [--pipeline {sa,s,a,all}]
                    [--assembler {shovill,skesa,spades}] [--cpus CPUS]
                    [--minaln MINALN] [--prefillpath PREFILLPATH] [--mdu MDU]
                    [--workdir WORKDIR] [--resources RESOURCES] [--force]
                    [--dryrun] [--gubbins]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Input file = tab-delimited with 3 columns
                        <isolatename> <path_to_read1> <path_to_read2>
                        (default: )
  --job_id JOB_ID, -j JOB_ID
                        Job ID, will be the name of the output directory
                        (default: )
  --reference REFERENCE, -r REFERENCE
                        Path to reference (.gbk or .fa) (default: )
  --mask MASK, -m MASK  Path to mask file if used (.bed) (default: False)
  --pipeline {sa,s,a,all}, -p {sa,s,a,all}
                        The pipeline to run. SNPS ('s') will call SNPs and
                        generate phylogeny, ASSEMBLIES ('a') will generate
                        assemblies and perform mlst and species identification
                        using kraken2, SNPs and ASSEMBLIES ('sa' - default)
                        will perform SNPs and ASSEMBLIES. ALL ('all') will
                        perform SNPS, ASSEMBLIES and ROARY for pan-genome
                        analysis (default: sa)
  --assembler {shovill,skesa,spades}, -a {shovill,skesa,spades}
                        Assembler to use. (default: skesa)
  --cpus CPUS, -c CPUS  Number of CPU cores to run, will define how many rules
                        are run at a time (default: 36)
  --minaln MINALN, -ma MINALN
                        Minimum percent alignment (default: 0)
  --prefillpath PREFILLPATH, -pf PREFILLPATH
                        Path to existing assemblies - in the form
                        path_to_somewhere/isolatename/contigs.fa (default:
                        None)
  --mdu MDU             If running on MDU data (default: True)
  --workdir WORKDIR, -w WORKDIR
                        Working directory, default is current directory
                        (default: /home/khhor/dev/bohra)
  --resources RESOURCES, -s RESOURCES
                        Directory where templates are stored (default:
                        templates)
  --force, -f           Add if you would like to force a complete restart of
                        the pipeline. All previous logs will be lost.
                        (default: False)
  --dryrun, -n          If you would like to see a dry run of commands to be
                        executed. (default: False)
  --gubbins, -g         If you would like to run gubbins. NOT IN USE YET -
                        PLEASE DO NOT USE (default: False)
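
The input file described under `--input_file` is tab-delimited with three columns per isolate. As a minimal sketch, it could be generated like this; the function name `write_input_file` and the example paths are hypothetical, and no header row is written because the help text does not mention one.

```python
import csv
from pathlib import Path


def write_input_file(isolates, out_path):
    """Write one tab-delimited line per isolate: name, read 1 path, read 2 path."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, read1, read2 in isolates:
            writer.writerow([name, read1, read2])
    return Path(out_path)


# Hypothetical isolates and read paths, for illustration only.
example_isolates = [
    ("isolate1", "/data/isolate1_R1.fastq.gz", "/data/isolate1_R2.fastq.gz"),
    ("isolate2", "/data/isolate2_R1.fastq.gz", "/data/isolate2_R2.fastq.gz"),
]
# write_input_file(example_isolates, "isolates.tab")
```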

Minimal run

bohra run -r path_to_ref -i path_to_input -j job_id
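
If the pipeline is launched from a script, the minimal command can be assembled as an argument list. This is a sketch only: it uses the `-r`, `-i`, `-j` and `--dryrun` options shown in the usage text above, and the function name `bohra_run_command` and the file names are hypothetical.

```python
import shlex


def bohra_run_command(reference, input_file, job_id, dryrun=False):
    """Build the argument list for a minimal 'bohra run' invocation."""
    cmd = ["bohra", "run", "-r", str(reference), "-i", str(input_file), "-j", str(job_id)]
    if dryrun:
        # --dryrun shows the commands that would be executed
        cmd.append("--dryrun")
    return cmd


cmd = bohra_run_command("ref.fa", "isolates.tab", "job1", dryrun=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Bohra is installed; starting with `dryrun=True` is a cautious way to inspect what would be executed.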

Subsequent run

Once a run has been completed, you can rerun Bohra after making any of the following changes.

Add or remove isolates

Add - add a new tab-delimited line
Remove - Prepend a # to the lines you wish to remove
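
Removing isolates by commenting out their lines can be scripted. The sketch below assumes the edits are made to the tab-delimited input file from the initial run, with the isolate name in the first column; `comment_out_isolates` is a hypothetical helper, not part of Bohra.

```python
def comment_out_isolates(lines, to_remove):
    """Prepend '#' to every line whose first column is in to_remove.

    Lines that are already commented out are left unchanged.
    """
    updated = []
    for line in lines:
        name = line.split("\t", 1)[0].strip()
        if name in to_remove and not line.startswith("#"):
            updated.append("#" + line)
        else:
            updated.append(line)
    return updated


example = [
    "isolate1\t/data/isolate1_R1.fastq.gz\t/data/isolate1_R2.fastq.gz\n",
    "isolate2\t/data/isolate2_R1.fastq.gz\t/data/isolate2_R2.fastq.gz\n",
]
print("".join(comment_out_isolates(example, {"isolate2"})), end="")
```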

Change the reference

If the reference is changed, re-alignment and variant calling will be performed.

Change the mask file


usage: bohra rerun [-h] [--reference REFERENCE] [--mask MASK] [--cpus CPUS]
                      [--workdir WORKDIR] [--resources RESOURCES] [--dryrun]
                      [--gubbins] [--keep]

optional arguments:
  -h, --help            show this help message and exit
  --reference REFERENCE, -r REFERENCE
                        Path to reference (.gbk or .fa) (default: )
  --mask MASK, -m MASK  Path to mask file if used (.bed) (default: )
  --cpus CPUS, -c CPUS  Number of CPU cores to run, will define how many rules
                        are run at a time (default: 36)
  --workdir WORKDIR, -w WORKDIR
                        Working directory, default is current directory
                        (default: /home/khhor/dev/bohra)
  --resources RESOURCES, -s RESOURCES
                        Directory where templates are stored (default:
                        templates)
  --dryrun, -n          If you would like to see a dry run of commands to be
                        executed. (default: False)
  --gubbins, -g         If you would like to run gubbins. NOT IN USE YET -
                        PLEASE DO NOT USE (default: False)
  --keep, -k            Keep report from previous run (default: False)

Rerun with different combination of isolates

bohra rerun

Rerun with different reference/mask

bohra rerun -r pathtonewref -m pathtonewmask

Output

Bohra outputs a directory with a report.html and all data required for visualisation in a web browser.
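
After a run, a quick check for the report might look like the sketch below. It rests on an assumption not stated in this README: that `report.html` sits directly inside a directory named after the job ID within the working directory. Adjust the path if your layout differs. The helper name `find_report` is hypothetical.

```python
from pathlib import Path


def find_report(job_id, workdir="."):
    """Return the path to report.html for a job, or None if it is missing.

    ASSUMPTION: the report is at <workdir>/<job_id>/report.html.
    """
    candidate = Path(workdir) / job_id / "report.html"
    return candidate if candidate.is_file() else None


report = find_report("job1")
print(report if report is not None else "no report found for job1")
```

If a report is found, `webbrowser.open(report.resolve().as_uri())` from the standard library will open it in the default browser.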

Etymology

Bohra is an extinct tree-dwelling kangaroo whose fossils have been found in the Nullarbor, from a time before the Nullarbor was treeless. Since this pipeline implements Snippy, named for another famous Australian kangaroo ('Skippy'), and its design is based on Nullarbor, Bohra is an exceedingly appropriate name.

More to follow!!

Expand readme
Polish log files
Add clean functions

Release history

2.3.6

Nov 14, 2023

2.3.5

Nov 14, 2023

2.3.4

Nov 13, 2023

2.3.3

Nov 11, 2023

2.3.2

Jun 6, 2023

2.3.1

Jun 5, 2023

2.3.0

Jun 2, 2023

2.2.1

May 1, 2023

2.2.0

Apr 24, 2023

1.2.20

Mar 9, 2021

1.2.19

Feb 27, 2021

1.2.18

Feb 23, 2021

1.2.16

Feb 19, 2021

1.2.15

Feb 13, 2021

1.2.14

Jun 11, 2020

1.2.13

Jun 11, 2020

1.2.12

Apr 7, 2020

1.2.11

Apr 1, 2020

1.2.10

Mar 16, 2020

1.2.9

Mar 13, 2020

1.2.8

Mar 13, 2020

1.2.7

Mar 12, 2020

1.2.6

Mar 12, 2020

1.2.5

Mar 12, 2020

1.2.4

Mar 12, 2020

1.2.3

Mar 11, 2020

1.2.2

Mar 11, 2020

1.2.1

Mar 5, 2020

1.2.0

Mar 5, 2020

1.1.8

Feb 25, 2020

1.1.7

Jan 27, 2020

1.1.6

Jan 26, 2020

1.1.5

Jan 26, 2020

1.1.4

Jan 26, 2020

1.1.3

Jan 23, 2020

1.1.2

Jan 9, 2020

1.1.1

Dec 19, 2019

1.1.0

Nov 14, 2019

1.0.27

Nov 4, 2019

1.0.26

Sep 25, 2019

1.0.25

Sep 25, 2019

1.0.24

Sep 24, 2019

1.0.23

Sep 22, 2019

1.0.22

Sep 21, 2019

1.0.20

Aug 7, 2019

1.0.19

Aug 1, 2019

1.0.18

Aug 1, 2019

1.0.17

Jul 31, 2019

1.0.16

Jul 31, 2019

1.0.15

Jul 31, 2019

1.0.13

Jul 31, 2019

1.0.12

Jul 31, 2019

1.0.11

Jul 31, 2019

1.0.10

Jul 31, 2019

1.0.9

Jul 31, 2019

1.0.7

Jul 31, 2019

1.0.6

Jul 30, 2019

1.0.5

Jul 29, 2019

1.0.4

Jul 29, 2019

1.0.3

Jul 29, 2019

This version

1.0.1

Jul 25, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bohra-1.0.1.tar.gz (35.1 kB)

Uploaded Jul 25, 2019 Source

Built Distribution

bohra-1.0.1-py3-none-any.whl (47.6 kB)

Uploaded Jul 25, 2019 Python 3

Hashes for bohra-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`d796a8b703d672a607d51f047f1cd3f8e1e56dbab73a0a780212ba1253bf4438`
MD5	`24300f8b62b0e80a0266c11b5c556670`
BLAKE2b-256	`43ff2b5d874feefd56775a5f779e9d34c4ee9e41db4b005a0a13267e6b4939b0`

Hashes for bohra-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`f07a9c96354e5d425dc87dfb4ab653e804399d11641760fadc3d4575e8ac205e`
MD5	`188e5051473527f524f40c22ec7aad8d`
BLAKE2b-256	`f7324c65b39340e18e5293e850129e20121744158e7d69b1f0b87696bd10bd94`
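
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; the function names `sha256_of` and `verify_download` are hypothetical, and the files are assumed to be in the current directory under their published names.

```python
import hashlib

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "bohra-1.0.1.tar.gz": "d796a8b703d672a607d51f047f1cd3f8e1e56dbab73a0a780212ba1253bf4438",
    "bohra-1.0.1-py3-none-any.whl": "f07a9c96354e5d425dc87dfb4ab653e804399d11641760fadc3d4575e8ac205e",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's SHA256 digest matches expected_hex."""
    return sha256_of(path) == expected_hex.lower()


# Example (requires the file to be present):
# verify_download("bohra-1.0.1.tar.gz", EXPECTED_SHA256["bohra-1.0.1.tar.gz"])
```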