
A variant calling pipeline to analyse Illumina sequencing data



This is the variant_calling pipeline from the Sequana project.

Overview: Variant calling from FASTQ files

Input: FASTQ files from an Illumina sequencing instrument

Output: VCF and HTML files

Status: production

Citation: Cokelaer et al. (2017). ‘Sequana’: a Set of Snakemake NGS pipelines. Journal of Open Source Software, 2(16), 352. DOI: https://doi.org/10.21105/joss.00352

Installation

You must install Sequana first (use --upgrade to install the latest version):

pip install sequana --upgrade

Then, just install this package:

pip install sequana_variant_calling --upgrade

Usage

sequana_variant_calling --help
sequana_variant_calling --input-directory DATAPATH --reference-file measles.fa

This creates a directory named variant_calling. Then execute the pipeline:

cd variant_calling
sh variant_calling.sh

This launches a Snakemake pipeline. If you are familiar with Snakemake, you can retrieve the pipeline and its configuration file, then run the pipeline yourself with specific parameters:

snakemake -s variant_calling.rules --configfile config.yaml --cores 4 --stats stats.txt

Alternatively, use the Sequanix graphical interface.
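The two usage steps can also be driven from Python. This is a minimal sketch that assumes `sequana_variant_calling` is on the PATH. It uses only the options named in this README (`--input-directory`, `--reference-file`, `--annotation-file`). The helpers `init_command` and `run_pipeline` are hypothetical and are not part of Sequana.

```python
"""Hypothetical helpers wrapping the two usage steps shown above."""
import shlex
import shutil
import subprocess
from pathlib import Path


def init_command(input_dir, reference, annotation=None):
    """Build the argv that creates the variant_calling/ working directory."""
    argv = ["sequana_variant_calling",
            "--input-directory", str(input_dir),
            "--reference-file", str(reference)]
    if annotation is not None:
        # Optional GenBank file, used for SnpEff annotation
        argv += ["--annotation-file", str(annotation)]
    return argv


def run_pipeline(input_dir, reference, annotation=None, workdir="variant_calling"):
    """Create the project, then run the generated shell script inside it."""
    argv = init_command(input_dir, reference, annotation)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    subprocess.run(argv, check=True)
    # Assumes the project directory keeps its default name, variant_calling
    subprocess.run(["sh", "variant_calling.sh"], cwd=Path(workdir), check=True)


if __name__ == "__main__":
    # Only print the command; DATAPATH is a placeholder input directory
    print(shlex.join(init_command("DATAPATH", "measles.fa")))
```

Printing the command before running it makes it easy to check the options without launching the pipeline.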

Requirements

This pipeline requires the following executables:

  • bwa

  • freebayes

  • picard (picard-tools)

  • sambamba

  • samtools

  • snpEff
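Before launching the pipeline, you can check that the executables listed above are reachable. This is a minimal sketch using Python's standard `shutil.which`. The binary names are an assumption: depending on the installation, picard or snpEff may be exposed under a different name or run from a JAR file.

```python
import shutil

# Executable names taken from the list above; actual binary names may vary by install
REQUIRED = ["bwa", "freebayes", "picard", "sambamba", "samtools", "snpEff"]


def missing_tools(tools=REQUIRED):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found.")
```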

Pipeline DAG: https://raw.githubusercontent.com/sequana/sequana_variant_calling/master/sequana_pipelines/variant_calling/dag.png

Details

The Snakemake variant calling pipeline is based on a tutorial written by Erik Garrison. Input reads (paired-end or single-end) are mapped with bwa and sorted with sambamba sort. PCR duplicates are marked with sambamba markdup. Freebayes is used to detect SNPs and short INDELs; INDEL realignment and base quality recalibration are not necessary with Freebayes. For more information, please refer to the post by Brad Chapman on minimal BAM preprocessing methods.
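The mapping and calling steps above correspond roughly to the following per-sample commands. This is a minimal sketch, not the pipeline's actual Snakemake rules. The helper `sample_commands`, the intermediate file names and the thread count are hypothetical. The reference is assumed to be already indexed (`bwa index`). The sketch only builds the command strings; running them requires the tools listed under Requirements.

```python
import shlex


def sample_commands(sample, r1, r2, reference, threads=4):
    """Return shell command strings for one sample (hypothetical file names).

    Pass r2=None for single-end reads.
    """
    q = shlex.quote
    reads = q(r1) + (f" {q(r2)}" if r2 else "")
    raw_bam = q(f"{sample}.raw.bam")
    sorted_bam = q(f"{sample}.sorted.bam")
    markdup_bam = q(f"{sample}.markdup.bam")
    return [
        # 1. map reads with bwa mem and convert the SAM stream to BAM
        f"bwa mem -t {threads} {q(reference)} {reads} "
        f"| sambamba view -S -f bam -o {raw_bam} /dev/stdin",
        # 2. sort by coordinate
        f"sambamba sort -t {threads} -o {sorted_bam} {raw_bam}",
        # 3. mark (not remove) PCR duplicates
        f"sambamba markdup -t {threads} {sorted_bam} {markdup_bam}",
        # 4. call SNPs and short INDELs; no realignment or recalibration step
        f"freebayes -f {q(reference)} {markdup_bam} > {q(sample + '.vcf')}",
    ]


if __name__ == "__main__":
    for cmd in sample_commands("s1", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "ref.fa"):
        print(cmd)
```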

The pipeline analyses mapping coverage with sequana coverage, which automatically detects and characterises regions of low and high genome coverage.
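To illustrate how low and high coverage regions can be flagged, here is a deliberately simplified sketch. It is not sequana coverage's implementation, which is more elaborate. The depth is divided by its running median, turned into z-scores, and consecutive outlying positions are merged into regions. The function names, window size and threshold are hypothetical choices.

```python
from statistics import mean, median, pstdev


def running_median(depth, window):
    """Median of depth over a centred window (truncated at the edges)."""
    half = window // 2
    n = len(depth)
    return [median(depth[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def coverage_regions(depth, window=5, threshold=3.0):
    """Return (start, end, kind) runs where normalised depth is an outlier.

    kind is "low" or "high"; start and end are inclusive 0-based positions.
    """
    trend = running_median(depth, window)
    norm = [d / m if m > 0 else 0.0 for d, m in zip(depth, trend)]
    mu, sd = mean(norm), pstdev(norm) or 1.0  # avoid dividing by zero on flat data
    regions, start, kind = [], None, None
    for i, x in enumerate(norm):
        z = (x - mu) / sd
        label = "low" if z < -threshold else "high" if z > threshold else None
        if label != kind:
            # close the previous outlying run, if any, and start a new one
            if kind is not None:
                regions.append((start, i - 1, kind))
            start, kind = i, label
    if kind is not None:
        regions.append((start, len(norm) - 1, kind))
    return regions
```

The running median must span a wider window than the events it should detect. Otherwise a long deletion or duplication is absorbed into the trend and disappears from the normalised signal.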

Detected variants are annotated with SnpEff if a GenBank file is provided; the pipeline builds the SnpEff database automatically. Although most species should be handled automatically, some special cases, such as a non-standard codon table, require editing the SnpEff configuration file.
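In the SnpEff configuration file, a genome is declared with a `<genome>.genome` entry. A non-standard codon table is set per chromosome with a `<genome>.<chromosome>.codonTable` entry. This is a minimal sketch that generates such lines. The genome and chromosome names are hypothetical, and table names must match those defined in SnpEff's configuration (for example `Bacterial_and_Plant_Plastid` or `Vertebrate_Mitochondrial`). Where this pipeline keeps its SnpEff configuration file is not covered here.

```python
def snpeff_config_lines(genome, chromosomes=(), codon_table=None):
    """Lines to append to a SnpEff configuration file for a custom genome.

    genome and chromosomes are hypothetical names; codon_table must be a
    table name known to SnpEff.
    """
    lines = [f"{genome}.genome : {genome}"]
    if codon_table is not None:
        # one codonTable entry per chromosome that uses the non-standard table
        lines += [f"{genome}.{chrom}.codonTable : {codon_table}" for chrom in chromosomes]
    return lines


if __name__ == "__main__":
    for line in snpeff_config_lines("mygenome", ["chr1"], "Bacterial_and_Plant_Plastid"):
        print(line)
```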

Finally, joint calling is also available and can be switched on if desired.
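With joint calling, all samples are passed to a single variant caller run and reported in one multi-sample VCF, instead of one VCF per sample. Freebayes accepts several BAM files for this. This is a minimal sketch contrasting the two modes; it does not necessarily reflect how this pipeline implements joint calling. The helper name and output file names are hypothetical, and `--ploidy` is Freebayes' own option.

```python
import shlex
from pathlib import Path


def freebayes_commands(bams, reference, joint=False, ploidy=2):
    """Build Freebayes command strings (hypothetical output names).

    joint=False: one run and one VCF per BAM file.
    joint=True:  a single run over all BAM files, giving one multi-sample VCF.
    """
    q = shlex.quote
    base = f"freebayes -f {q(reference)} --ploidy {ploidy}"
    if joint:
        all_bams = " ".join(q(b) for b in bams)
        return [f"{base} {all_bams} > joint_calling.vcf"]
    return [f"{base} {q(b)} > {q(str(Path(b).with_suffix('.vcf')))}" for b in bams]


if __name__ == "__main__":
    for cmd in freebayes_commands(["a.bam", "b.bam"], "ref.fa", joint=True):
        print(cmd)
```

Joint calling lets the caller weigh evidence across samples at each site, at the cost of re-running the whole cohort when a sample is added.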

Changelog

Version

Description

0.10.0

  • fully integrated sequana wrappers and simplification of HTML reports

0.9.10

  • Uses new sequana_pipetools and wrappers

0.9.5

  • fix typo in the onsuccess and update sequana requirements to use most up-to-date snakemake rules

0.9.4

  • fix typo related to the reference-file option, whose new name was not changed everywhere in the pipeline.

0.9.3

  • use new framework (faster --help, --from-project option)

  • rename --reference to --reference-file and --annotation to --annotation-file

  • add custom summary page

  • add multiqc config file

0.9.2

  • snpeff output files are renamed sample.snpeff (instead of samplesnpeff)

  • add multiqc to show sequana_coverage and snpeff summary sections

  • cleanup onsuccess section

  • more sanity checks on options and new options (e.g.,

  • genbank_file renamed to annotation_file in the config

  • use --legacy in freebayes options

  • fix coverage section to use new sequana api

  • add the --do-coverage and --do-joint-calling options as well as --circular and --freebayes-ploidy

0.9.1

  • Fix input-readtag, which was not populated

0.9.0

First release
