Python tools for proteogenomics

Python tools for ProteoGenomics Analysis Toolkit

pypgatk is a Python library that is part of the ProteoGenomics Analysis Toolkit. It provides bioinformatics tools for proteogenomics data analysis.

Requirements:

The requirements depend on how you want to install the package. The four options are independent, so you only need the requirements for the one you choose:

  • pip: to install through pip, you need Python 3 and pip3.
  • Bioconda: to install through Bioconda, you need conda installed and configured to use the bioconda channels.
  • Docker container: to use pypgatk from its Docker container, you need Docker installed.
  • Source code: to install directly from the source code, you need git, Python 3 and pip.

Installation

pip

You can install pypgatk with pip:

pip install pypgatk

Bioconda

You can install pypgatk with Bioconda (if you haven't already, set up conda and the bioconda channel first, as described in the Bioconda documentation):

conda install pypgatk

Available as a container

You can use pypgatk already set up in a Docker container. Choose one of the tags available for the image and put it in place of <tag> in the call below.

docker pull quay.io/biocontainers/pypgatk:<tag>

NOTE: BioContainers images do not have a latest tag, so a docker pull or docker run without an explicit tag will fail. For instance, a valid call for version 0.0.2 would be:

docker run -it quay.io/biocontainers/pypgatk:0.0.2--py_0

Inside the container, you can either use the Python interactive shell or the command line version (see below).
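Because a missing tag makes the pull fail, it can help to build the image reference in one place when scripting around the container. The following is a minimal sketch, not part of pypgatk: the helper names image_reference and docker_run_command are hypothetical, the repository path is taken from the pull command above, and the tag check follows Docker's general tag rules rather than anything specific to BioContainers.

```python
import re
from typing import List

# Repository path copied from the docker pull command above.
REPOSITORY = "quay.io/biocontainers/pypgatk"


def image_reference(tag: str) -> str:
    """Return 'repository:tag', refusing an empty tag or 'latest'."""
    tag = tag.strip()
    if not tag or tag == "latest":
        raise ValueError("BioContainers images have no 'latest' tag; pass an explicit tag")
    # Docker tags: letters, digits, '_', '.', '-'; at most 128 characters;
    # the first character may not be '.' or '-'.
    if not re.fullmatch(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}", tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return f"{REPOSITORY}:{tag}"


def docker_run_command(tag: str) -> List[str]:
    """Build the argument list for an interactive docker run (hypothetical helper)."""
    return ["docker", "run", "-it", image_reference(tag)]


print(" ".join(docker_run_command("0.0.2--py_0")))
```

The returned list can be passed to subprocess.run as-is; keeping it as a list avoids shell quoting issues.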

Use latest source code

Alternatively, to get the latest version, clone this repository, change into its directory, and run pip3 install . :

git clone https://github.com/bigbio/py-pgatk
cd py-pgatk
# you might want to create a virtualenv for pypgatk before installing
pip3 install .
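Whichever installation route you choose, a quick way to confirm that the package ended up in the active environment is to ask Python for the installed distribution's version. This is a small sketch using only the standard library (Python 3.8 or later); the function name installed_version is made up for illustration, and the distribution name pypgatk is the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pypgatk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found:
    print(f"pypgatk {found} is installed")
else:
    print("pypgatk is not installed in this environment")
```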

Usage

pypgatk combines multiple modules and tools into one framework. All commands are accessible through the command-line tool pypgatk_cli.py.

$: pypgatk_cli.py -h
Usage: pypgatk [OPTIONS] COMMAND [ARGS]...

  This is the main tool that give access to all commands and options
  provided by the pypgatk

Options:
  -h, --help  Show this message and exit.

Commands:
  cbioportal-downloader     Command to download the the cbioportal studies
  cbioportal-to-proteindb  Command to translate cbioportal mutation data into
                           proteindb
  cosmic-downloader        Command to download the cosmic mutation database
  cosmic-to-proteindb      Command to translate Cosmic mutation data into
                           proteindb
  dnaseq-to-proteindb      Generate peptides based on DNA sequences
  ensembl-downloader       Command to download the ensembl information
  generate-decoy           Create decoy protein sequences. Each protein is
                           reversed and the cleavage sites switched with
                           preceding amino acid. Peptides are checked for
                           existence in target sequences if foundthe tool will
                           attempt to shuffle them. James.Wright@sanger.ac.uk
                           2015
  threeframe-translation   Command to perform 3frame translation
  vcf-to-proteindb         Generate peptides based on DNA variants from
                           ENSEMBL VEP VCF files
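
When calling the tool from a script, the help text of each command is usually the first thing to look at. The sketch below builds such calls with the standard library. The command names are copied from the help output above; that each subcommand accepts -h in the same way as the top-level tool is an assumption, and help_argv and show_help are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional

CLI = "pypgatk_cli.py"

# Subcommand names copied from the help output above.
COMMANDS = (
    "cbioportal-downloader",
    "cbioportal-to-proteindb",
    "cosmic-downloader",
    "cosmic-to-proteindb",
    "dnaseq-to-proteindb",
    "ensembl-downloader",
    "generate-decoy",
    "threeframe-translation",
    "vcf-to-proteindb",
)


def help_argv(command: Optional[str] = None) -> List[str]:
    """Build the argument list that asks the CLI (or one command) for help."""
    if command is None:
        return [CLI, "-h"]
    if command not in COMMANDS:
        raise ValueError(f"unknown pypgatk command: {command!r}")
    # Assumption: subcommands accept -h like the top-level tool does.
    return [CLI, command, "-h"]


def show_help(command: Optional[str] = None) -> Optional[str]:
    """Run the help call and return its text, or None if the CLI is not on PATH."""
    if shutil.which(CLI) is None:
        return None
    result = subprocess.run(help_argv(command), capture_output=True, text=True, check=False)
    return result.stdout


print(help_argv("vcf-to-proteindb"))
```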

The library provides multiple commands to download, translate and generate protein sequence databases from reference and mutation genome databases.
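
To make the generate-decoy description in the help output more concrete, here is a toy sketch of the "reverse, then switch cleavage sites with the preceding amino acid" idea, plus a check for decoy peptides that also occur in the target. It is one reading of the help text, not pypgatk's implementation: the function names are hypothetical, the cleavage sites K and R (trypsin-like) are an assumption, the digestion ignores missed cleavages and proline rules, and the shuffling step is left out.

```python
from typing import Iterable, List, Set


def reverse_and_switch(protein: str, cleavage_sites: str = "KR") -> str:
    """Reverse a sequence, then swap each cleavage residue with the one before it."""
    residues = list(protein[::-1])
    for i in range(1, len(residues)):
        if residues[i] in cleavage_sites:
            residues[i - 1], residues[i] = residues[i], residues[i - 1]
    return "".join(residues)


def digest(protein: str, cleavage_sites: str = "KR") -> List[str]:
    """Cut after every cleavage residue (simplified digestion, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cleavage_sites:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def shared_peptides(decoy: str, targets: Iterable[str]) -> Set[str]:
    """Decoy peptides that also occur among the target peptides."""
    target_peptides = {peptide for target in targets for peptide in digest(target)}
    return set(digest(decoy)) & target_peptides


target = "MAKPEPTIDER"
decoy = reverse_and_switch(target)
print(decoy, digest(decoy), shared_peptides(decoy, [target]))
```

Peptides returned by shared_peptides are the ones that, according to the help text, the real tool would try to shuffle.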

Full Documentation

https://pgatk.readthedocs.io/en/latest/pypgatk.html

Cite as

Husen M Umer, Enrique Audain, Yafeng Zhu, Julianus Pfeuffer, Timo Sachsenberg, Janne Lehtiö, Rui M Branca, Yasset Perez-Riverol, Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides, Bioinformatics, Volume 38, Issue 5, 1 March 2022, Pages 1470–1472, https://doi.org/10.1093/bioinformatics/btab838

Download files

Source Distribution

pypgatk-0.0.22.tar.gz (168.1 kB)

Built Distribution

pypgatk-0.0.22-py3-none-any.whl (187.3 kB)

File details

Details for the file pypgatk-0.0.22.tar.gz.

File metadata

  • Download URL: pypgatk-0.0.22.tar.gz
  • Size: 168.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for pypgatk-0.0.22.tar.gz
Algorithm    Hash digest
SHA256       d7fa36b4cf806af81a15b2e68837771fb246f9d41d4910282a9f516496ea3017
MD5          bd490d1e1708ac392b6e1f434f0773f7
BLAKE2b-256  49dc12de32bfa5422ef3d0674833223957a8c853aa9c9802f30acb7c17ac2e48

File details

Details for the file pypgatk-0.0.22-py3-none-any.whl.

File metadata

  • Download URL: pypgatk-0.0.22-py3-none-any.whl
  • Size: 187.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for pypgatk-0.0.22-py3-none-any.whl
Algorithm    Hash digest
SHA256       fed3871dba3ef8254382497ee9a28cab5d0a44b40c8b160c92b2dfad3e7c63f2
MD5          74956a8a3aa328c5e4227fe0b21c02b8
BLAKE2b-256  6e5bcb038823cd58b5adb817fb82bf0078c5e38c13a0e73ec6bd67e81f142d74
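
If you download one of the files manually, you can compare its SHA256 digest with the value listed above. The following is a minimal sketch with the standard library; the function names sha256_of and matches_published_hash are hypothetical, and the digests in the dictionary are copied from the listings above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "pypgatk-0.0.22.tar.gz": "d7fa36b4cf806af81a15b2e68837771fb246f9d41d4910282a9f516496ea3017",
    "pypgatk-0.0.22-py3-none-any.whl": "fed3871dba3ef8254382497ee9a28cab5d0a44b40c8b160c92b2dfad3e7c63f2",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path) -> bool:
    """True only if the file name is known and its digest equals the listed one."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, matches_published_hash("pypgatk-0.0.22.tar.gz") should return True for an intact download in the current directory, and False for a modified file or an unknown file name.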
