
cellSNP - Analysis of expressed alleles in single cells

Project description

cellSNP aims to pile up the expressed alleles in single-cell or bulk RNA-seq data. Its output can be used directly for donor deconvolution in multiplexed single-cell RNA-seq data, particularly with cardelino, an R package that assigns cells to donors and detects doublets, even when the donors have not been genotyped.

cellSNP relies heavily on pysam, a Python interface to samtools and bcftools. It should give results very similar to, if not identical to, those of samtools/bcftools mpileup. There are, however, two major differences compared with bcftools mpileup:

  1. cellSNP can pile up either the whole genome or a list of positions, splitting the counts directly by a list of cell barcodes, e.g., for 10x Genomics data. With bcftools, you may need to manipulate the RG tag in the BAM file first.

  2. cellSNP applies only simple filters when outputting SNPs, namely on the total UMI or read count and on the minor allele fraction. The idea is to retain most of the SNP information and let the downstream statistical model handle it adaptively.
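To make the first difference concrete, here is a minimal Python sketch of splitting allele counts at a single SNP by cell barcode. It is not cellSNP's implementation: it assumes the reads covering the SNP have already been reduced to hypothetical (cell barcode, UMI, base) tuples, which in practice would come from a pileup (e.g., via pysam) and the barcode/UMI tags of the BAM file. The function name count_alleles_per_cell is illustrative only, and each UMI is simply counted once per cell and base.

```python
from collections import defaultdict


def count_alleles_per_cell(records, barcodes):
    """Count UMIs supporting each base at one SNP, separately per cell barcode.

    records: iterable of (cell_barcode, umi, base) tuples for reads covering
             the SNP (hypothetical pre-extracted input, not a cellSNP format).
    barcodes: the cell barcodes to report; reads from other barcodes are skipped.
    Each UMI is counted at most once per (cell, base) pair.
    """
    wanted = set(barcodes)
    umis_seen = defaultdict(set)  # (cell, base) -> UMIs observed for that base
    for cell, umi, base in records:
        if cell in wanted:
            umis_seen[(cell, base)].add(umi)

    # Every requested barcode appears in the output, even with no coverage.
    counts = {cell: {} for cell in barcodes}
    for (cell, base), umis in umis_seen.items():
        counts[cell][base] = len(umis)
    return counts


toy_reads = [
    ("AAAC-1", "U1", "A"),
    ("AAAC-1", "U1", "A"),  # second read from the same UMI: not double-counted
    ("AAAC-1", "U2", "G"),
    ("TTTG-1", "U3", "G"),
    ("CCCC-1", "U4", "A"),  # barcode not in the list: ignored
]
print(count_alleles_per_cell(toy_reads, ["AAAC-1", "TTTG-1"]))
# → {'AAAC-1': {'A': 1, 'G': 1}, 'TTTG-1': {'G': 1}}
```

Summing these per-cell counts over all cells gives the per-SNP totals on which the simple output filters operate.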

Installation

cellSNP is available through PyPI. To install it, run the following command (add -U to upgrade an existing installation):

pip install cellSNP

Alternatively, download or clone this repository and run python setup.py install. In either case, add --user if you do not have root permission or write access to your Python environment.

Quick usage

Once installed, list all arguments by typing cellSNP -h. cellSNP has three modes:

Mode 1: pile up a list of common SNPs for single cells in a large BAM/SAM file. Requires: a single BAM/SAM file, e.g., from cellranger, and a VCF file listing common SNPs. This mode is recommended over mode 2 when a list of common SNPs is available, e.g., for human.

cellSNP -s $BAM -b $BARCODE -o $OUT_FILE -R $REGION_VCF -p 20

For downstream donor deconvolution, we recommend filtering out SNPs with fewer than 20 UMIs or a minor allele fraction below 10%, by adding --minMAF 0.1 --minCOUNT 20
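The recommended thresholds can be sketched as a simple per-SNP test. This is an illustration under stated assumptions, not cellSNP's code: the function name passes_filter is hypothetical, allele counts are taken to be UMIs summed over all cells, and the minor allele is assumed to be the second most frequent base.

```python
def passes_filter(allele_counts, min_count=20, min_maf=0.1):
    """Keep a SNP only if it has >= min_count UMIs and minor allele fraction >= min_maf.

    allele_counts: mapping base -> UMI (or read) count summed over all cells.
    Assumption: the minor allele is the second most frequent base.
    """
    total = sum(allele_counts.values())
    if total == 0 or total < min_count:  # also guards the division below
        return False
    ranked = sorted(allele_counts.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor / total >= min_maf


print(passes_filter({"A": 15, "G": 10}))  # 25 UMIs, minor fraction 0.40 → True
print(passes_filter({"A": 30, "G": 2}))   # 32 UMIs, minor fraction ~0.06 → False
print(passes_filter({"A": 9, "G": 9}))    # only 18 UMIs → False
```

For the whole-genome pileup of mode 2, the recommended thresholds correspond to calling passes_filter(counts, min_count=100, min_maf=0.1).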

Mode 2: pile up the whole genome for single cells in a large BAM/SAM file. This mode may output uninformative SNPs, but it can be useful when the data set is highly sparse.

cellSNP -s $BAM -b $BARCODE -o $OUT_FILE -p 22

When piling up the whole genome, we recommend filtering out SNPs with fewer than 100 UMIs or a minor allele fraction below 10%, to save space and speed up inference: --minMAF 0.1 --minCOUNT 100

Mode 3: pile up a list of common SNPs for one or more bulk BAM/SAM files. Requires: one or more BAM/SAM files, their corresponding sample IDs, and a VCF file listing common SNPs.

cellSNP -s $BAM1,$BAM2,$BAM3 -I sample_id1,sample_id2,sample_id3 -o $OUT_FILE -R $REGION_VCF -p 20
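When mode 3 is driven from a Python script, the comma-separated BAM list and sample-ID list must line up one-to-one. The hypothetical helper build_bulk_cmd below (not part of cellSNP) assembles the command as an argument list, using only the flags shown in the command above; the file names are placeholders.

```python
import shlex


def build_bulk_cmd(bams, sample_ids, out_file, region_vcf, n_proc=20):
    """Assemble a mode-3 cellSNP command as an argument list (hypothetical helper).

    Uses only the flags shown in the README: -s, -I, -o, -R, -p.
    """
    if len(bams) != len(sample_ids):
        raise ValueError("need exactly one sample id per BAM/SAM file")
    return [
        "cellSNP",
        "-s", ",".join(bams),          # comma-separated bulk BAM/SAM files
        "-I", ",".join(sample_ids),    # matching sample IDs, in the same order
        "-o", out_file,
        "-R", region_vcf,              # VCF listing common SNPs
        "-p", str(n_proc),
    ]


cmd = build_bulk_cmd(["s1.bam", "s2.bam"], ["sample_id1", "sample_id2"],
                     "pileup_output", "regions.vcf")
print(shlex.join(cmd))
# → cellSNP -s s1.bam,s2.bam -I sample_id1,sample_id2 -o pileup_output -R regions.vcf -p 20
```

Passing the list itself to subprocess.run(cmd, check=True) instead of a shell string avoids quoting problems with unusual file paths.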

Set filtering thresholds according to the downstream analysis.



Download files


Source Distribution

cellSNP-0.0.6.tar.gz (9.9 kB)

File details

Details for the file cellSNP-0.0.6.tar.gz.

File metadata

  • Download URL: cellSNP-0.0.6.tar.gz
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.6

File hashes

Hashes for cellSNP-0.0.6.tar.gz
  • SHA256: 66e01ede8b68f69bcc6293e05e8345c1e315feda55c3f03399fbf78e9febfff9
  • MD5: 8c2af55abbb1a81aa02568de6423c447
  • BLAKE2b-256: 1d3844d745500aaaa2b67de42f3e3baee579d3380a0d98d8895d0ba7302dd227

