Skip to main content

GEXSCOPE Single cell analysis

Project description

CeleScope

CeleScope is a collection of bioinfomatics analysis pipelines to process SCOPE single cell data. Currently it can analyze:

  • Single Cell RNA-Seq data
  • Single Cell Immune Profiling(VDJ) data

Detailed docs can be found in wiki.

Hardware/Software Requirements

  • minimum 32GB RAM(to run STAR aligner)
  • conda
  • git

Installation

  1. Clone repo
git clone https://gitee.com/singleron-rd/celescope.git
# or 
git clone https://github.com/singleron-RD/CeleScope.git
  1. Install conda packages
cd CeleScope
conda create -n celescope
conda activate celescope
conda install --file conda_pkgs.txt --channel conda-forge --channel bioconda --channel r --channel imperial-college-research-computing
  1. Install celescope
pip install celescope
# Use pypi mirror to accelerate downloading if you are in china
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple celescope
  1. Install Beta version(optional)
# If you want to use Beta version of celescope
python setup.py install

Reference genome

Homo sapiens

mkdir -p hs/ensembl_99
cd hs/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz

gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Homo_sapiens.GRCh38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Homo_sapiens.GRCh38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --sjdbGTFfile Homo_sapiens.GRCh38.99.gtf \
    --sjdbOverhang 100

Mus musculus

mkdir -p mmu/ensembl_99
cd mmu/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz

gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz 
gunzip Mus_musculus.GRCm38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Mus_musculus.GRCm38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Mus_musculus.GRCm38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Mus_musculus.GRCm38.dna.primary_assembly.fa \
    --sjdbGTFfile Mus_musculus.GRCm38.99.gtf \
    --sjdbOverhang 100

Quick start

Single cell RNA-Seq

  1. Prepare mapfile

Mapfile is a tab-delimited text file(.tsv) containing at least three columns. Each line of mapfile represents a pair of fastq files(Read 1 and Read 2).

First column: Fastq file prefix. Fastq files must be gzipped.

Second column: Fastq directory.

Third column: Sample name, which is the prefix of all generated files. One sample can have multiple fastq files.

Fourth column: Optional, force cell number (scRNA-Seq) or match_dir (scVDJ).

Sample mapfile:

$cat ./my.mapfile
R2007197    /SGRNJ/DATA_PROJ/dir1	sample1
R2007199    /SGRNJ/DATA_PROJ/dir2	sample1
R2007198    /SGRNJ/DATA_PROJ/dir1   sample2

$ls /SGRNJ/DATA_PROJ/dir1
R2007198_L2_2.fq.gz
R2007198_L2_1.fq.gz
R2007197_L2_2.fq.gz
R2007197_L2_1.fq.gz

$ls /SGRNJ/DATA_PROJ/dir2
R2007199_L2_2.fq.gz
R2007199_L2_1.fq.gz
  1. Run multi_rna to create shell scripts
conda activate celescope
multi_rna \
 --mapfile ./my.mapfile \
 --genomeDir {some path}/hs/ensembl_99 \
 --thread 8 \
 --mod shell

--mapfile Required, mapfile path.

--genomeDir Required, genomeDir directory.

--thread Maximum number of threads to use, default=4.

--mod Create "sjm"(simple job manager https://github.com/StanfordBioinformatics/SJM) or "shell" scripts.

Shell scripts will be created in ./shell directory, one script per sample. The shell scripts contains all the steps that need to be run.

  1. Run shell scripts under current directory

sh ./shell/{sample}.sh

Single Cell VDJ

Running single Cell VDJ is almost the same as running single Cell RNA-Seq, except that the arguments of multi_vdj are somewhat different.

  1. Prepare mapfile

If you have paired single cell RNA-seq and VDJ samples, the single cell RNA-Seq directory after running CeleScope is called matched_dir. You can write matched_dir's path as the fourth column of mapfile(optional).

R2007197    /SGRNJ/DATA_PROJ/dir    sample1 /SGRNJ/Projects/sample1
  1. Run multi_vdj to create shell scripts
conda activate celescope
multi_vdj \
 --mapfile ./my.mapfile \
 --type TCR \
 --thread 8 \
 --mod shell \

--type Required. TCR or BCR.

  1. Run shell scripts under current directory

Project details


Release history Release notifications | RSS feed

This version

1.1.9

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

celescope-1.1.9.tar.gz (2.5 MB view hashes)

Uploaded Source

Built Distributions

celescope-1.1.9-py3.6.egg (2.8 MB view hashes)

Uploaded Source

celescope-1.1.9-py3-none-any.whl (2.6 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page