GEXSCOPE Single cell analysis

Project description

CeleScope

CeleScope is a collection of bioinformatics analysis pipelines for processing SCOPE single cell data. It can currently analyze:

Single Cell RNA-Seq data
Single Cell Immune Profiling (VDJ) data

Detailed documentation is available in the project wiki.

Hardware/Software Requirements

Minimum 32 GB RAM (to run the STAR aligner)
conda
git
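
The requirements above can be checked before installing. The following is a minimal sketch, not part of CeleScope: the function names are made up for illustration, the memory check reads /proc/meminfo and therefore only works on Linux, and the default threshold of 30 is an assumption chosen because the kernel reports slightly less than the nominal installed RAM.

```shell
# have_cmd NAME: succeed if NAME is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# mem_total_gb: print total RAM in whole GiB (Linux only, reads /proc/meminfo).
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# check_requirements [MIN_GB]: report missing tools or low memory.
# MIN_GB defaults to 30 (an assumed margin below the nominal 32 GB).
check_requirements() {
    min_gb=${1:-30}
    rc=0
    for tool in conda git; do
        if ! have_cmd "$tool"; then
            echo "missing tool: $tool" >&2
            rc=1
        fi
    done
    if [ -r /proc/meminfo ]; then
        gb=$(mem_total_gb)
        if [ "$gb" -lt "$min_gb" ]; then
            echo "only ${gb} GB RAM detected, STAR may run out of memory" >&2
            rc=1
        fi
    fi
    return "$rc"
}
```

Calling check_requirements prints one message per problem and returns a non-zero status if anything is missing.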

Installation

Clone repo

git clone https://gitee.com/singleron-rd/celescope.git
# or 
git clone https://github.com/singleron-RD/CeleScope.git

Install conda packages

cd CeleScope
conda create -n celescope
conda activate celescope
conda install --file conda_pkgs.txt --channel conda-forge --channel bioconda --channel r --channel imperial-college-research-computing

Install celescope

pip install celescope
# Or, if you are in China, use a PyPI mirror to speed up the download
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple celescope

Install Beta version (optional)

# To use the Beta version of celescope, run this from the root of the cloned repository
python setup.py install

Reference genome

Homo sapiens

mkdir -p hs/ensembl_99
cd hs/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz

gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Homo_sapiens.GRCh38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Homo_sapiens.GRCh38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --sjdbGTFfile Homo_sapiens.GRCh38.99.gtf \
    --sjdbOverhang 100
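
The awk command above reorders the genePredExt output into refFlat layout: column 12 (the gene name, because of -geneNameAsName2) moves to the front, followed by columns 1 to 10. The sketch below applies the same reordering to a single record so the effect is easy to inspect. The function names and the record are invented for illustration and do not come from a real annotation.

```shell
# genepred_to_refflat: read genePredExt records on stdin and write refFlat.
# Same column order as the awk command in the reference genome steps:
# column 12 (name2, the gene name) first, then columns 1-10.
genepred_to_refflat() {
    awk 'BEGIN { FS = OFS = "\t" }
         { print $12, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }'
}

# demo_record: print one made-up genePredExt record (15 tab-separated columns).
demo_record() {
    printf '%s\t' TX1 chr1 + 100 500 150 450 2 100,300, 200,500, 0 GENE1 cmpl cmpl
    printf '%s\n' 0,1,
}
```

Running `demo_record | genepred_to_refflat` prints an 11-column line that starts with GENE1 and TX1 and ends with the exon start and end lists.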

Mus musculus

mkdir -p mmu/ensembl_99
cd mmu/ensembl_99

wget ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.gtf.gz

gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz 
gunzip Mus_musculus.GRCm38.99.gtf.gz

conda activate celescope
gtfToGenePred -genePredExt -geneNameAsName2 Mus_musculus.GRCm38.99.gtf /dev/stdout | \
    awk '{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' > Mus_musculus.GRCm38.99.refFlat

STAR \
    --runMode genomeGenerate \
    --runThreadN 6 \
    --genomeDir ./ \
    --genomeFastaFiles Mus_musculus.GRCm38.dna.primary_assembly.fa \
    --sjdbGTFfile Mus_musculus.GRCm38.99.gtf \
    --sjdbOverhang 100
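
The human and mouse download links follow the same pattern, so they can be generated from three values. This is a sketch under the assumption that the pattern of the release-99 links holds; ensembl_urls is a hypothetical helper, and other releases may use a different assembly name or layout, so check the printed URLs before downloading.

```shell
# ensembl_urls SPECIES ASSEMBLY RELEASE: print the FASTA and GTF URLs,
# following the pattern of the release-99 links in the steps above.
# SPECIES is written as in the file names, e.g. Homo_sapiens.
ensembl_urls() {
    species=$1
    assembly=$2
    release=$3
    # Directory names on the server use the lower-case species name.
    species_lc=$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]')
    base="ftp://ftp.ensembl.org/pub/release-$release"
    echo "$base/fasta/$species_lc/dna/$species.$assembly.dna.primary_assembly.fa.gz"
    echo "$base/gtf/$species_lc/$species.$assembly.$release.gtf.gz"
}
```

For example, `ensembl_urls Homo_sapiens GRCh38 99` prints the two human links used above.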

Quick start

Single cell RNA-Seq

Prepare mapfile

Mapfile is a tab-delimited text file (.tsv) with at least three columns. Each line of the mapfile represents one pair of fastq files (Read 1 and Read 2).

First column: Fastq file prefix. Fastq files must be gzipped.

Second column: Fastq directory.

Third column: Sample name, which is the prefix of all generated files. One sample can have multiple fastq files.

Fourth column: Optional, force cell number (scRNA-Seq) or match_dir (scVDJ).

Sample mapfile:

$ cat ./my.mapfile
R2007197	/SGRNJ/DATA_PROJ/dir1	sample1
R2007199	/SGRNJ/DATA_PROJ/dir2	sample1
R2007198	/SGRNJ/DATA_PROJ/dir1	sample2

$ ls /SGRNJ/DATA_PROJ/dir1
R2007198_L2_2.fq.gz
R2007198_L2_1.fq.gz
R2007197_L2_2.fq.gz
R2007197_L2_1.fq.gz

$ ls /SGRNJ/DATA_PROJ/dir2
R2007199_L2_2.fq.gz
R2007199_L2_1.fq.gz
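
A mapfile can be checked against the column rules above before running the pipeline. The sketch below is not part of CeleScope: check_mapfile is a hypothetical helper, and the rule it uses to find fastq files (any file named prefix*.gz in the listed directory) is an assumption based on the sample listing, not CeleScope's documented matching logic.

```shell
# check_mapfile FILE: verify each row has at least three tab-separated
# columns and that its fastq directory holds gzipped files starting with
# the row's prefix. Prints one message per problem, returns non-zero if any.
check_mapfile() {
    mapfile_path=$1
    rc=0
    lineno=0
    tab=$(printf '\t')
    while IFS="$tab" read -r prefix fq_dir sample rest || [ -n "$prefix" ]; do
        lineno=$((lineno + 1))
        if [ -z "$prefix" ]; then
            continue    # skip blank lines
        fi
        if [ -z "$fq_dir" ] || [ -z "$sample" ]; then
            echo "line $lineno: fewer than three tab-separated columns" >&2
            rc=1
            continue
        fi
        # Assumed matching rule: any file named <prefix>*.gz in the directory.
        set -- "$fq_dir/$prefix"*.gz
        if [ ! -e "$1" ]; then
            echo "line $lineno: no gzipped fastq with prefix $prefix in $fq_dir" >&2
            rc=1
        fi
    done < "$mapfile_path"
    return "$rc"
}
```

A row separated by spaces instead of tabs is reported as having fewer than three columns, which is a common cause of a mapfile being rejected.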

Run multi_rna to create shell scripts

conda activate celescope
multi_rna \
 --mapfile ./my.mapfile \
 --genomeDir {some path}/hs/ensembl_99 \
 --thread 8 \
 --mod shell

--mapfile Required, mapfile path.

--genomeDir Required, genomeDir directory.

--thread Maximum number of threads to use, default=4.

--mod Create "sjm" (simple job manager, https://github.com/StanfordBioinformatics/SJM) or "shell" scripts.

Shell scripts are created in the ./shell directory, one script per sample. Each script contains all the steps that need to be run.

Run shell scripts under current directory

sh ./shell/{sample}.sh
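
When there are several samples, the scripts can be run one after another with a small loop. This is a minimal sketch rather than a CeleScope feature: run_all_samples is a hypothetical helper, and writing each sample's output to a log file named after the sample is an assumption made for convenience.

```shell
# run_all_samples [DIR]: run every *.sh script in DIR (default ./shell)
# one after another, writing each sample's output to <sample>.log in the
# current directory. Returns non-zero if any script failed.
run_all_samples() {
    script_dir=${1:-./shell}
    rc=0
    for script in "$script_dir"/*.sh; do
        [ -e "$script" ] || continue    # the pattern matched no scripts
        sample=$(basename "$script" .sh)
        echo "running $sample"
        if ! sh "$script" > "$sample.log" 2>&1; then
            echo "FAILED: $sample (see $sample.log)" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Call run_all_samples from the directory in which multi_rna was run, so that the scripts execute under the same working directory as a manual `sh ./shell/{sample}.sh`.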

Single Cell VDJ

Running Single Cell VDJ is almost the same as running Single Cell RNA-Seq, except that the arguments of multi_vdj differ slightly.

Prepare mapfile

If you have paired single cell RNA-Seq and VDJ samples, the output directory produced by running CeleScope on the single cell RNA-Seq sample is called match_dir. You can optionally write the match_dir path as the fourth column of the mapfile.

R2007197    /SGRNJ/DATA_PROJ/dir    sample1 /SGRNJ/Projects/sample1
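
Because the mapfile is tab-delimited, it is safer to generate rows with printf than to type them in an editor that may convert tabs to spaces. The helper below is a hypothetical sketch (mapfile_row is not a CeleScope command) that prints one row with three or four columns.

```shell
# mapfile_row PREFIX FASTQ_DIR SAMPLE [FOURTH]: print one mapfile row with
# real tab separators. FOURTH is the optional fourth column.
mapfile_row() {
    if [ "$#" -lt 3 ] || [ "$#" -gt 4 ]; then
        echo "usage: mapfile_row PREFIX FASTQ_DIR SAMPLE [FOURTH]" >&2
        return 1
    fi
    if [ "$#" -eq 4 ]; then
        printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
    else
        printf '%s\t%s\t%s\n' "$1" "$2" "$3"
    fi
}
```

For example, `mapfile_row R2007197 /SGRNJ/DATA_PROJ/dir sample1 /SGRNJ/Projects/sample1 >> ./my.mapfile` appends the row shown above.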

Run multi_vdj to create shell scripts

conda activate celescope
multi_vdj \
 --mapfile ./my.mapfile \
 --type TCR \
 --thread 8 \
 --mod shell

--type Required. TCR or BCR.

Run shell scripts under current directory

sh ./shell/{sample}.sh

Release history

2.0.8 yanked

Mar 6, 2024

Reason this release was yanked:

data not uploaded

2.0.7

Dec 1, 2023

2.0.6

Nov 16, 2023

2.0.5

Nov 15, 2023

2.0.4

Nov 7, 2023

2.0.3

Oct 31, 2023

2.0.2

Oct 30, 2023

2.0.1

Oct 27, 2023

2.0.0

Oct 26, 2023

2.0.0b0 pre-release

Oct 7, 2023

1.17.0

Aug 11, 2023

1.16.2

Jul 14, 2023

1.16.1 yanked

Jul 7, 2023

Reason this release was yanked:

when using multithreading, the same (barcode, gene) combination in the matrix may have multiple entries.

1.16.0

Jul 5, 2023

1.16.0b0 pre-release

Jun 25, 2023

1.15.2

May 24, 2023

1.15.1

May 22, 2023

1.15.0

Apr 12, 2023

1.14.1

Jan 11, 2023

1.14.0

Dec 21, 2022

1.14.0b0 pre-release

Dec 2, 2022

1.13.0

Oct 28, 2022

1.12.1

Sep 16, 2022

1.12.0

Sep 14, 2022

1.12.0b0 pre-release

Sep 8, 2022

1.11.1

Aug 10, 2022

1.11.0

Jul 7, 2022

1.11.0b1 pre-release

Jul 7, 2022

1.11.0b0 pre-release

Jun 21, 2022

1.10.0

Apr 22, 2022

1.9.0

Apr 1, 2022

1.8.1

Mar 23, 2022

1.8.0

Mar 17, 2022

1.7.2

Feb 11, 2022

1.7.1

Jan 17, 2022

1.7.0 yanked

Dec 28, 2021

Reason this release was yanked:

bug with `mt_gene_list`

1.6.1

Dec 1, 2021

1.6.0 yanked

Nov 30, 2021

Reason this release was yanked:

bug in featureCounts

1.5.2

Nov 4, 2021

1.5.1

Oct 28, 2021

1.5.1b0 pre-release

Oct 12, 2021

1.5.0

Sep 9, 2021

1.4.0

Aug 24, 2021

1.3.2

Jul 9, 2021

1.3.1

Jun 10, 2021

1.3.0

May 28, 2021

1.2.0

May 19, 2021

This version

1.1.9

May 8, 2021

1.1.9b1 pre-release

May 8, 2021

1.1.9b0 pre-release

May 8, 2021

1.1.8

Mar 26, 2021

1.1.8b0 pre-release

Mar 18, 2021

1.1.7

Dec 16, 2020

1.1.7b0 pre-release

Dec 11, 2020

1.1.6

Dec 3, 2020

1.1.6b0 pre-release

Dec 3, 2020

1.1.4

Sep 10, 2020

1.1.3

Aug 28, 2020

1.1.2

Aug 28, 2020

1.1.1

Aug 17, 2020

1.1.0 yanked

Aug 17, 2020

Reason this release was yanked:

unstable

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

celescope-1.1.9.tar.gz (2.5 MB)

Uploaded May 8, 2021 Source

Built Distributions

celescope-1.1.9-py3.6.egg (2.8 MB)

Uploaded May 8, 2021 Python 3.6

celescope-1.1.9-py3-none-any.whl (2.6 MB)

Uploaded May 8, 2021 Python 3

Hashes for celescope-1.1.9.tar.gz
Algorithm	Hash digest
SHA256	`3af0eec41e4d72c918444c7c266856a60cfa4ea0f8c410ab05b9300dd3e9e465`
MD5	`e460b21e083d8d7f86315a70a21f55f8`
BLAKE2b-256	`f0e42abb3551ec4d6fc475227bb4333a37b0831a6f50e428692a6be816235627`

Hashes for celescope-1.1.9-py3.6.egg
Algorithm	Hash digest
SHA256	`076b1759c321e8a45de4533f67814d8bc64d7ffc7edd79e1da6ac7d602b49e86`
MD5	`64e21d34c9345dbb5ff78f050d5dfe38`
BLAKE2b-256	`a5d64b39317c812e5e726ceeaa7c4c1bc914a044d2c959f3e6b6b5634150a6eb`

Hashes for celescope-1.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ea905dd802401b43a90fa5989185414e7d06fc1240557e671a3f270b80df6031`
MD5	`f41ffce8c271c73a16b1764c50d8a2ff`
BLAKE2b-256	`b370b8cb10e6b72b0c20f9f2d3df4f34216f7737496cfba47c7a39f5a9a38873`
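
A downloaded file can be compared against the SHA256 digests listed above. The sketch below assumes that either sha256sum (GNU coreutils) or shasum is installed; the function names are made up for illustration.

```shell
# sha256_of FILE: print the SHA256 hex digest of FILE.
# Uses sha256sum (GNU coreutils) when present, otherwise shasum -a 256.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{ print $1 }'
    else
        shasum -a 256 "$1" | awk '{ print $1 }'
    fi
}

# verify_sha256 FILE EXPECTED: succeed only if the digests match.
verify_sha256() {
    actual=$(sha256_of "$1")
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)" >&2
        return 1
    fi
}
```

For the source distribution this would be `verify_sha256 celescope-1.1.9.tar.gz 3af0eec41e4d72c918444c7c266856a60cfa4ea0f8c410ab05b9300dd3e9e465`, using the digest from the table above.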