Extract Methylation calls from ONT or PB long read data

LoReMe pipeline

LoReMe (Long Read Methylation) is a Python package that facilitates analysis of DNA methylation signals from Pacific Biosciences or Oxford Nanopore long-read sequencing data.

It consists of an API and CLI for three distinct applications:

  1. Pacific Biosciences data processing. PB reads in SAM/BAM format are aligned to a reference genome with the special-purpose aligner pbmm2, a modified version of minimap2. Methylation calls are then piled up from the aligned reads with pb-CpG-tools.

  2. Oxford Nanopore basecalling. ONT reads are optionally converted from FAST5 to POD5 format, then basecalled and aligned to a reference with dorado (dorado alignment also uses minimap2 under the hood), and finally piled up with modkit.

  3. Postprocessing and QC of methylation calls. Several functions are available to generate diagnostic statistics and plots.

See also the full documentation.

Other tools of interest: methylartist, modbamtools (modbamtools docs), and methplotlib.

Installation

In a Conda environment

The recommended way to install loreme is in a dedicated conda environment.

First create an environment including all dependencies:

conda create -n loreme -c conda-forge -c bioconda samtools pbmm2 \
  urllib3 pybedtools gff2bed seaborn pyfaidx psutil gputil tabulate \
  cython h5py iso8601 more-itertools polars tqdm
conda activate loreme

Then install with pip:

pip install loreme

You may also wish to install nvtop to monitor GPU usage:

conda install -c conda-forge nvtop

With pip

pip install loreme

Check installation

Check that the correct version was installed by running loreme --version.
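The same check can be made from Python. The sketch below is illustrative only: it assumes the distribution is registered under the name loreme (the name used with pip above), and the helper name installed_version is hypothetical, not part of the loreme API.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "loreme" is the distribution name used with pip in this README.
    version = installed_version("loreme")
    print(version if version is not None else "loreme is not installed")
```

This reads the package metadata only, so it does not confirm that the loreme command itself is on your PATH.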

Uninstall

To uninstall loreme:

loreme clean
pip uninstall loreme

Oxford Nanopore reads

Download dorado

Calling methylation from ONT long reads requires the basecaller dorado. Download it by running:

loreme download-dorado <platform>

This will download dorado and several basecalling models. The platform should be one of: linux-x64, linux-arm64, osx-arm64, win64, whichever matches your system. Running loreme download-dorado --help will show a hint as to the correct choice.
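If you are unsure which platform string applies, a rough guess can be derived from Python's standard platform module. This is a minimal sketch, not part of loreme: the function name guess_dorado_platform and the mapping table are assumptions for illustration, and the hint printed by loreme download-dorado --help remains the authoritative source.

```python
import platform
from typing import Optional

# Map (operating system, CPU architecture) to the platform names listed above.
# The mapping itself is an assumption made for illustration.
SUPPORTED = {
    ("linux", "x86_64"): "linux-x64",
    ("linux", "amd64"): "linux-x64",
    ("linux", "aarch64"): "linux-arm64",
    ("linux", "arm64"): "linux-arm64",
    ("darwin", "arm64"): "osx-arm64",
    ("windows", "amd64"): "win64",
    ("windows", "x86_64"): "win64",
}


def guess_dorado_platform(
    system: Optional[str] = None, machine: Optional[str] = None
) -> Optional[str]:
    """Return a platform name for download-dorado, or None if no match is known."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    return SUPPORTED.get((system, machine))


if __name__ == "__main__":
    guess = guess_dorado_platform()
    print(guess if guess is not None else "no matching platform found")
```

A result of None (for example on an Intel Mac) means none of the four listed platforms matches the detected system.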

Note

For members of Michael Lab at Salk running on seabiscuit, use loreme download-dorado linux-x64.

Modified basecalling

You can carry out modified basecalling (i.e. DNA methylation) with default parameters by running:

loreme dorado-basecall <input.pod5> <output.sam>

For other parameter options, see loreme dorado --help

Note

Basecalling ONT data is disk-read intensive, so for best performance the input POD5 data should be on a fast SSD (for example, /scratch/<username> for members of Michael Lab at Salk).

To run dorado with only regular basecalling, use the --no-mod option:

loreme dorado-basecall --no-mod <input.pod5> <output.sam>

If you wish to convert the SAM file to a FASTQ file, use:

samtools view -bo output.bam output.sam
samtools fastq -T '*' output.bam > output.fq
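The -T '*' option asks samtools to copy the SAM tags, including the MM and ML modification tags, onto each FASTQ header line. The sketch below shows one way to check that a header still carries them. It assumes the tags appear as tab-separated TAG:TYPE:VALUE fields after the read name; the function names and the example header are hypothetical.

```python
from typing import Dict, Tuple


def parse_fastq_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split '@name<TAB>TAG:TYPE:VALUE...' into (name, {TAG: VALUE})."""
    line = header.rstrip("\n")
    if line.startswith("@"):
        line = line[1:]
    name, *tag_fields = line.split("\t")
    tags: Dict[str, str] = {}
    for field in tag_fields:
        # Split on the first two colons only; values may contain colons.
        parts = field.split(":", 2)
        if len(parts) == 3:
            tag, _type, value = parts
            tags[tag] = value
    return name, tags


def has_modification_tags(header: str) -> bool:
    """True if the header carries both MM and ML tags."""
    _, tags = parse_fastq_header(header)
    return "MM" in tags and "ML" in tags


if __name__ == "__main__":
    # Made-up header for illustration only.
    example = "@read1\tMM:Z:C+m?,5,12;\tML:B:C,200,34"
    print(parse_fastq_header(example))
    print(has_modification_tags(example))
```

Reads converted without -T would have no tag fields, so has_modification_tags would return False for them.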

Alignment

The SAM file produced by dorado can be aligned to a reference index (FASTA or MMI file) with loreme dorado-align:

loreme dorado-align <index> <reads> <output.bam>

Download modkit

Piling up methylation calls from BAM data requires modkit. Download it by running:

loreme download-modkit

Pileup

The pileup step generates a bedMethyl file from an aligned BAM file.

loreme modkit-pileup <reference.fasta> <input.bam> <output.bed>

Note

See loreme modkit-pileup --help for additional options. On an HPC system you may want to use additional threads with the -t flag.
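The three ONT steps above (basecall, align, pileup) can be chained from a script. The following is a minimal sketch that only builds the command lines shown in this section; the function name ont_commands, the file-naming scheme, and the placement of -t before the positional arguments are assumptions, not documented loreme behavior.

```python
import shlex
from typing import List, Optional


def ont_commands(
    pod5: str,
    index: str,
    reference: str,
    prefix: str,
    threads: Optional[int] = None,
) -> List[List[str]]:
    """Build the basecall, align, and pileup commands for one sample."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    bed = f"{prefix}.bed"
    pileup = ["loreme", "modkit-pileup"]
    if threads is not None:
        # -t is the thread flag mentioned in the note above.
        pileup += ["-t", str(threads)]
    pileup += [reference, bam, bed]
    return [
        ["loreme", "dorado-basecall", pod5, sam],
        ["loreme", "dorado-align", index, sam, bam],
        pileup,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in ont_commands("reads.pod5", "ref.mmi", "ref.fasta", "sample", threads=8):
        print(shlex.join(cmd))
```

To execute the steps rather than print them, each list could be passed to subprocess.run(cmd, check=True), which stops the chain at the first failing step.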

Postprocessing

See the Pacific Biosciences reads section for examples of postprocessing analysis that can be applied to bedMethyl files.
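As a quick sanity check independent of loreme's own postprocessing functions, a bedMethyl file can be summarized with a few lines of Python. This sketch assumes the column layout described in the modkit documentation (column 4 is the modification code, column 10 the valid coverage, column 12 the modified-call count); the function name and the example rows are made up for illustration.

```python
from typing import Iterable


def methylation_fraction(lines: Iterable[str], mod_code: str = "m") -> float:
    """Coverage-weighted fraction of modified calls for one modification code."""
    n_mod = 0
    n_valid = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Splitting on any whitespace tolerates tab- or space-separated columns.
        fields = line.split()
        if len(fields) < 12 or fields[3] != mod_code:
            continue
        n_valid += int(fields[9])   # assumed: valid coverage (column 10)
        n_mod += int(fields[11])    # assumed: modified calls (column 12)
    return n_mod / n_valid if n_valid else float("nan")


if __name__ == "__main__":
    # Made-up rows in the assumed 18-column layout.
    rows = [
        "chr1\t100\t101\tm\t10\t+\t100\t101\t255,0,0\t10\t80.00\t8\t2\t0\t0\t0\t0\t0",
        "chr1\t200\t201\tm\t30\t-\t200\t201\t255,0,0\t30\t10.00\t3\t27\t0\t0\t0\t0\t0",
        "chr1\t200\t201\th\t30\t-\t200\t201\t255,0,0\t30\t0.00\t0\t27\t3\t0\t0\t0\t0",
    ]
    print(methylation_fraction(rows))
```

For a real file, pass an open file handle, for example methylation_fraction(open("output.bed")). Weighting by coverage keeps low-coverage sites from dominating the average.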

Download files

Source Distribution

loreme-0.1.3.tar.gz (23.8 kB)

Built Distribution

loreme-0.1.3-py3-none-any.whl (32.7 kB)
