Calculation of alignment statistics

Mapula

This package provides a command line tool that is able to parse alignments in SAM format and produce a range of useful stats.

Mapula provides several subcommands; run each one with --help for detailed usage instructions.

Installation

Mapula can be installed in the usual Python way:

pip install mapula

Usage

The main command is mapula count. This command can accept one or several input SAM or BAM files and output various useful statistics.

For available options, see:

mapula count -h

Per-alignment (outputs to stdout)

If you do not need aggregated stats, you can pipe per-alignment stats to stdout (WIP: currently only in json format). Alternatively, if mapula is one step in a chain of piped commands, you can relay the SAM records onwards.

Pipe per-alignment stats to stdout in .json format:

mapula count <paths_to_sam_or_bam> -r <path_to_a_reference_fasta> -p json

Pipe SAM records from multiple files to stdout:

mapula count <paths_to_sam_or_bam> <paths_to_sam_or_bam> -p sam

Aggregation (outputs to file)

Aggregated stats are calculated when the -a flag is enabled. Alignments are grouped by user-specified criteria given with -s, and the resulting aggregations can be written in two formats selected with -f. The .csv format is the easiest to interpret at a glance; for onward programmatic analysis, the .json output contains a more in-depth view of the data.
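To make the grouping idea concrete, here is a minimal Python sketch of splitting alignments by a list of criteria and accumulating totals per group. It is an illustration only, not mapula's implementation: the function names aggregate and to_csv, the record field length, and the output columns alignments and bases are all hypothetical and do not reflect mapula's actual output schema.

```python
import csv
import io
from collections import defaultdict


def aggregate(alignments, split_by):
    """Group alignment dicts by the given keys and accumulate simple totals."""
    groups = defaultdict(lambda: {"alignments": 0, "bases": 0})
    for aln in alignments:
        # Missing grouping values fall back to a placeholder.
        key = tuple(aln.get(name, "Unknown") for name in split_by)
        groups[key]["alignments"] += 1
        groups[key]["bases"] += aln["length"]
    return dict(groups)


def to_csv(groups, split_by):
    """Render the grouped totals as CSV text, one row per group."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(split_by) + ["alignments", "bases"])
    for key in sorted(groups):
        totals = groups[key]
        writer.writerow(list(key) + [totals["alignments"], totals["bases"]])
    return buffer.getvalue()


# Hypothetical per-alignment records (not real mapula output).
alignments = [
    {"barcode": "barcode01", "read_group": "A", "length": 1200},
    {"barcode": "barcode01", "read_group": "A", "length": 800},
    {"barcode": "barcode02", "read_group": "A", "length": 500},
]
groups = aggregate(alignments, ["barcode", "read_group"])
print(to_csv(groups, ["barcode", "read_group"]))
```

Each distinct combination of the split criteria becomes one row, which is why adding more criteria to -s produces a finer-grained table.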

Output some stats in .csv format containing mapping stats:

mapula count <paths_to_sam_or_bam> -r <path_to_a_reference_fasta> -a

Output some stats in .csv format split by read_group and barcode:

mapula count <paths_to_sam_or_bam> -r <path_to_a_reference_fasta> -a -s barcode read_group

Output some stats in .csv and .json format split by SAM and reference fasta:

mapula count \
  <paths_to_sam_or_bam> <paths_to_sam_or_bam> \
  -r <path_to_a_reference_fasta> <path_to_a_reference_fasta> \
  -a \
  -s source fasta

Receive some SAM or BAM from stdin, calculate some aggregated stats and output in .csv, and pipe the SAM records onwards:

minimap2 -y -ax map-ont <path_to_a_reference_fasta> *_reads.fastq \
  | mapula count -r <path_to_a_reference_fasta> -a -s source fasta run_id barcode -p sam \
  | samtools sort -o sorted.aligned.bam

Important: tags

At present, access to barcode, run_id and read_group depends on the following tags being available within the input SAM records:

  • barcode = bc
  • run_id = rd
  • read_group = rg

If these are not available, Mapula will provide a placeholder of Unknown. The minimap2 flag -y can carry information from the .fastq header into the records it creates, which facilitates this transfer of information.
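The tag lookup and placeholder behaviour described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than mapula's own code: the function name extract_groups is hypothetical, the tag names are taken exactly as listed above (real files may use different casing), and the example record is made up.

```python
# Field names mapped to the SAM tag names listed above.
TAG_FOR_FIELD = {"barcode": "bc", "run_id": "rd", "read_group": "rg"}


def extract_groups(sam_line, tag_for_field=TAG_FOR_FIELD, placeholder="Unknown"):
    """Return {field: value} for one SAM alignment line."""
    columns = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    tags = {}
    for field in columns[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tag, _type, value = parts
            tags[tag] = value
    # Any tag that is absent is reported with the placeholder value.
    return {name: tags.get(tag, placeholder) for name, tag in tag_for_field.items()}


# A made-up record carrying rg and bc tags but no rd tag.
record = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t!!!!\trg:Z:groupA\tbc:Z:barcode01"
print(extract_groups(record))
# → {'barcode': 'barcode01', 'run_id': 'Unknown', 'read_group': 'groupA'}
```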


Help

Licence and Copyright

© 2021- Oxford Nanopore Technologies Ltd.

mapula is distributed under the terms of the Mozilla Public License 2.0.

Research Release

Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.
