A Python wrapper for minimap2-rs

Python bindings for the Rust FFI minimap2 library. In development! Feedback appreciated!

Why?

PyO3 makes it very easy to create Python libraries in Rust. Further, we can use Polars to export results as a dataframe (which can be used as-is, or converted to Pandas). Python allows for faster experimentation with novel algorithms and easier integration into machine learning pipelines, and it gives those unfamiliar with Rust or C/C++ a way to use minimap2.

Current State

Very early alpha. Please try it, and open an issue for any missing features you need or any bugs you find.

How to use

Requirements

Polars and PyArrow. Both should be installed when you install minimappers2.

Creating an Aligner Instance

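# Assumption: the constructors and Sequence are exported from the top-level package
from minimappers2 import map_ont, map_hifi, Sequence

aligner = map_ont()   # aligner using the map_ont preset
aligner.threads(4)    # use 4 threads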

If you want a full alignment performed, rather than just matches, enable .cigar():

aligner = map_hifi()
aligner.cigar()

Please note, at this time the following syntax is NOT supported:

aligner = map_ont().threads(4).cigar()
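
Because chained calls are not supported, each option has to be applied in its own statement. A minimal sketch of a helper that does this is shown below. Note that `configure_aligner` is a hypothetical name and not part of minimappers2; it assumes only the `threads()` and `cigar()` methods shown above.

```python
def configure_aligner(aligner, threads=None, cigar=False):
    """Apply settings one call at a time (no method chaining).

    Hypothetical helper, not part of minimappers2. It relies only on the
    .threads(n) and .cigar() methods shown in this README.
    """
    if threads is not None:
        aligner.threads(threads)  # separate statement; return value is ignored
    if cigar:
        aligner.cigar()           # request alignment, not just matches
    return aligner                # the same object, now configured
```

With this helper, something like `aligner = configure_aligner(map_ont(), threads=4, cigar=True)` keeps the setup on one line without relying on chaining.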

Creating an index

aligner.index("ref.fa")

To save a built index for future use:

aligner.index_and_save("ref.fa", "ref.mmi")

Next time, loading the saved index will be faster than rebuilding it from the reference:

aligner.load_index("ref.mmi")
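
The build-once, load-afterwards pattern can be wrapped in a small function. This is only a sketch: `load_or_build_index` is a hypothetical helper (not part of minimappers2), and it assumes the `index_and_save()` and `load_index()` methods behave as described above.

```python
import os


def load_or_build_index(aligner, reference, index_path):
    """Load a saved index if it exists, otherwise build it and save it.

    Hypothetical helper. Returns "loaded" or "built" so the caller can
    tell which path was taken.
    """
    if os.path.exists(index_path):
        aligner.load_index(index_path)             # reuse the saved index
        return "loaded"
    aligner.index_and_save(reference, index_path)  # build from the reference and save
    return "built"
```

For example, `load_or_build_index(aligner, "ref.fa", "ref.mmi")` would build and save the index on the first run and load `ref.mmi` on later runs.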

Aligning a Single Sequence

query = Sequence(seq_name, seq)
aligner.map1(query)

# Example
seq = "CCAGAACGTACAAGGAAATATCCTCAAATTATCCCAAGAATTGTCCGCAGGAAATGGGGATAATTTCAGAAATGAGAG"
result = aligner.map1(Sequence("MySeq", seq))

Where seq_name and seq are both strings. The output is a Polars DataFrame.

Aligning Multiple Sequences

TBD
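
Until a batch API is available, one possible workaround is to call map1 once per sequence. The sketch below is not part of minimappers2: `map_each` is a hypothetical helper, and the Sequence class is passed in as an argument so the function depends only on the `Sequence(name, seq)` constructor and `map1()` call shown above.

```python
def map_each(aligner, sequence_cls, records):
    """Map (name, seq) pairs one at a time using aligner.map1.

    Hypothetical helper. `sequence_cls` is expected to be the Sequence
    class; `records` is any iterable of (name, seq) string pairs.
    Returns a list of (name, result) tuples in input order.
    """
    results = []
    for name, seq in records:
        query = sequence_cls(name, seq)              # same call as the single-sequence case
        results.append((name, aligner.map1(query)))  # one result per query
    return results
```

A call might look like `map_each(aligner, Sequence, [("read1", seq1), ("read2", seq2)])`. Since each result is a Polars DataFrame, the frames could then be combined with `polars.concat`, assuming they all share the same columns.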

Mapping a file

Please open an issue if you need to map files from this API.

Results

All results are returned as Polars dataframes. You can convert Polars dataframes to Pandas dataframes with .to_pandas().

  • Polars is among the fastest dataframe libraries in the Python ecosystem.
  • Polars provides a nice data bridge between Rust and Python.

For more information, please see the Polars User Guide or the Polars Guide for Pandas users.

Example of Results

Here is an image of the resulting dataframe: [Image: Resulting Dataframe]

NOTE: Mapq, Cigar, and some other columns will not show up unless .cigar() is enabled on the aligner itself.

Errors

As this is a very early-stage library, error checking is not yet implemented. When things crash, you will likely need to restart your Python interpreter (or Jupyter kernel). Please open an issue describing what happened and I will get to it.

Performance

Effort has been made to make this as performant as possible, but if you need more performance, please use minimap2 directly and import the results.
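
If you do run minimap2 directly (for example `minimap2 -x map-ont ref.fa reads.fq > out.paf`), its default output is PAF, a tab-separated format with 12 mandatory columns followed by optional tags. The sketch below shows one way to import such a file using only the standard library. The function names and dictionary keys are illustrative labels chosen here, not part of minimappers2 or minimap2.

```python
# The 12 mandatory PAF columns, in order, with the type each is parsed to.
# The key names are illustrative labels, not official identifiers.
PAF_FIELDS = [
    ("qname", str), ("qlen", int), ("qstart", int), ("qend", int),
    ("strand", str),
    ("tname", str), ("tlen", int), ("tstart", int), ("tend", int),
    ("nmatch", int), ("alen", int), ("mapq", int),
]


def parse_paf_line(line):
    """Parse one PAF line into a dict; optional tags are kept as raw strings."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(PAF_FIELDS):
        raise ValueError(f"expected at least 12 PAF columns, got {len(cols)}")
    record = {name: cast(value) for (name, cast), value in zip(PAF_FIELDS, cols)}
    record["tags"] = cols[len(PAF_FIELDS):]  # e.g. ["tp:A:P", "cm:i:12"]
    return record


def read_paf(path):
    """Read a whole PAF file into a list of dicts, skipping blank lines."""
    with open(path) as handle:
        return [parse_paf_line(line) for line in handle if line.strip()]
```

The resulting list of dicts can be passed to a dataframe constructor if you want to continue working with Polars or Pandas.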

Citation

You should cite the minimap2 papers if you use this in your work.

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191

and/or:

Li, H. (2021). New strategies to improve minimap2 alignment accuracy. Bioinformatics, 37:4572-4574. doi:10.1093/bioinformatics/btab705

Changelog

0.1.0

  • Initial functions implemented
  • Results returned as Polars dataframes

Funding

Genomics Aotearoa
