Cython bindings and Python interface to HMMER3.

🐍🟡♦️🟦 PyHMMER

🗺️ Overview

HMMER is a biological sequence analysis tool that uses profile hidden Markov models to search for sequence homologs. HMMER3 is developed and maintained by the Eddy/Rivas Laboratory at Harvard University.

pyhmmer is a Python package, implemented using the Cython language, that provides bindings to HMMER3. It directly interacts with the HMMER internals, which has the following advantages over CLI wrappers (like hmmer-py):

  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyhmmer as a dependency to your project, and stop worrying about the HMMER binaries being properly set up on the end-user machine.
  • no intermediate files: Everything happens in memory, in Python objects you have control over, making it easier to pass your inputs to HMMER without needing to write them to a temporary file. Output retrieval is also done in memory, via instances of the pyhmmer.plan7.TopHits class.
  • no input formatting: The Easel object model is exposed in the pyhmmer.easel module, and you can build a DigitalSequence object yourself to pass to the HMMER pipeline. This is useful if your sequences are already loaded in memory, for instance because you obtained them from another Python library (such as Pyrodigal or Biopython).
  • no output formatting: HMMER3 is notorious for its numerous output files and its fixed-width tabular output, which is hard to parse (even Bio.SearchIO.HmmerIO struggles with some sequences).
  • efficient: Using pyhmmer to launch hmmsearch on sequences and HMMs in disk storage is typically as fast as directly using the hmmsearch binary (see the Benchmarks section). pyhmmer.hmmer.hmmsearch uses a different parallelisation strategy compared to the hmmsearch binary from HMMER, which can help get the most out of multiple CPUs when annotating smaller sequence databases.

This library is still a work-in-progress, and in an experimental stage, but it should already pack enough features to run biological analyses or workflows involving hmmsearch, hmmscan, nhmmer, phmmer, hmmbuild and hmmalign.

🔧 Installing

pyhmmer can be installed from PyPI, which hosts some pre-built CPython wheels for x86-64 Linux, as well as the code required to compile from source with Cython:

$ pip install pyhmmer

Compilation for UNIX PowerPC is not tested in CI, but should work out of the box. Other architectures (e.g. Arm) and OSes (e.g. Windows) are not supported by HMMER.

A Bioconda package is also available:

$ conda install -c bioconda pyhmmer

📖 Documentation

A complete API reference can be found in the online documentation, or directly from the command line using pydoc:

$ pydoc pyhmmer.easel
$ pydoc pyhmmer.plan7

💡 Example

Use pyhmmer to run hmmsearch, and obtain an iterable over TopHits that can be used for further sorting/querying in Python. Processing happens in parallel using Python threads, and a TopHits object is yielded for every HMM passed in the input iterable.

import pyhmmer

# Load all target sequences into memory, in digital (encoded) form.
with pyhmmer.easel.SequenceFile("pyhmmer/tests/data/seqs/938293.PRJEB85.HG003687.faa", digital=True) as seq_file:
    sequences = list(seq_file)

# Search every HMM of the file against the sequences, using 4 threads.
with pyhmmer.plan7.HMMFile("pyhmmer/tests/data/hmms/txt/t2pks.hmm") as hmm_file:
    for hits in pyhmmer.hmmsearch(hmm_file, sequences, cpus=4):
        print(f"HMM {hits.query_name.decode()} found {len(hits)} hits in the target sequences")
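For the "further sorting/querying" step, a small helper along the following lines may be enough. It is only a sketch: significant_hits and its 1e-5 default threshold are hypothetical names and values chosen for illustration, and the helper assumes that each hit exposes name (bytes), score and evalue attributes, as pyhmmer.plan7.Hit objects do. The FakeHit tuple is a stand-in so that the sketch can be run without any input file.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple


def significant_hits(hits: Iterable, max_evalue: float = 1e-5) -> List[Tuple[str, float, float]]:
    """Return (name, score, evalue) rows for hits under an E-value cutoff, best score first."""
    rows = [
        (hit.name.decode(), hit.score, hit.evalue)
        for hit in hits
        if hit.evalue <= max_evalue
    ]
    # Highest bit score first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Stand-in objects mimicking the attributes used above (not real search results).
FakeHit = namedtuple("FakeHit", ["name", "score", "evalue"])
demo_hits = [
    FakeHit(b"protein_a", 35.2, 1e-8),
    FakeHit(b"protein_b", 12.1, 0.3),
    FakeHit(b"protein_c", 88.9, 1e-25),
]

for name, score, evalue in significant_hits(demo_hits):
    print(f"{name}\t{score:.1f}\t{evalue:.1e}")
```

In the loop of the example above, the same function could be called as significant_hits(hits) on each TopHits object, since TopHits can be iterated over to obtain its individual hits.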

Have a look at more in-depth examples such as building an HMM from an alignment, analysing the active site of a hit, or fetching marker genes from a genome in the Examples page of the online documentation.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⏱️ Benchmarks

Benchmarks were run on an i7-10710U CPU running @1.10GHz with 6 physical / 12 logical cores, using a FASTA file containing 4,489 protein sequences extracted from the genome of Escherichia coli (562.PRJEB4685) and version 33.1 of the Pfam HMM library containing 18,259 domains. Commands were run 3 times on a warm SSD. Solid lines show the times for pressed HMMs, and dashed lines the times for HMMs in text format.

[Figure: Benchmarks]

Raw numbers can be found in the benches folder. They suggest that phmmer should be run with the number of logical cores, while hmmsearch should be run with the number of physical cores (or fewer). A possible explanation for this observation would be that HMMER platform-specific code requires too many SIMD registers per thread to benefit from simultaneous multi-threading.
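The recommendation above could be turned into a small helper for picking the cpus argument. This is a rough sketch rather than anything shipped with pyhmmer: suggested_cpus is a hypothetical name, psutil is an optional third-party package used here only to count physical cores, and the fallback of 2 hardware threads per core is an assumption that does not hold on every machine.

```python
import os


def suggested_cpus(program: str, threads_per_core: int = 2) -> int:
    """Suggest a thread count: logical cores for phmmer, physical cores otherwise."""
    logical = os.cpu_count() or 1
    try:
        import psutil  # optional dependency, only used to count physical cores
        physical = psutil.cpu_count(logical=False) or logical
    except ImportError:
        # Assumption: each physical core exposes `threads_per_core` logical cores.
        physical = max(1, logical // threads_per_core)
    physical = min(physical, logical)
    return logical if program == "phmmer" else physical


print("phmmer:", suggested_cpus("phmmer"))
print("hmmsearch:", suggested_cpus("hmmsearch"))
```

The returned value can then be passed as the cpus argument, e.g. pyhmmer.hmmsearch(hmm_file, sequences, cpus=suggested_cpus("hmmsearch")). Since the benchmarks were run on a single machine, it is worth checking the timings on your own hardware.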

To read more about how PyHMMER achieves better parallelism than HMMER for many-to-many searches, have a look at the Performance page of the documentation.

🔍 See Also

Building an HMM from scratch? Then you may be interested in the pyfamsa package, providing bindings to FAMSA, a very fast multiple sequence aligner. In addition, you may want to trim alignments: in that case, consider pytrimal, which wraps trimAl 2.0.

If, despite all the advantages listed earlier, you would rather use HMMER through its CLI, this package will not be of great help. You can instead check the hmmer-py package developed by Danilo Horta at the EMBL-EBI.

⚖️ License

This library is provided under the MIT License. The HMMER3 and Easel code is available under the BSD 3-clause license. See vendor/hmmer/LICENSE and vendor/easel/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original HMMER authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

Project details


This version

0.7.0
