pyfastx is a Python module for fast random access to sequences in plain and gzipped FASTA files

A robust Python module for fast random access to sequences in plain and gzipped FASTA files

About

pyfastx is a lightweight Python C extension that enables users to randomly access sequences in plain and gzipped FASTA files. The module aims to provide simple APIs for extracting sequences from a FASTA file by identifier or index number. pyfastx builds an index stored in a sqlite3 database file for random access, which avoids consuming an excessive amount of memory. In addition, pyfastx can parse both standard (sequence spread over multiple lines of the same length) and nonstandard (lines of different lengths) FASTA formats. The module uses kseq.h, written by @attractivechaos in the klib project, to parse plain FASTA files, and zran.c, written by @pauldmccarthy in the indexed_gzip project, to index gzipped files for random access.
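
To illustrate why an index makes random access cheap for standard FASTA files (all sequence lines the same length), here is a minimal pure-Python sketch. It is not pyfastx's implementation: the functions build_index and fetch_region are hypothetical, and the index lives in a dict rather than a sqlite3 file. It only shows that, once the byte offset and line layout of a record are known, any region can be read with a single seek.

```python
import io

def build_index(handle):
    """Scan a binary FASTA handle once.

    Returns {name: [seq_offset, seq_len, line_bases, line_bytes]}.
    Assumes every sequence line of a record (except the last) has the same length.
    """
    index = {}
    name = None
    offset = 0
    for raw in handle:
        if raw.startswith(b">"):
            name = raw[1:].split()[0].decode()
            # The sequence starts right after the header line.
            index[name] = [offset + len(raw), 0, 0, 0]
        elif name is not None:
            entry = index[name]
            bases = len(raw.rstrip(b"\r\n"))
            entry[1] += bases
            if entry[2] == 0:
                # Remember the layout of the first sequence line.
                entry[2], entry[3] = bases, len(raw)
        offset += len(raw)
    return index

def fetch_region(handle, index, name, start, end):
    """Return bases start..end (1-based, inclusive) using one seek and one read."""
    seq_offset, _seq_len, line_bases, line_bytes = index[name]

    def byte_pos(i):
        # Map a 0-based base position to its byte offset in the file.
        return seq_offset + (i // line_bases) * line_bytes + i % line_bases

    begin, stop = byte_pos(start - 1), byte_pos(end - 1) + 1
    handle.seek(begin)
    chunk = handle.read(stop - begin)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()

# Toy data, not taken from the pyfastx test files.
data = io.BytesIO(b">seq1 demo\nACGTACGTAC\nGTACGTACGT\nACGT\n>seq2\nTTTTGGGG\n")
index = build_index(data)
print(fetch_region(data, index, "seq1", 8, 13))
```

Nonstandard files with varying line lengths break the offset arithmetic above, so they need a different strategy.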

This project was heavily inspired by @mdshw5’s project pyfaidx and @brentp’s project pyfasta.

Installation

Make sure you have both pip and at least version 3.5 of Python before starting.

You can install pyfastx from the Python Package Index (PyPI):

pip install pyfastx

To update the pyfastx module:

pip install -U pyfastx

Usage

Read FASTA file

The fastest way to parse a flat or gzipped FASTA file is to iterate over it without building an index.

>>> import pyfastx
>>> for name, seq in pyfastx.Fasta('test/data/test.fa.gz', build_index=False):
...     print(name, seq)
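
For comparison, the sketch below shows in pure Python what streaming a FASTA file as (name, sequence) pairs involves. It is not how pyfastx works internally (pyfastx uses kseq.h in C). The helpers read_fasta and open_fasta are hypothetical, and taking the first word of the header line as the name is an assumption.

```python
import gzip
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a text-mode FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the name is the first word of the header line.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def open_fasta(path):
    """Open a plain or gzipped FASTA file in text mode, chosen by file extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")

# Toy input, not taken from the pyfastx test files.
records = list(read_fasta(io.StringIO(">a first\nACGT\nAC\n>b\nGG\n")))
print(records)
```

Only one record is held in memory at a time, which is why no index is needed for a single pass.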

Read a flat or gzipped FASTA file and build an index, which enables random access to its sequences.

>>> import pyfastx
>>> fa = pyfastx.Fasta('test/data/test.fa.gz')
>>> fa
<Fasta> test/data/test.fa.gz contains 211 seqs

Get FASTA information

>>> # get sequence counts in FASTA
>>> len(fa)
211

>>> # get total sequence length of FASTA
>>> fa.size
86262

>>> # get GC content of DNA sequence of FASTA
>>> fa.gc_content
43.529014587402344

>>> # get composition of nucleotides in FASTA
>>> fa.composition
{'A': 24534, 'C': 18694, 'G': 18855, 'T': 24179, 'N': 0}
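
The figures above are consistent with each other, as this short check shows. The helper gc_percent is hypothetical, and excluding N from the denominator is an assumption (it makes no difference here because N is 0).

```python
# Composition as printed by fa.composition above.
composition = {'A': 24534, 'C': 18694, 'G': 18855, 'T': 24179, 'N': 0}

def gc_percent(comp):
    """GC content in percent, counting only A, C, G and T in the denominator."""
    acgt = sum(comp.get(base, 0) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (comp.get("G", 0) + comp.get("C", 0)) / acgt

total = sum(composition.values())  # should equal fa.size
gc = gc_percent(composition)
print(total, round(gc, 3))
```

The value printed by fa.gc_content differs from this result only in the trailing digits, presumably because the extension computes it at lower precision.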

Get sequence from FASTA

>>> # get sequence like a dictionary by identifier
>>> s1 = fa['JZ822577.1']
>>> s1
<Sequence> JZ822577.1 with length of 333

>>> # get sequence like a list by index
>>> s2 = fa[2]
>>> s2
<Sequence> JZ822579.1 with length of 176

>>> # get last sequence
>>> s3 = fa[-1]
>>> s3
<Sequence> JZ840318.1 with length of 134

>>> # check whether a sequence name is in the FASTA file
>>> 'JZ822577.1' in fa
True

Get sequence information

>>> s = fa[-1]
>>> s
<Sequence> JZ840318.1 with length of 134

>>> # get sequence name
>>> s.name
'JZ840318.1'

>>> # get sequence string
>>> s.seq
'ACTGGAGGTTCTTCTTCCTGTGGAAAGTAACTTGTTTTGCCTTCACCTGCCTGTTCTTCACATCAACCTTGTTCCCACACAAAACAATGGGAATGTTCTCACACACCCTGCAGAGATCACGATGCCATGTTGGT'

>>> # get sequence length
>>> len(s)
134

>>> # get GC content (DNA sequences only)
>>> s.gc_content
46.26865768432617

>>> # get nucleotide composition (DNA sequences only)
>>> s.composition
{'A': 31, 'C': 37, 'G': 25, 'T': 41, 'N': 0}

Sequence slice

A Sequence object can be sliced like a Python string. The examples below use the sequence s from the previous section.

>>> # get a subsequence from the sequence
>>> ss = s[10:30]
>>> ss
<Sequence> JZ840318.1 from 11 to 30

>>> ss.name
'JZ840318.1:11-30'

>>> ss.seq
'CTTCTTCCTGTGGAAAGTAA'

>>> ss = s[-10:]
>>> ss
<Sequence> JZ840318.1 from 125 to 134

>>> ss.name
'JZ840318.1:125-134'

>>> ss.seq
'CCATGTTGGT'
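
The names of sliced sequences use 1-based, inclusive coordinates, while the slices themselves are ordinary 0-based, end-exclusive Python slices. A minimal sketch of that conversion follows. The helpers slice_to_coords and sliced_name are hypothetical and merely reproduce the naming shown above.

```python
def slice_to_coords(length, key):
    """Convert a Python slice into 1-based inclusive (start, end) coordinates."""
    start, stop, step = key.indices(length)
    if step != 1:
        raise ValueError("only contiguous slices are handled in this sketch")
    return start + 1, stop

def sliced_name(name, length, key):
    """Build a name of the form 'name:start-end' for a sliced sequence."""
    start, end = slice_to_coords(length, key)
    return "{}:{}-{}".format(name, start, end)

# JZ840318.1 has a length of 134 in the examples above.
print(sliced_name("JZ840318.1", 134, slice(10, 30)))
print(sliced_name("JZ840318.1", 134, slice(-10, None)))
```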

Reverse and complement sequence

>>> # get sliced sequence
>>> fa[0][10:20].seq
'GTCAATTTCC'

>>> # get reverse of sliced sequence
>>> fa[0][10:20].reverse
'CCTTTAACTG'

>>> # get complement of sliced sequence
>>> fa[0][10:20].complement
'CAGTTAAAGG'

>>> # get reversed complement sequence, corresponding to sequence in antisense strand
>>> fa[0][10:20].antisense
'GGAAATTGAC'
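
The relationship between the three properties can be reproduced with plain strings. This is only an illustrative sketch with hypothetical helper functions. How pyfastx treats ambiguity codes other than N is not covered here.

```python
# Translation table for the four bases plus N, in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]

def complement(seq):
    """Return the base-wise complement, keeping the original order."""
    return seq.translate(COMPLEMENT)

def antisense(seq):
    """Return the reverse complement (the antisense strand read 5' to 3')."""
    return complement(seq)[::-1]

seq = "GTCAATTTCC"  # fa[0][10:20].seq from the example above
print(reverse(seq), complement(seq), antisense(seq))
```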

Get subsequences

Subsequences can be retrieved from a FASTA file by passing a single (start, end) interval or a list of such intervals. Coordinates are 1-based and inclusive.

>>> # get subsequence with start and end position
>>> interval = (1, 10)
>>> fa.fetch('JZ822577.1', interval)
'CTCTAGAGAT'

>>> # get subsequences with a list of start and end position
>>> intervals = [(1, 10), (50, 60)]
>>> fa.fetch('JZ822577.1', intervals)
'CTCTAGAGATTTTAGTTTGAC'

>>> # get subsequences with reverse strand
>>> fa.fetch('JZ822577.1', (1, 10), strand='-')
'ATCTCTAGAG'
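
The coordinate handling in these calls can be sketched on a plain string. The function fetch_intervals is hypothetical, and the toy sequence contains only the bases shown above, with the unknown middle padded with N. Reverse-complementing the joined result for strand='-' is an assumption about the multi-interval case, which the examples above do not show.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def fetch_intervals(seq, intervals, strand="+"):
    """Join 1-based inclusive intervals of seq; reverse-complement if strand is '-'."""
    # Accept a single (start, end) pair as well as a list of pairs.
    if isinstance(intervals[0], int):
        intervals = [intervals]
    joined = "".join(seq[start - 1:end] for start, end in intervals)
    if strand == "-":
        joined = joined.translate(COMPLEMENT)[::-1]
    return joined

# Toy stand-in: bases 1-10 and 50-60 as documented, positions 11-49 padded with N.
toy = "CTCTAGAGAT" + "N" * 39 + "TTTAGTTTGAC"

print(fetch_intervals(toy, (1, 10)))
print(fetch_intervals(toy, [(1, 10), (50, 60)]))
print(fetch_intervals(toy, (1, 10), strand="-"))
```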

Get identifiers

Get all identifiers of sequence as a list-like object.

>>> ids = fa.keys()
>>> ids
<Identifier> contains 211 identifiers

>>> # get the number of identifiers
>>> len(ids)
211

>>> # get identifier by index
>>> ids[0]
'JZ822577.1'

>>> # check whether an identifier is in the FASTA file
>>> 'JZ822577.1' in ids
True

>>> # iterate over identifiers
>>> for name in ids:
...     print(name)

>>> # convert to a list
>>> list(ids)

Testing

The pyfaidx module was used to test pyfastx. To run the tests:

$ python setup.py test

Acknowledgements

kseq.h and zlib were used to parse the FASTA format. Sqlite3 was used to store the built indexes. The ability of pyfastx to randomly access sequences in gzipped FASTA files is mainly attributable to indexed_gzip.
