fast vcf parsing with cython + htslib

cyvcf2

Note: cyvcf2 versions < 0.20.0 require htslib < 1.10. cyvcf2 versions >= 0.20.0 require htslib >= 1.10

The latest documentation for cyvcf2 can be found here:

Docs

If you use cyvcf2, please cite the paper.

Fast Python (2 and 3) parsing of VCF and BCF, including region queries.

cyvcf2 is a cython wrapper around htslib built for fast parsing of Variant Call Format (VCF) files.

Attributes like variant.gt_ref_depths work for diploid samples and return a numpy array directly, so they are immediately ready for downstream use. Note that the array is backed by the underlying C data, so once variant goes out of scope, the array will contain nonsense. To persist a copy, use cpy = np.array(variant.gt_ref_depths) instead of just arr = variant.gt_ref_depths.
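A minimal sketch of the copy-before-keeping pattern described above. `collect_ref_depths` is a hypothetical helper, not part of cyvcf2; it only assumes that each variant exposes `gt_ref_depths` as described in the text and that numpy is installed.

```python
import numpy as np


def collect_ref_depths(variants):
    """Stack per-variant reference depths into an (n_variants, n_samples) array.

    Each row is copied with np.array(...) before the loop moves on to the
    next variant, so the result no longer depends on the C-backed buffer.
    """
    rows = [np.array(v.gt_ref_depths) for v in variants]
    if not rows:
        # No variants seen: return an empty 2-D array.
        return np.empty((0, 0), dtype=np.int32)
    return np.vstack(rows)
```

With cyvcf2 installed, this could be called as `collect_ref_depths(VCF('some.vcf.gz'))`. Storing `v.gt_ref_depths` in the list without the `np.array(...)` call would keep views whose contents are not guaranteed once the loop advances.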

Example

The example below shows most of the common uses of cyvcf2.

from cyvcf2 import VCF

vcf = VCF('some.vcf.gz') # or VCF('some.bcf')
n_samples = len(vcf.samples) # number of samples in the header

for variant in vcf:
    variant.REF, variant.ALT # e.g. REF='A', ALT=['C', 'T']

    variant.CHROM, variant.start, variant.end, variant.ID, \
                variant.FILTER, variant.QUAL

    # numpy arrays of specific things we pull from the sample fields.
    # gt_types is array of 0,1,2,3==HOM_REF, HET, UNKNOWN, HOM_ALT
    variant.gt_types, variant.gt_ref_depths, variant.gt_alt_depths # numpy arrays
    variant.gt_phases, variant.gt_quals, variant.gt_bases # numpy arrays

    ## INFO field.
    ## extract from the INFO field by its name:
    variant.INFO.get('DP') # int
    variant.INFO.get('FS') # float
    variant.INFO.get('AC') # int (a tuple if the field holds several values)

    # convert back to a string.
    str(variant)


    ## sample info...

    # Get a numpy array of the depth per sample:
    dp = variant.format('DP')
    # or of any other format field:
    sb = variant.format('SB')
    assert sb.shape == (n_samples, 4) # 4 values per sample

# to do a region-query:

vcf = VCF('some.vcf.gz')
for v in vcf('11:435345-556565'):
    if v.INFO["AF"] > 0.1: continue
    print(str(v))
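The region query above indexes `v.INFO["AF"]` directly, which assumes every record carries a single AF value. The sketch below is one hedged way to make that filter tolerant of missing or multi-valued AF. The helper names (`region_string`, `max_af`, `rare_variants`) are hypothetical and not part of cyvcf2; the only cyvcf2 behavior relied on is calling the VCF object with a region string and `INFO.get`, both shown in the example above. Skipping records without AF is a choice made here, not something the library prescribes.

```python
def region_string(chrom, start, end):
    """Format a region such as '11:435345-556565' for vcf(region)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}:{start}-{end}"


def max_af(value):
    """Return the largest allele frequency, or None if AF is absent.

    Assumes the INFO value is either a single number or a tuple/list
    with one entry per ALT allele.
    """
    if value is None:
        return None
    if isinstance(value, (tuple, list)):
        return max(value) if value else None
    return value


def rare_variants(vcf, region, threshold=0.1):
    """Yield variants in `region` whose AF is present and <= threshold."""
    for v in vcf(region):
        af = max_af(v.INFO.get("AF"))
        if af is not None and af <= threshold:
            yield v
```

With cyvcf2 installed, usage would look like `for v in rare_variants(VCF('some.vcf.gz'), region_string('11', 435345, 556565)): print(str(v))`.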

Installation

pip (assuming you have a compatible htslib installed; see the version note above)

pip install cyvcf2
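The version rule from the note at the top of this page can be written down as a small check. This is only a sketch of that rule: `required_htslib` is a hypothetical helper, and it assumes a plain dotted numeric version string such as '0.30.25' (no suffixes like 'rc1').

```python
def required_htslib(cyvcf2_version):
    """Return the htslib constraint the note gives for a cyvcf2 version."""
    parts = tuple(int(p) for p in cyvcf2_version.split("."))
    # Pad short versions such as '0.20' so they compare as (0, 20, 0).
    parts = (parts + (0, 0, 0))[:3]
    if parts >= (0, 20, 0):
        return "htslib >= 1.10"
    return "htslib < 1.10"


print(required_htslib("0.30.25"))  # htslib >= 1.10
```

To check an installed copy, the version string could be obtained with `importlib.metadata.version("cyvcf2")` from the standard library.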

github (building htslib and cyvcf2 from source)

git clone --recursive https://github.com/brentp/cyvcf2
cd cyvcf2/htslib
autoheader
autoconf
./configure --enable-libcurl
make

cd ..
pip install -r requirements.txt
CYTHONIZE=1 pip install -e .

On OSX, using brew, you may have to set the following as indicated by the brew install:

For compilers to find openssl you may need to set:
  export LDFLAGS="-L/usr/local/opt/openssl/lib"
  export CPPFLAGS="-I/usr/local/opt/openssl/include"

For pkg-config to find openssl you may need to set:
  export PKG_CONFIG_PATH="/usr/local/opt/openssl/lib/pkgconfig"

Testing

Install pytest, then tests can be run with:

pytest

CLI

Run with cyvcf2 path_to_vcf

$ cyvcf2 --help
Usage: cyvcf2 [OPTIONS] <vcf_file> or -

  fast vcf parsing with cython + htslib

Options:
  -c, --chrom TEXT                Specify what chromosome to include.
  -s, --start INTEGER             Specify the start of region.
  -e, --end INTEGER               Specify the end of the region.
  --include TEXT                  Specify what info field to include.
  --exclude TEXT                  Specify what info field to exclude.
  --loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Set the level of log output.  [default:
                                  INFO]
  --silent                        Skip printing of vcf.
  --help                          Show this message and exit.
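When the CLI is driven from a script, the options listed above can be assembled into an argument list. The sketch below uses only flags that appear in the help output (`--chrom`, `--start`, `--end`, `--silent`); `build_cyvcf2_command` is a hypothetical helper, not something cyvcf2 provides.

```python
import shlex


def build_cyvcf2_command(vcf_path, chrom=None, start=None, end=None, silent=False):
    """Build an argument list for the cyvcf2 CLI; options precede the file."""
    cmd = ["cyvcf2"]
    if chrom is not None:
        cmd += ["--chrom", str(chrom)]
    if start is not None:
        cmd += ["--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    if silent:
        cmd.append("--silent")
    cmd.append(vcf_path)  # a path, or "-" as shown in the usage line
    return cmd


cmd = build_cyvcf2_command("some.vcf.gz", chrom="11", start=435345, end=556565)
print(shlex.join(cmd))  # cyvcf2 --chrom 11 --start 435345 --end 556565 some.vcf.gz
```

With cyvcf2 installed, the list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with unusual file names.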

See Also

Pysam also has a Cython wrapper around htslib, and one block of code here is taken directly from that library. However, the optimizations that we want for gemini are very specific, so we have chosen to create a separate project.

Performance

For the performance comparison in the paper, we used 1000 Genomes chromosome 22, with the full comparison runner here.



