Package for loading data from bgen files

Project description

Another bgen reader

bgen

This is a package for reading bgen files.

This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second on a 3 GHz CPU). Decompressing the genotype probabilities is the slow step. zlib decompression takes 80% of the total time, so zstd-compressed genotypes would likely be much faster to read, perhaps 2-3x faster.
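The throughput figures above can be checked with a little arithmetic. The sketch below assumes unphased diploid biallelic genotypes, which store three probabilities per sample. It also applies Amdahl's law to estimate how much faster decompression alone would need to become for a given overall speedup, assuming the other 20% of the runtime stays unchanged. Both function names are illustrative and are not part of the bgen package.

```python
import math


def probabilities_per_second(n_samples, variants_per_second, ploidy=2, n_alleles=2):
    """Genotype probabilities produced per second, assuming unphased genotypes."""
    # an unphased genotype with this ploidy and allele count has
    # C(ploidy + n_alleles - 1, n_alleles - 1) possible values (3 for diploid biallelic)
    n_genotypes = math.comb(ploidy + n_alleles - 1, n_alleles - 1)
    return n_samples * variants_per_second * n_genotypes


def overall_speedup(decompress_fraction, decompress_speedup):
    """Amdahl's law: only the decompression share of the runtime gets faster."""
    other = 1 - decompress_fraction
    return 1 / (other + decompress_fraction / decompress_speedup)


print(probabilities_per_second(500_000, 300))
# upper bound on the overall gain if decompression took no time at all
print(overall_speedup(0.8, math.inf))
# decompression ~2.7x faster gives ~2x overall; ~6x faster gives ~3x overall
print(overall_speedup(0.8, 8 / 3), overall_speedup(0.8, 6))
```

If the 2-3x estimate refers to overall read speed, zstd decompression would need to be roughly 2.7x to 6x faster than zlib, and no codec could push the overall gain past 5x while decompression accounts for 80% of the time.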

This has been optimized for UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but the other bgen versions and zstd compression have also been tested using example bgen files.
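To check which layout and compression a given file uses, the header flags can be decoded directly. The sketch below assumes the layout described in the BGEN format specification: a 4-byte offset, then a header block (length, variant count, sample count, magic number, free data) whose last 4 bytes are flags. Bits 0-1 give the compression (0 none, 1 zlib, 2 zstd), bits 2-5 give the layout (1 for bgen v1.1, 2 for v1.2/v1.3), and bit 31 marks stored sample IDs. `parse_bgen_header` and `read_bgen_header` are hypothetical helpers independent of this package, which exposes the same information through `BgenFile.header`.

```python
import struct

# compression codes stored in bits 0-1 of the header flags (per the BGEN spec)
COMPRESSION = {0: "none", 1: "zlib", 2: "zstd"}


def parse_bgen_header(buf: bytes) -> dict:
    """Decode the 4-byte offset and the header block at the start of a bgen file."""
    # offset to first variant, header block length, variant count, sample count
    offset, header_len, n_variants, n_samples = struct.unpack_from("<4I", buf, 0)
    magic = buf[16:20]
    if magic not in (b"bgen", b"\x00\x00\x00\x00"):
        raise ValueError(f"unexpected magic number: {magic!r}")
    # the header block starts at byte 4, so its last 4 bytes (the flags)
    # start at byte 4 + header_len - 4 == header_len
    (flags,) = struct.unpack_from("<I", buf, header_len)
    return {
        "offset": offset,
        "n_variants": n_variants,
        "n_samples": n_samples,
        "compression": COMPRESSION.get(flags & 0b11, "unknown"),
        "layout": (flags >> 2) & 0b1111,
        "has_sample_ids": bool(flags >> 31),
    }


def read_bgen_header(path: str) -> dict:
    """Read just enough of a bgen file to decode its header block."""
    with open(path, "rb") as handle:
        start = handle.read(8)  # offset field + header block length
        (header_len,) = struct.unpack_from("<I", start, 4)
        # the header block is header_len bytes long, 4 of which were already read
        return parse_bgen_header(start + handle.read(header_len - 4))
```

For a UK Biobank-style file, `read_bgen_header(path)` would be expected to report layout 2 with zlib compression.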

Install

pip install bgen

Usage

from bgen.reader import BgenFile

BGEN_PATH = 'example.bgen'  # placeholder: path to your own bgen file

bfile = BgenFile(BGEN_PATH)
rsids = bfile.rsids()

# select a variant by indexing
var = bfile[1000]

# pull out genotype probabilities
probs = var.probabilities  # returns 2D numpy array
dosage = var.minor_allele_dosage  # returns 1D numpy array for biallelic variant

# get all variants in a genomic region (before the file is closed below)
variants = bfile.fetch('21', 10000, 5000000)

# iterate through every variant in the file; the file is closed when the block exits
with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
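Region queries like the `fetch` call in the usage example are fastest when a bgenix index (`.bgi`) is present. A bgenix index is an SQLite database; assuming its `Variant` table has the columns bgenix creates (chromosome, position, rsid, number_of_alleles, allele1, allele2, file_start_position, size_in_bytes), a region lookup can be sketched with Python's built-in `sqlite3`. The helper `variants_in_region` is hypothetical, treats `start` and `stop` as inclusive (an assumption; the package's own boundary handling may differ), and is not necessarily how the package queries the index.

```python
import sqlite3


def variants_in_region(conn, chrom, start=None, stop=None):
    """Rows of (rsid, position, file_start_position, size_in_bytes) in a region.

    Assumes a bgenix-style `Variant` table; start/stop are treated as inclusive.
    """
    query = (
        "SELECT rsid, position, file_start_position, size_in_bytes "
        "FROM Variant WHERE chromosome = ?"
    )
    params = [chrom]
    if start is not None:
        query += " AND position >= ?"
        params.append(start)
    if stop is not None:
        query += " AND position <= ?"
        params.append(stop)
    query += " ORDER BY position"
    return conn.execute(query, params).fetchall()


# usage against a real index (the path is a placeholder):
# conn = sqlite3.connect('example.bgen.bgi')
# rows = variants_in_region(conn, '21', 10000, 5000000)
```

Each `file_start_position` and `size_in_bytes` pair locates one variant's data in the bgen file, so a reader can seek straight to it instead of scanning every variant.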

API documentation

class BgenFile(path, sample_path='', delay_parsing=False)
  # opens a bgen file. If a bgenix index exists for the file, the index file
  # will be opened automatically for quicker access to specific variants.
  Arguments:
    path: path to bgen file
    sample_path: optional path to a sample file. Samples are given integer IDs
        if no sample file is given and no sample IDs are stored in the bgen file
    delay_parsing: if True, variants are not all loaded into memory when the
        BgenFile is opened. This can save time when iterating across the
        variants in the file
  
  Attributes:
    samples: list of sample IDs
    header: BgenHeader with info about the bgen version and compression.
  
  Methods:
    indexing: BgenVars can be accessed by index, e.g. bfile[1000]
    iteration: variants in a BgenFile can be looped over, e.g. for x in bfile: print(x)
    fetch(chrom, start=None, stop=None): get all variants within a genomic region
    drop_variants(list[int]): excludes variants, given by index, from later analyses
    with_rsid(rsid): returns BgenVar with given rsid
    at_position(pos): returns BgenVar with given position
    varids(): returns list of variant IDs for variants in the bgen file.
    rsids(): returns list of rsids for variants in the bgen file.
    chroms(): returns list of chromosomes for variants in the bgen file.
    positions(): returns list of positions for variants in the bgen file.

class BgenVar(handle, offset, layout, compression, n_samples)
  # Note: this isn't constructed directly; BgenVars are returned by BgenFile methods
  Attributes:
    varid: ID for variant
    rsid: reference SNP ID for variant
    chrom: chromosome variant is on
    pos: nucleotide position variant is at
    alleles: list of alleles for variant
    is_phased: True/False for whether variant has phased genotype data
    ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
    minor_allele: the least common allele (for biallelic variants)
    minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
    probabilities: 2D numpy array of genotype probabilities, one sample per row

  BgenVars can be pickled, e.g. pickle.dumps(var)
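For an unphased, diploid, biallelic variant, each row of `probabilities` holds three values. Assuming the BGEN genotype ordering (homozygous first allele, heterozygous, homozygous second allele), the expected allele count per sample is a weighted sum of these columns. The sketch below illustrates that calculation with hypothetical helper names; it is not necessarily how `minor_allele_dosage` is computed inside the package, and it assumes missing samples appear as NaN rows.

```python
import numpy as np


def allele_dosage(probs, count_second_allele=True):
    """Expected allele count per sample from an (n_samples, 3) probability array.

    Assumes unphased diploid biallelic genotypes with columns ordered
    (hom. first allele, het., hom. second allele).
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[1] != 3:
        raise ValueError("expected an (n_samples, 3) array")
    hom = probs[:, 2] if count_second_allele else probs[:, 0]
    return probs[:, 1] + 2 * hom


def minor_allele_dosage(probs):
    """Dosage of whichever allele has frequency <= 0.5 (ties count the second allele)."""
    second = allele_dosage(probs, count_second_allele=True)
    # allele frequency = mean dosage / 2; nanmean skips samples with missing data
    if np.nanmean(second) / 2 <= 0.5:
        return second
    return allele_dosage(probs, count_second_allele=False)


probs = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.1, 0.8, 0.1]])
print(minor_allele_dosage(probs))  # the first allele is the minor allele here
```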

Download files

Source Distribution

bgen-1.2.17.tar.gz (666.5 kB)
Tags: Source

Built Distributions

bgen-1.2.17-cp310-cp310-musllinux_1_1_x86_64.whl (2.2 MB)
Tags: CPython 3.10, musllinux: musl 1.1+ x86-64

bgen-1.2.17-cp310-cp310-musllinux_1_1_i686.whl (2.2 MB)
Tags: CPython 3.10, musllinux: musl 1.1+ i686

bgen-1.2.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64

bgen-1.2.17-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.6 MB)
Tags: CPython 3.10, manylinux: glibc 2.12+ i686, manylinux: glibc 2.17+ i686

bgen-1.2.17-cp39-cp39-musllinux_1_1_x86_64.whl (2.2 MB)
Tags: CPython 3.9, musllinux: musl 1.1+ x86-64

bgen-1.2.17-cp39-cp39-musllinux_1_1_i686.whl (2.2 MB)
Tags: CPython 3.9, musllinux: musl 1.1+ i686

bgen-1.2.17-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64

bgen-1.2.17-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.6 MB)
Tags: CPython 3.9, manylinux: glibc 2.12+ i686, manylinux: glibc 2.17+ i686

bgen-1.2.17-cp38-cp38-musllinux_1_1_x86_64.whl (2.3 MB)
Tags: CPython 3.8, musllinux: musl 1.1+ x86-64

bgen-1.2.17-cp38-cp38-musllinux_1_1_i686.whl (2.3 MB)
Tags: CPython 3.8, musllinux: musl 1.1+ i686

bgen-1.2.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64

bgen-1.2.17-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.6 MB)
Tags: CPython 3.8, manylinux: glibc 2.12+ i686, manylinux: glibc 2.17+ i686

bgen-1.2.17-cp37-cp37m-musllinux_1_1_x86_64.whl (2.2 MB)
Tags: CPython 3.7m, musllinux: musl 1.1+ x86-64

bgen-1.2.17-cp37-cp37m-musllinux_1_1_i686.whl (2.2 MB)
Tags: CPython 3.7m, musllinux: musl 1.1+ i686

bgen-1.2.17-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Tags: CPython 3.7m, manylinux: glibc 2.17+ x86-64

bgen-1.2.17-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.6 MB)
Tags: CPython 3.7m, manylinux: glibc 2.12+ i686, manylinux: glibc 2.17+ i686

bgen-1.2.17-cp36-cp36m-musllinux_1_1_x86_64.whl (2.2 MB)
Tags: CPython 3.6m, musllinux: musl 1.1+ x86-64

bgen-1.2.17-cp36-cp36m-musllinux_1_1_i686.whl (2.2 MB)
Tags: CPython 3.6m, musllinux: musl 1.1+ i686

bgen-1.2.17-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Tags: CPython 3.6m, manylinux: glibc 2.17+ x86-64

bgen-1.2.17-cp36-cp36m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.6 MB)
Tags: CPython 3.6m, manylinux: glibc 2.12+ i686, manylinux: glibc 2.17+ i686

File details

Details for the file bgen-1.2.17.tar.gz.

File metadata

  • Download URL: bgen-1.2.17.tar.gz
  • Size: 666.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for bgen-1.2.17.tar.gz
Algorithm Hash digest
SHA256 9bde98a3b77a265ec2b3fd92af805cfa3e044e367c33a9fe655d7bfbdccb892e
MD5 75d9d7292004009d88fff29da8fe1d45
BLAKE2b-256 ea1b3a2d461dc1f19c95318f41d86d9fbd9a65bba681035a129c491bbbfb2fea

File details

Details for the file bgen-1.2.17-cp310-cp310-musllinux_1_1_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 99a5f5a3523c1b466431f4bf8209fc9a77035b08fdcd882d978104fa457f0bdb
MD5 8dd4dd7123d1d8707cdd0ab8465a2f0a
BLAKE2b-256 adcf5c19e40d9ed626b17dd44b881bbcc9da2976268afd812e44352a2dfe53d6

File details

Details for the file bgen-1.2.17-cp310-cp310-musllinux_1_1_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp310-cp310-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 6aeedc7a8995ce332b37e54404288fe366ee54102f667684bc96c92319403ab1
MD5 c0423a08695c65d3ad9beee2e9e85e1b
BLAKE2b-256 f297e4864f6425ed90b6eb099eccbc68a63db2aa5c31105126b891bd2eb2f4db

File details

Details for the file bgen-1.2.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 235066f27f6a9d0e0e882a554f7ad797c42a06669953dca2dd3cdf77fffcf250
MD5 1047f5a2599d77b5a84d09ae6a63d5ae
BLAKE2b-256 7f0c87c5886e79e641fef3363af5bd0376637017ac8333eecb378e9ae2d1a59f

File details

Details for the file bgen-1.2.17-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 33db928cc3314a41948d6d970806e0c5c2457ce7ae611d284a728a62ad71de38
MD5 bc83ce90fb685ba94f4a81a57de8c21d
BLAKE2b-256 66c0cf1c29f02ebfba502b61a0d61f71e7cd2aed3f567d9dfd8a14a6d8c8ecd4

File details

Details for the file bgen-1.2.17-cp39-cp39-musllinux_1_1_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 169754a2df6bad2fe1e39bad8f7138ab877bc74bda0e2fca575a2e96252a0b1d
MD5 bad1ec3fd2dee26df0a2099e903f8709
BLAKE2b-256 01a8ae3f5e8cc0a01e8476f1066abd3b0dce3026227ec27a54578fe6425f88ee

File details

Details for the file bgen-1.2.17-cp39-cp39-musllinux_1_1_i686.whl.

File metadata

  • Download URL: bgen-1.2.17-cp39-cp39-musllinux_1_1_i686.whl
  • Size: 2.2 MB
  • Tags: CPython 3.9, musllinux: musl 1.1+ i686
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for bgen-1.2.17-cp39-cp39-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 6b91210325360989c0b4efd556926460193c18147f775aa56cad479ccaa47516
MD5 086b2ce647c052cbd6753cb36692fdf3
BLAKE2b-256 9867e70b81ce3d0d7d5b631d16a5db606dadd3caa5293ea36d838be5541fae8b

File details

Details for the file bgen-1.2.17-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 20094a25f19e4f72cf587b2fe23de34ab88eed1f3880638bfcca98c0dc24ec2c
MD5 0d1123abbc8500260867647637a296e8
BLAKE2b-256 898e5f7819283679aae7d0b8672a52ec4d06187a6ea90e76543f0113dd7ea5da

File details

Details for the file bgen-1.2.17-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 c532c505e7f90fb4e21514fc029a45b9b400f26fecd160d8bb251f4e24b280b1
MD5 22409c45f374d1af5066ee46314816e8
BLAKE2b-256 03e174796159bf8c9dd70693639a3f4771bec96bfc5c57bcbf006bf39be1095d

File details

Details for the file bgen-1.2.17-cp38-cp38-musllinux_1_1_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp38-cp38-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 d04109251bdf5104edfaa854d668103421d1f49e9f21afdc5b018aeac07c35af
MD5 d33f2e3dc467c9ea4102080a992f31c7
BLAKE2b-256 53d65bfb0cfcf27567b11f9b7e49b49deae374ad99770853e35a42febd713ebd

File details

Details for the file bgen-1.2.17-cp38-cp38-musllinux_1_1_i686.whl.

File metadata

  • Download URL: bgen-1.2.17-cp38-cp38-musllinux_1_1_i686.whl
  • Size: 2.3 MB
  • Tags: CPython 3.8, musllinux: musl 1.1+ i686
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for bgen-1.2.17-cp38-cp38-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 1122c8140a9d2860c0c93d568b6a44cc159cd4b932fe07fedf35f610e544ca3f
MD5 cd59ea9d6ac4da88247e5b4a64b68664
BLAKE2b-256 36b0633d75e9f3c5c1ba9330bf8e027e028a08fb90938b56072dd9d466f2753c

File details

Details for the file bgen-1.2.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 83ee3e021b1f57f29b68b37e8284f2c0751111d0ad1a5830a12f058ab227ad2d
MD5 c7b51d9d77d034f49912f57047416c79
BLAKE2b-256 01e84c345363057b654f2c79381842e6f31d4fa0367ac358214c5c202101476a

File details

Details for the file bgen-1.2.17-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 b7edbf90fc0758b7d2d7b8c8e037d6db6f57c726b8fe0fc470cb5db9f5791672
MD5 8b0e2f0e2e3894efc7129bf5976c5bdd
BLAKE2b-256 e36755a8645f67d119a00597407e8ef7859fc4624a0966c138c3565547dcc30f

File details

Details for the file bgen-1.2.17-cp37-cp37m-musllinux_1_1_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp37-cp37m-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 75c8c8c70fc92d6a1a5ab0aff945e59134040d437d1694b78e11958b5f076738
MD5 fe2d3d5829a7b608efdf647a80a81597
BLAKE2b-256 bb22a56dc54de5b15142b952edf70ff95650cc5ffdf629b3db2035de1c33d3cc

File details

Details for the file bgen-1.2.17-cp37-cp37m-musllinux_1_1_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp37-cp37m-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 eed5548809fa772ce6307e4435f257d46a66b551d7ab1cb1aed61dbbdb3dbce4
MD5 96e36d115448e8075f30654e9f119e55
BLAKE2b-256 38b4fc7dfc6697f8337dba41005c1a8c6f328936506e20e97be22bd1a177c274

File details

Details for the file bgen-1.2.17-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 cd70f3b6d256beb26c0c151c85d9562bc3a2e3b7f67839cab0fce732d2bd4057
MD5 a74f8553e5f988282219c5c5294b83ae
BLAKE2b-256 c558dfa1fab96b2f23976419d887996ed293342596401dfe1c8ebfe788bc8050

File details

Details for the file bgen-1.2.17-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 f4f15b6acd75285513bcb4eb818348e77f4fc2341b77881099b57b15a1d3919f
MD5 b3f7bc9fba406853a0cfa661bf36bb6e
BLAKE2b-256 79663968bb85a89b416dc93a86e0e61d25d95d59635565e22e835678370993a8

File details

Details for the file bgen-1.2.17-cp36-cp36m-musllinux_1_1_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp36-cp36m-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 3b5b3774e0b0360de8200574bf1adabdf098c3e377bc992454db04d774f2b42d
MD5 357b3613432575e9fe2a9779ab04b55e
BLAKE2b-256 391994faa26671a116653eceb60bd88c43e2f3cc257db22dfbbb8b9c1d2fb17c

File details

Details for the file bgen-1.2.17-cp36-cp36m-musllinux_1_1_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp36-cp36m-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 0b73a6c40e09ff278bde4098727d8e2d0ba234408249d2cd6ac2ce7638de2ac1
MD5 dd41158ead7e64a9bd8e2513ccbdf42f
BLAKE2b-256 1a9b14cfbe141fb5566f5c60dc7018710af6b2b3601ce5271359d88b3125e0e0

File details

Details for the file bgen-1.2.17-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bgen-1.2.17-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bb57431cf82ce5d18398b7f5d08f071ab40de9818b2af5717544fab5d0ad5fff
MD5 376bd12cb7cffc24a836cd6ccea1b35c
BLAKE2b-256 e02cd879e7046e57fbb7c647deaf70b7631fa3b3d5d6852d07e0efa59f2f5da6

File details

Details for the file bgen-1.2.17-cp36-cp36m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for bgen-1.2.17-cp36-cp36m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 de314e8b2a82655bf27ee5967eb15a772d4564695f403682808e019ba9271268
MD5 ce2bbef42835545f01b7615691c6c8b6
BLAKE2b-256 e2c9fd7896edf65b9b0b306316bfd0f45ef7e91072a87fac831615967cce6adc
