Package for loading data from bgen files
Another bgen reader
This is a package for reading bgen files.
This package uses Cython to wrap C++ code for parsing bgen files. It's fairly quick: it can parse genotypes from 500,000 individuals at ~300 variants per second within a single Python process (~450 million probabilities per second with a 3 GHz CPU). Decompressing the genotype probabilities is the slow step: zlib decompression takes 80% of the total time, so zstd-compressed genotypes would likely be much faster to parse, perhaps 2-3x.
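The throughput figures above can be checked with a little arithmetic. The sketch below assumes three probabilities per sample (diploid, biallelic genotypes). It also uses Amdahl's law, a standard estimate rather than a measurement, to bound what faster decompression alone could buy when zlib accounts for ~80% of the run time. `overall_speedup` is a hypothetical helper name.

```python
# Back-of-envelope check of the quoted throughput (assumes 3 probabilities per sample).
n_samples = 500_000        # individuals in the file
variants_per_sec = 300     # quoted parsing rate
probs_per_sample = 3       # diploid biallelic: P(AA), P(AB), P(BB)

probs_per_sec = n_samples * variants_per_sec * probs_per_sample
print(f"{probs_per_sec:,} probabilities/s")  # 450,000,000 probabilities/s


def overall_speedup(decompress_fraction, decompress_speedup):
    """Amdahl's law: total speedup when only the decompression step gets faster."""
    rest = 1.0 - decompress_fraction
    return 1.0 / (rest + decompress_fraction / decompress_speedup)


# With decompression at ~80% of the time, the ceiling is 1 / 0.2 = 5x overall.
for k in (2, 3, 6, 10):
    print(f"decompression {k}x faster -> {overall_speedup(0.8, k):.2f}x overall")
```

Under these assumptions, a 2-3x overall gain would require zstd to decompress roughly 2.7-6x faster than zlib, and no decompressor can push the total past 5x.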
This has been optimized for UK Biobank bgen files (i.e. bgen version 1.2 with zlib-compressed 8-bit genotype probabilities), but other bgen versions and zstd compression have also been tested using example bgen files.
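The BGEN specification records which layout and compression a file uses in a flags field at the end of the header block. UK Biobank files use layout 2 (introduced with bgen v1.2) and zlib. The sketch below decodes those fields using only the standard library. It is independent of this package and does not reproduce the fields of its `header` attribute. `parse_bgen_header` and `COMPRESSION` are hypothetical names.

```python
import struct

# Compression codes stored in bits 0-1 of the header flags (BGEN spec).
COMPRESSION = {0: "none", 1: "zlib", 2: "zstd"}


def parse_bgen_header(data):
    """Decode the fixed fields at the start of a bgen file (hypothetical helper)."""
    # Bytes 0-3: offset of the first variant block, counted from byte 4.
    (offset,) = struct.unpack_from("<I", data, 0)
    # Header block: its own length, variant count, sample count, then magic bytes.
    header_len, n_variants, n_samples = struct.unpack_from("<III", data, 4)
    magic = data[16:20]
    if magic not in (b"bgen", b"\x00\x00\x00\x00"):
        raise ValueError("not a bgen file: unexpected magic bytes")
    # The flags are the last 4 bytes of the header block, after any free data.
    (flags,) = struct.unpack_from("<I", data, 4 + header_len - 4)
    return {
        "offset": offset,
        "n_variants": n_variants,
        "n_samples": n_samples,
        "compression": COMPRESSION.get(flags & 0b11, "unknown"),
        "layout": (flags >> 2) & 0b1111,  # 1 = bgen v1.1 layout, 2 = v1.2/v1.3 layout
        "has_sample_ids": bool(flags >> 31),
    }


# Hypothetical usage: read the first 8 bytes, then the rest of the header block.
# with open("example.bgen", "rb") as f:
#     head = f.read(8)
#     header_len = struct.unpack_from("<I", head, 4)[0]
#     print(parse_bgen_header(head + f.read(header_len - 4)))
```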
Install
pip install bgen
Usage
from bgen.reader import BgenFile
bfile = BgenFile(BGEN_PATH)  # BGEN_PATH: path to your .bgen file
rsids = bfile.rsids()
# select a variant by indexing
var = bfile[1000]
# pull out genotype probabilities
probs = var.probabilities  # returns 2D numpy array
dosage = var.minor_allele_dosage  # returns 1D numpy array for biallelic variant
# iterate through every variant in the file
with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
    for var in bfile:
        dosage = var.minor_allele_dosage
# get all variants in a genomic region (open a fresh handle, since exiting
# the with-block above closes bfile)
bfile = BgenFile(BGEN_PATH)
variants = bfile.fetch('21', 10000, 5000000)
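Iterating with delay_parsing=True makes it possible to compute per-variant summaries on the fly without holding earlier variants in memory. A minimal sketch, using only the `rsid` and `minor_allele_dosage` attributes documented for BgenVar; the helper name `minor_allele_frequencies` is hypothetical, and it assumes diploid samples (frequency = mean dosage / 2) and that missing genotypes, if any, appear as NaN.

```python
import numpy as np


def minor_allele_frequencies(variants):
    """Yield (rsid, minor allele frequency) for BgenVar-like objects (hypothetical helper).

    Assumes diploid samples, so frequency = mean minor allele dosage / 2.
    np.nanmean skips NaN entries, in case missing genotypes are stored as NaN.
    """
    for var in variants:
        dosage = var.minor_allele_dosage
        yield var.rsid, float(np.nanmean(dosage)) / 2


# Hypothetical usage with this package, streaming one variant at a time:
# with BgenFile(BGEN_PATH, delay_parsing=True) as bfile:
#     for rsid, maf in minor_allele_frequencies(bfile):
#         print(rsid, maf)
```

Because the helper is a generator, only the current variant's dosage array needs to be in memory at any time.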
API documentation
class BgenFile(path, sample_path='', delay_parsing=False)
# opens a bgen file. If a bgenix index exists for the file, the index file
# will be opened automatically for quicker access to specific variants.
Arguments:
path: path to bgen file
sample_path: optional path to a sample file. If no sample file is given and
the bgen file contains no sample IDs, samples will be given integer IDs
delay_parsing: if True, variants are not all loaded into memory when the
BgenFile is opened. This can save time when iterating across the variants
in the file
Attributes:
samples: list of sample IDs
header: BgenHeader with info about the bgen version and compression.
Methods:
indexing: BgenVars can be accessed by indexing the BgenFile, e.g. bfile[1000]
iteration: variants in a BgenFile can be looped over, e.g. for x in bfile: print(x)
fetch(chrom, start=None, stop=None): get all variants within a genomic region
drop_variants(list[int]): excludes the variants at the given indices from use in analyses
with_rsid(rsid): returns BgenVar with given rsid
at_position(pos): returns BgenVar with given position
varids(): returns list of varids for variants in the bgen file.
rsids(): returns list of rsids for variants in the bgen file.
chroms(): returns list of chromosomes for variants in the bgen file.
positions(): returns list of positions for variants in the bgen file.
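The automatic index lookup relies on bgenix `.bgi` files, which are SQLite databases. The rough sketch below runs a region query against such an index using only `sqlite3`. It is not this package's `fetch` implementation. It assumes the bgenix `Variant` table schema (columns `chromosome`, `position`, `rsid`, `file_start_position`, `size_in_bytes`) and treats the region bounds as inclusive. `index_region` is a hypothetical name.

```python
import sqlite3


def index_region(conn, chrom, start=None, stop=None):
    """Query a bgenix-style index for variants in a region (hypothetical helper).

    Returns (rsid, position, file_start_position, size_in_bytes) tuples sorted
    by position. Bounds are treated as inclusive; None means unbounded.
    """
    query = ("SELECT rsid, position, file_start_position, size_in_bytes "
             "FROM Variant WHERE chromosome = ?")
    params = [chrom]
    if start is not None:
        query += " AND position >= ?"
        params.append(start)
    if stop is not None:
        query += " AND position <= ?"
        params.append(stop)
    return conn.execute(query + " ORDER BY position", params).fetchall()


# Hypothetical usage against an index written by bgenix:
# conn = sqlite3.connect("example.bgen.bgi")
# for rsid, pos, offset, size in index_region(conn, "21", 10000, 5000000):
#     print(rsid, pos, offset, size)
```

Each row's `file_start_position` and `size_in_bytes` give the byte range of that variant's data block. This is why an index lets a reader seek straight to a region instead of scanning the whole file.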
class BgenVar(handle, offset, layout, compression, n_samples):
# Note: this isn't called directly, but instead returned from BgenFile methods
Attributes:
varid: ID for variant
rsid: reference SNP ID for variant
chrom: chromosome variant is on
pos: nucleotide position variant is at
alleles: list of alleles for variant
is_phased: True/False for whether variant has phased genotype data
ploidy: list of ploidy for each sample. Samples are ordered as per BgenFile.samples
minor_allele: the least common allele (for biallelic variants)
minor_allele_dosage: 1D numpy array of minor allele dosages for each sample
alt_dosage: 1D numpy array of alt allele dosages for each sample
probabilities: 2D numpy array of genotype probabilities, one sample per row
BgenVars can be pickled e.g. pickle.dumps(var)
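In the common diploid, biallelic case, the dosage attributes summarise the rows of `probabilities`. The minimal numpy sketch below illustrates the arithmetic. It assumes three columns ordered ref/ref, ref/alt, alt/alt, with the second listed allele treated as alt. This is not necessarily how the package computes `alt_dosage` or `minor_allele_dosage`. `dosages_from_probs` is a hypothetical name.

```python
import numpy as np


def dosages_from_probs(probs):
    """Return (alt_dosage, minor_dosage, minor_allele_index) for an (n, 3) array.

    Assumes unphased diploid biallelic genotypes, with columns ordered
    P(ref/ref), P(ref/alt), P(alt/alt), where ref is allele 0 and alt is allele 1.
    """
    # Expected number of alt copies per sample: 1 * P(het) + 2 * P(hom alt).
    alt = probs[:, 1] + 2 * probs[:, 2]
    alt_freq = np.nanmean(alt) / 2
    if alt_freq <= 0.5:
        return alt, alt, 1        # alt is the minor (or tied) allele
    return alt, 2 - alt, 0        # ref is minor, so count ref copies instead
```

When the alt allele turns out to be the more common one, the minor allele dosage is simply 2 minus the alt dosage, because each diploid sample carries two copies in total.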
Download files
Source Distribution
Built Distributions
Hashes for bgen-1.3.1-cp311-cp311-musllinux_1_1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 4cdb7408dc088d4f2ace32915519c606ee62ffa5ceef3e5fd5b813dd0c048292
MD5 | 2248bb7374f1dfb96a4de1a588227118
BLAKE2b-256 | 094169196c472b9eb99583564f7ec42690210eab80124e121ea6b8852441189c
Hashes for bgen-1.3.1-cp311-cp311-musllinux_1_1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | dc743fb2443915b529c5a88ed3c3e4e00526eb026577e9334e1d1c8018182c7f
MD5 | 22997d629c1b6cb91be71b8288296a69
BLAKE2b-256 | 34ce68fa4361e340b30da332f53b2dd24d7c16e6065ea7a5fa29b848b3b42e6c
Hashes for bgen-1.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 80e1a25f32b1635a1472d6965777b7f1d8008048eee2548250f4560a59da5c1c
MD5 | 04e16c666c413f9592aec40fe58307da
BLAKE2b-256 | 87770431d0d0892a2abba162dd272a64131f2d822eac284cbf6c4ee4c68d301a
Hashes for bgen-1.3.1-cp311-cp311-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 7ecb0e9133f472c91e84d948d40b3f653b0d15dfc788141f8749ae7d2954fbd5
MD5 | 4201017655d91f96e64e1dbb16ee2dac
BLAKE2b-256 | 2ada235aa3e5713272272a5639cd1cf0ef8abc6ff45d020ac452e102ee66b5df
Hashes for bgen-1.3.1-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 92665ff5de445305a833e3fa6df3ab3dde513ee04b714e7e9a1d1b54be97ce65
MD5 | 1520e8c536c9e6a65408db067739aecd
BLAKE2b-256 | 31e11a38a52c37f6b8335b8002d983f07b703d067b5693945af992d05c3cdce7
Hashes for bgen-1.3.1-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f618220991eb3c697858265ff4fb03319fe717c31b3dced2e3da3f21fe606dc6
MD5 | 52d5cfbb5771717c6881fcb8eb5203de
BLAKE2b-256 | cb07a36f3e40ec6f9249c6154f111a07a030f97c842a62631de7927029f8dfb4
Hashes for bgen-1.3.1-cp310-cp310-musllinux_1_1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 65b4fdbcda817100101ba0e2f0b41d407d97c7a06abe021fcb3ed59d47bd251a
MD5 | e230914605709dcf35acba474c4683b4
BLAKE2b-256 | 58774d577804c3c93950c23fd773af86612f6b5c9673e37261e5cfbc5ca7c467
Hashes for bgen-1.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | ce3fcb41c2bb1a79b4887a98d24f2748fe5b54cd04f4dc53d84cecc767a6b6ef
MD5 | eb2b88aeec5d749a9b3a2f2a1c3dd395
BLAKE2b-256 | 43d92d8eb49363d843cf37d1befd4ac3d202416624668b4291c93f992149c4f1
Hashes for bgen-1.3.1-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | c56b67aab1f46d36771c844c8303f534b99e2d3e35c1846bce9a4a8e85277471
MD5 | 6c21d725f0ca93f5b50e1750c33cdb36
BLAKE2b-256 | f64997cbbf26466ed000767b4143a248f2c11880f50979a02c1f3f8a7cda2e60
Hashes for bgen-1.3.1-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f023b853a4770367f8e302b1292f58b814ad2ebb1a60ef2a91deef624e49d950
MD5 | b018db9f2722434bdce370ede54250a5
BLAKE2b-256 | d3664f60af739bbbabe6c4d1d86951b24ce58e6de4112a49c3c71bc0583621d7
Hashes for bgen-1.3.1-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0a1ef5b2df74581b7ad4cbf0fd2dfcd39c531dfb4b81ad46312eebac1a97f15a
MD5 | 3b0fa543b6ca7d2c72c15ba04c0c323d
BLAKE2b-256 | ad8ddc116ea37b5f26dc2b467180bd3e855ed5b1463bdbba1ab8e21b792a1a2c
Hashes for bgen-1.3.1-cp39-cp39-musllinux_1_1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | ffbd125510c6a8ca917ec5746a419d99a09f7b271262ffba6801bc172334a389
MD5 | c498d66a83806082b93f19cc380295ce
BLAKE2b-256 | 550a0b84441e28ce88cb938f14f84e4d875d899004448a549a9150707e356916
Hashes for bgen-1.3.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | d45b2e6b70ce300103b1fb6bd83506808d7fb098949ecd5477161d694c83aeab
MD5 | fb2a28becd074d49bab9d48c7d33a1c5
BLAKE2b-256 | de841166a4eb10c99917cafd0e536c215cda7cae46a3998add93a47c531715f6
Hashes for bgen-1.3.1-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 36a5df3780dc0eea1aacd8bf26522d3760306ed6f3924e0e7b1d990204881b2a
MD5 | ce9696b54ff7ba1926497599c7886b42
BLAKE2b-256 | db800ff979c6ff63ccc14a629d16da8d251c35027655e3405b16c1275f30e186
Hashes for bgen-1.3.1-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 402466537184638a9eed0540592868d576deb611c400075af60b09daee085ac2
MD5 | ac4bbd609798652b15018876cdc72938
BLAKE2b-256 | 66fb37158d2497c74559735610b504e4904b5fda8bba27b0df99d1c7db88bd7f
Hashes for bgen-1.3.1-cp38-cp38-musllinux_1_1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0a754ad40a562f248ec45f9dffe5efcb32ba426ddfa9668f8171b1c4b9e96bb4
MD5 | 069ebbbfc1a71352fa1c5ee82bd9b61c
BLAKE2b-256 | 6ae1ddfbe060f53ffed9f917477568ee6d4182670594519e99994b97f99f8a35
Hashes for bgen-1.3.1-cp38-cp38-musllinux_1_1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | e20664cd5d97a6941c51a70e6fbeea01064ae2a8d7fb2ac89c0b67ace89b1f9a
MD5 | 07611a07aed15562b51277b1b1e38740
BLAKE2b-256 | e4981d1d3069f2c35e322622f496073dfd6e8303f53b7da1382cc5025a877155
Hashes for bgen-1.3.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | ce3339319d388c21f31e2f51870f2d7561ed1c099e7f86177e8577a535949237
MD5 | fb0de41cc3323809969f5804653a72ae
BLAKE2b-256 | 5ff2246827bc7899c3d19e8000388fdd388bca82e3f4fb35fec6f340d03896dc
Hashes for bgen-1.3.1-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | cb0c2df168ea659d095194f26e5b6d3313f366d66e9fb4b210571202079dee5d
MD5 | b445bb572582147d5504d2eea9a3ce05
BLAKE2b-256 | 4ac2e612543d231ee72d6ac4607778464885ffd10378be021c113bac43acd6e8
Hashes for bgen-1.3.1-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0af6b07b882a2dd33190add212a512a19a2e2ae2515adbf0de48a9c50993d30f
MD5 | 7cc445234b3e6aab30d4a151936dbb74
BLAKE2b-256 | 1e8047da822de7aa33ff8a57f56faef4704fd8a512d1cfa617f23b328c77e6a4
Hashes for bgen-1.3.1-cp37-cp37m-musllinux_1_1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6669084e06db5459994e8b990fda63d53a8d67af8770f6f4e7d86944e7b0f1d5
MD5 | 43e1286129214c67a77d817b0721c942
BLAKE2b-256 | 904e6a877ad8dd12335dc638d4f7ff85a7dabf29c35988ebb5f5e0009a3d4b6a
Hashes for bgen-1.3.1-cp37-cp37m-musllinux_1_1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 7f591aab730d3fefd64f4a31fe897b8c46864ff5196241ef925fe591f2dd20ab
MD5 | 07bac470b46388d03520d20c268e0b7d
BLAKE2b-256 | eacbde22d84246382fb614736b3a6654627db9c35c5d5563aa3245e35d1189ef
Hashes for bgen-1.3.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8f2c5591e90370374d269e39c47135944a2c4b20254cc1de3ce5c25f0ba7e46b
MD5 | 8164f539a7b54702df6c0e26228e1bc9
BLAKE2b-256 | 558b5089dbb1ebb0c264040af145c486a82d09502f884c3d40abfe6093c4b4c9
Hashes for bgen-1.3.1-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 6b205018e9ec8b2e480f124b34594a7b9b9eab9c026b88352ab89d4b5485678b
MD5 | 47ff267519f094bc62a0b1166276b53a
BLAKE2b-256 | 67266b6ff6aa4fb1a139914fbd8bc35d22c2a0b525b4fccfbe3d4cf466326056
Hashes for bgen-1.3.1-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f7e4cc487ec919c12163232697322419202b7d4bf2c410d41659baf764164f1a
MD5 | dc10583d8d99fc34b8f26003827734c0
BLAKE2b-256 | b54c1c8aabb027902761f97e32d7aa99124f9b918660938df2ef20de7288604a