
Pandas ExtensionDtypes and ExtensionArray for working with genomics data





Quickstart

Variant objects hold information about a particular variant:

from pandas_genomics.scalars import Variant
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C', 'T'])
print(variant)
    rs12462[chr=12;pos=112161652;ref=A;alt=C,T]

Each variant should have a unique ID, and a random ID is generated if one is not specified.
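To make the printed format and the ID fallback concrete, here is a minimal, self-contained sketch that does not use pandas_genomics at all. VariantSketch is a hypothetical class, and generating the random ID with uuid.uuid4() is an assumption; the library's own ID scheme is not described here.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantSketch:
    """Hypothetical stand-in for a variant record (not the library's Variant class)."""
    chromosome: str
    position: int
    ref: str
    alt: List[str] = field(default_factory=list)
    id: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: fall back to a random identifier when no ID is given.
        # pandas_genomics may generate its random IDs differently.
        if self.id is None:
            self.id = uuid.uuid4().hex

    def __str__(self) -> str:
        # Mirror the display format shown in the Quickstart output.
        alts = ",".join(self.alt)
        return f"{self.id}[chr={self.chromosome};pos={self.position};ref={self.ref};alt={alts}]"


v = VariantSketch("12", 112161652, "A", ["C", "T"], id="rs12462")
print(v)  # rs12462[chr=12;pos=112161652;ref=A;alt=C,T]
```

Two variants created without an ID get different random IDs, which is what keeps them distinguishable when no dbSNP-style identifier is available.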

Genotype objects are associated with a particular Variant:

gt = variant.make_genotype("A", "C")
print(gt)
    A/C
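The GenotypeArray output later in this Quickstart shows genotypes stored as allele indices (allele1=0, allele2=1 for A/C), where 0 is the reference allele and alternate alleles follow in order. The sketch below illustrates that mapping in plain Python; make_genotype_sketch and genotype_str are hypothetical helpers, not pandas_genomics functions, and the real classes may store more than two indices.

```python
from typing import List, Tuple

# Hypothetical helpers; these are not pandas_genomics functions.
Genotype = Tuple[int, int]  # (allele1, allele2) as indices into the allele list


def allele_list(ref: str, alt: List[str]) -> List[str]:
    # Index 0 is the reference allele; alternate alleles follow in order.
    return [ref] + list(alt)


def make_genotype_sketch(ref: str, alt: List[str], a1: str, a2: str) -> Genotype:
    alleles = allele_list(ref, alt)
    # list.index raises ValueError for an allele the variant does not list.
    return alleles.index(a1), alleles.index(a2)


def genotype_str(ref: str, alt: List[str], gt: Genotype, sep: str = "/") -> str:
    # Convert the stored indices back to allele strings, e.g. (0, 1) -> "A/C".
    alleles = allele_list(ref, alt)
    return sep.join(alleles[i] for i in gt)


gt = make_genotype_sketch("A", ["C", "T"], "A", "C")
print(gt)  # (0, 1)
print(genotype_str("A", ["C", "T"], gt))  # A/C
# Parsing a "C/C" string first, roughly what make_genotype_from_str does:
print(make_genotype_sketch("A", ["C"], *"C/C".split("/")))  # (1, 1)
```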

The GenotypeArray stores genotypes with an associated variant and has useful methods and properties:

from pandas_genomics.scalars import Variant
from pandas_genomics.arrays import GenotypeArray
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C'])
gt_array = GenotypeArray([variant.make_genotype_from_str(s) for s in ["C/C", "A/C", "A/A"]])
print(gt_array)
    <GenotypeArray>
    [Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=1, allele2=1),
    Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=1),
    Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=0)]
    Length: 3, dtype: genotype[12; 112161652; rs12462; A; C]
print(gt_array.astype(str))
    ['C/C' 'A/C' 'A/A']
print(gt_array.encode_dominant())
    <IntegerArray>
    [1.0, 1.0, 0.0]
    Length: 3, dtype: float
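Dominant encoding scores a genotype as 1 when it carries at least one non-reference allele and 0 otherwise, which is why C/C and A/C both become 1 above while A/A becomes 0. A minimal pure-Python sketch of that rule, working on (allele1, allele2) index pairs; encode_dominant_sketch is hypothetical and returns plain ints rather than the pandas array the library produces.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # (allele1, allele2) indices; 0 = reference allele


def encode_dominant_sketch(genotypes: List[Genotype]) -> List[int]:
    # 1 if at least one copy of a non-reference allele is present, else 0.
    return [int(a1 != 0 or a2 != 0) for a1, a2 in genotypes]


gts = [(1, 1), (0, 1), (0, 0)]  # C/C, A/C, A/A for ref=A, alt=[C]
print(encode_dominant_sketch(gts))  # [1, 1, 0]
```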

There are also genomics accessors for Series and DataFrame objects:

import pandas as pd
print(pd.Series(gt_array).genomics.encode_codominant())
    0    Hom
    1    Het
    2    Ref
    Name: rs12462_C, dtype: category
    Categories (3, object): ['Ref' < 'Het' < 'Hom']
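Codominant encoding labels each genotype by how many alternate-allele copies it carries: zero is Ref, one is Het, two is Hom. The sketch below reproduces the labels shown above in plain Python, assuming a biallelic variant; encode_codominant_sketch is hypothetical. Passing its result to pd.Categorical(labels, categories=CODOMINANT_LABELS, ordered=True) would give an ordered categorical like the one the accessor returns.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # (allele1, allele2); 0 = reference allele

# Ordered from fewest to most alternate-allele copies,
# matching the category order 'Ref' < 'Het' < 'Hom'.
CODOMINANT_LABELS = ["Ref", "Het", "Hom"]


def encode_codominant_sketch(genotypes: List[Genotype]) -> List[str]:
    labels = []
    for a1, a2 in genotypes:
        # Assumption: biallelic variant, so any nonzero index is the single alt allele.
        if max(a1, a2) > 1:
            raise ValueError("this sketch only handles biallelic variants")
        alt_copies = int(a1 != 0) + int(a2 != 0)
        labels.append(CODOMINANT_LABELS[alt_copies])
    return labels


print(encode_codominant_sketch([(1, 1), (0, 1), (0, 0)]))  # ['Hom', 'Het', 'Ref']
```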



Download files


Source Distribution

pandas_genomics-0.12.1.tar.gz (34.2 kB)

Built Distribution

pandas_genomics-0.12.1-py3-none-any.whl (41.9 kB, Python 3)
