pyhealth

A Python library for healthcare AI

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

PyHealth is designed for both ML researchers and medical practitioners. It makes healthcare AI applications easier to deploy, more flexible, and more customizable. [Tutorials]

[News!] We run the “PyHealth Live” gathering at 8 PM CST every Wednesday night. You are welcome to join over Zoom. Check out PyHealth Live for more information and watch the Live Videos.

1. Introduction

PyHealth supports diverse electronic health records (EHRs), such as MIMIC, eICU, and all OMOP-CDM based databases, and provides various advanced deep learning algorithms for important healthcare tasks such as diagnosis-based drug recommendation, patient hospitalization and mortality prediction, and ICU length-of-stay forecasting.

Building a healthcare AI pipeline can take as few as 10 lines of code in PyHealth.

2. Installation

You can install from PyPI:

pip install pyhealth

or from the GitHub source:

git clone https://github.com/sunlabuiuc/PyHealth.git
cd PyHealth
pip install .

Required Dependencies

python>=3.8
torch>=1.8.0
rdkit>=2022.03.4
scikit-learn>=0.24.2
networkx>=2.6.3
pandas>=1.3.2
tqdm
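
To see which of the dependencies above are already present in an environment, a small standard-library sketch like the one below may help. The helper name installed_versions is hypothetical, and the distribution names are assumed to match the names in the list above. It only reports installed versions and does not compare them against the minimums.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the dependency list above
# (assumed to match the names used on PyPI).
REQUIRED = ["torch", "rdkit", "scikit-learn", "networkx", "pandas", "tqdm"]


def installed_versions(packages):
    """Map each package name to its installed version string, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'not installed'}")
```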

3. Modules

All healthcare tasks in our package follow a five-stage pipeline:

load dataset -> define task function -> build ML/DL model -> model training -> inference

We try hard to keep each stage as separate as possible, so that people can customize their own pipeline by using only our data processing steps or only the ML models. Each step calls one module, and we introduce them using an example.

3.1 An ML Pipeline Example

STEP 1: <pyhealth.datasets> provides a clean structure for the dataset, independent from the tasks. We support MIMIC-III, MIMIC-IV and eICU, as well as the standard OMOP-formatted data. The dataset is stored in a unified Patient-Visit-Event structure.

from pyhealth.datasets import MIMIC3Dataset
mimic3base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map all NDC codes in these tables to ATC level-3 codes
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
)
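
The Patient-Visit-Event nesting described above can be pictured with a few plain dataclasses. This is only an illustrative sketch: the class and field names below are hypothetical and are not PyHealth's actual data classes, and the record is a made-up toy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    code: str        # e.g. a diagnosis or drug code
    table: str       # source table the event came from
    vocabulary: str  # coding system of `code`


@dataclass
class Visit:
    visit_id: str
    events: List[Event] = field(default_factory=list)

    def codes(self, table: str) -> List[str]:
        """Return the codes recorded in one source table during this visit."""
        return [e.code for e in self.events if e.table == table]


@dataclass
class Patient:
    patient_id: str
    visits: Dict[str, Visit] = field(default_factory=dict)


# Toy record with made-up IDs, only to show the nesting.
visit = Visit("v1", [
    Event("428.0", "DIAGNOSES_ICD", "ICD9CM"),
    Event("A03C", "PRESCRIPTIONS", "ATC"),
])
patient = Patient("p1", {"v1": visit})
print(patient.visits["v1"].codes("DIAGNOSES_ICD"))  # ['428.0']
```

Keeping the structure independent of any task means the same patient records can later be turned into samples for different prediction problems.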

Users can also store their own dataset in our <pyhealth.datasets.SampleDataset> structure and then follow the same pipeline below; see Tutorial.

STEP 2: <pyhealth.tasks> takes the <pyhealth.datasets> object as input and defines how to process each patient’s data into a set of samples for the task. The package provides several task examples, such as drug recommendation and length of stay prediction.

from pyhealth.tasks import drug_recommendation_mimic3_fn
from pyhealth.datasets import split_by_patient, get_dataloader

mimic3sample = mimic3base.set_task(task_fn=drug_recommendation_mimic3_fn) # use default task
train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])

# create dataloaders (torch.utils.data.DataLoader)
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
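
To make the two ideas in this step concrete, here is a simplified stand-in written in plain Python. The functions toy_drug_recommendation_fn and toy_split_by_patient are hypothetical and much simpler than the library's own task function and splitter; the patient dictionaries are made-up toy data. The sketch shows a task function producing one sample per usable visit, and a split that keeps all samples of a patient in a single partition.

```python
import random
from typing import Dict, List


def toy_drug_recommendation_fn(patient: Dict) -> List[Dict]:
    """Turn one patient (a dict with a list of visits) into one sample per usable visit."""
    samples = []
    for visit in patient["visits"]:
        # Skip visits that lack any feature or the label.
        if not (visit["conditions"] and visit["procedures"] and visit["drugs"]):
            continue
        samples.append({
            "patient_id": patient["patient_id"],
            "visit_id": visit["visit_id"],
            "conditions": visit["conditions"],
            "procedures": visit["procedures"],
            "drugs": visit["drugs"],
        })
    return samples


def toy_split_by_patient(samples, ratios, seed=0):
    """Split samples so that each patient's samples fall in exactly one partition."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)
    n = len(patient_ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    groups = [
        set(patient_ids[:n_train]),
        set(patient_ids[n_train:n_train + n_val]),
        set(patient_ids[n_train + n_val:]),
    ]
    return [[s for s in samples if s["patient_id"] in g] for g in groups]


# Ten toy patients; the second visit has no procedures and is skipped.
patients = [
    {"patient_id": f"p{i}", "visits": [
        {"visit_id": f"p{i}-v0", "conditions": ["c1"], "procedures": ["pr1"], "drugs": ["d1"]},
        {"visit_id": f"p{i}-v1", "conditions": ["c2"], "procedures": [], "drugs": ["d2"]},
    ]}
    for i in range(10)
]
samples = [s for p in patients for s in toy_drug_recommendation_fn(p)]
train, val, test = toy_split_by_patient(samples, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))  # 8 1 1
```

Splitting at the patient level rather than the sample level avoids the same patient appearing in both the training and test sets.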

STEP 3: <pyhealth.models> provides the healthcare ML models. This module also provides model layers, such as pyhealth.models.RETAINLayer, for building customized ML architectures. Our model layers can be used as easily as torch.nn.Linear.

from pyhealth.models import Transformer

model = Transformer(
    dataset=mimic3sample,
    feature_keys=["conditions", "procedures"],
    label_key="drugs",
    mode="multilabel",
)

STEP 4: <pyhealth.trainer> is the training manager. Pass it the train_loader, the val_loader, and a validation metric to monitor, and specify other arguments such as epochs, optimizer, and learning rate. The trainer automatically saves the best model and prints its path at the end.

from pyhealth.trainer import Trainer

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc_samples",
)
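
The "save the best model" behaviour amounts to tracking one monitored metric across epochs. The sketch below shows that bookkeeping in isolation. The function track_best and its mode argument are hypothetical names for illustration, the scores are made up, and the library's trainer may implement this differently.

```python
def track_best(history, monitor, mode="max"):
    """Return (best_epoch, best_score) for the monitored metric.

    `history` is a list of per-epoch metric dicts, as a validation loop might produce.
    """
    best_epoch, best_score = None, None
    for epoch, scores in enumerate(history):
        score = scores[monitor]
        improved = (
            best_score is None
            or (mode == "max" and score > best_score)
            or (mode == "min" and score < best_score)
        )
        if improved:
            # A real trainer would save a checkpoint at this point.
            best_epoch, best_score = epoch, score
    return best_epoch, best_score


# Made-up validation scores for three epochs.
history = [
    {"pr_auc_samples": 0.61, "loss": 0.52},
    {"pr_auc_samples": 0.68, "loss": 0.47},
    {"pr_auc_samples": 0.66, "loss": 0.45},
]
print(track_best(history, "pr_auc_samples"))    # (1, 0.68)
print(track_best(history, "loss", mode="min"))  # (2, 0.45)
```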

STEP 5: <pyhealth.metrics> provides several common evaluation metrics (refer to the Doc to see what is available) and special metrics in healthcare, such as drug-drug interaction (DDI) rate.

trainer.evaluate(test_loader)
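
As an illustration of a healthcare-specific metric, one common way to define a DDI rate is the fraction of co-prescribed drug pairs that appear in a set of known interactions. The function below is a plain-Python sketch of that definition. It is not the library's implementation, which may differ in details, and the drug names and interactions are placeholders rather than clinical facts.

```python
from itertools import combinations


def ddi_rate(predicted_sets, ddi_pairs):
    """Fraction of co-prescribed drug pairs that appear in the known-interaction set."""
    known = {frozenset(p) for p in ddi_pairs}
    total = 0
    interacting = 0
    for drugs in predicted_sets:
        # Count every unordered pair within one predicted drug set.
        for a, b in combinations(sorted(set(drugs)), 2):
            total += 1
            if frozenset((a, b)) in known:
                interacting += 1
    return interacting / total if total else 0.0


# Toy data: drug names and interactions are placeholders.
predictions = [["d1", "d2", "d3"], ["d2", "d4"]]
interactions = [("d1", "d2"), ("d3", "d4")]
print(ddi_rate(predictions, interactions))  # 0.25
```

Here the first set contributes three pairs, one of which interacts, and the second set contributes one non-interacting pair, giving 1 / 4.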

3.2 Medical Code Map

<pyhealth.medcode> provides two core functionalities: (i) looking up information for a given medical code (e.g., name, category, sub-concept); (ii) mapping codes across coding systems (e.g., ICD9CM to CCSCM). This module can be applied independently in your research.

For code mapping between two coding systems:

from pyhealth.medcode import CrossMap

codemap = CrossMap.load("ICD9CM", "CCSCM")
codemap.map("82101") # use it like a dict

codemap = CrossMap.load("NDC", "ATC")
codemap.map("00527051210")

For code ontology lookup within one system:

from pyhealth.medcode import InnerMap

icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0") # get detailed info
icd9cm.get_ancestors("428.0") # get ancestors (parents and above)
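
Conceptually, both functionalities reduce to table lookups: a cross-system map from a source code to one or more target codes, and a child-to-parent table that can be walked upward. The sketch below illustrates this with hypothetical helper names and toy tables. The cross-map codes are placeholders, the hierarchy covers only a small slice of ICD-9-CM, and the return format of the real library may differ.

```python
# Toy cross-system table: source code -> list of target codes (placeholder codes).
cross_map = {"S1": ["T1"], "S2": ["T1", "T2"]}


def map_code(code):
    """Return the target codes for `code`, or an empty list if it is unmapped."""
    return cross_map.get(code, [])


# Toy child -> parent table for a small slice of the ICD-9-CM hierarchy.
parent_of = {"428.0": "428", "428": "420-429", "420-429": "390-459"}


def get_ancestors(code):
    """Walk parent links upward and return ancestors from nearest to farthest."""
    ancestors = []
    while code in parent_of:
        code = parent_of[code]
        ancestors.append(code)
    return ancestors


print(map_code("S2"))          # ['T1', 'T2']
print(map_code("unknown"))     # []
print(get_ancestors("428.0"))  # ['428', '420-429', '390-459']
```

Returning a list from the cross-map reflects that one source code can correspond to several target codes.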

3.3 Medical Code Tokenizer

<pyhealth.tokenizer> is used for transformations between string-based tokens and integer-based indices, based on the overall token space. We provide flexible functions to tokenize 1D, 2D and 3D lists. This module can be independently applied to your research.

from pyhealth.tokenizer import Tokenizer

# Example: we use a list of ATC3 codes as the tokens
token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D',
        'A03E', 'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B', 'A07C',
        'A12B', 'A12C', 'A13A', 'A14A', 'A14B', 'A16A']
tokenizer = Tokenizer(tokens=token_space, special_tokens=["<pad>", "<unk>"])

# 2d encode
tokens = [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']]
indices = tokenizer.batch_encode_2d(tokens) # [[8, 9, 10, 11], [12, 1, 1, 0]]

# 2d decode
indices = [[8, 9, 10, 11], [12, 1, 1, 0]]
tokens = tokenizer.batch_decode_2d(indices) # [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', '<unk>', '<unk>']]
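
The mechanics behind 2D encoding and decoding can be sketched in a few lines of plain Python. ToyTokenizer below is a hypothetical, simplified stand-in and not the library's Tokenizer. It assumes special tokens take the lowest indices, unknown tokens map to <unk>, shorter rows are padded with <pad>, and padding is dropped on decode.

```python
class ToyTokenizer:
    """Minimal token <-> index mapping with <pad>/<unk> handling."""

    def __init__(self, tokens, special_tokens=("<pad>", "<unk>")):
        # Special tokens take the lowest indices, followed by the regular tokens.
        self.idx2token = list(special_tokens) + list(tokens)
        self.token2idx = {t: i for i, t in enumerate(self.idx2token)}
        self.pad = self.token2idx["<pad>"]
        self.unk = self.token2idx["<unk>"]

    def batch_encode_2d(self, batch):
        """Encode a list of token lists, padding each row to the longest row."""
        width = max(len(row) for row in batch)
        return [
            [self.token2idx.get(t, self.unk) for t in row] + [self.pad] * (width - len(row))
            for row in batch
        ]

    def batch_decode_2d(self, batch):
        """Decode index rows back to tokens, dropping padding."""
        return [[self.idx2token[i] for i in row if i != self.pad] for row in batch]


tok = ToyTokenizer(["A01A", "A02A", "A02B"])
ids = tok.batch_encode_2d([["A02A", "A01A", "A02B"], ["A02B", "Z99Z"]])
print(ids)                       # [[3, 2, 4], [4, 1, 0]]
print(tok.batch_decode_2d(ids))  # [['A02A', 'A01A', 'A02B'], ['A02B', '<unk>']]
```

Note that decoding is lossy for out-of-vocabulary tokens: "Z99Z" comes back as <unk>.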

4. Tutorials

We provide the following tutorials to help users get started with pyhealth.

Tutorial 0: Introduction to pyhealth.data

Tutorial 1: Introduction to pyhealth.datasets

Tutorial 2: Introduction to pyhealth.tasks

Tutorial 3: Introduction to pyhealth.models

Tutorial 4: Introduction to pyhealth.trainer

Tutorial 5: Introduction to pyhealth.metrics

Tutorial 6: Introduction to pyhealth.tokenizer

Tutorial 7: Introduction to pyhealth.medcode

The following tutorials will help users build their own task pipelines.

Pipeline 1: Drug Recommendation

Pipeline 2: Length of Stay Prediction

Pipeline 3: Readmission Prediction

Pipeline 4: Mortality Prediction

The following tutorials will help users to explore advanced features of pyhealth.

Advanced Tutorial 1: Fit your dataset into our pipeline

Advanced Tutorial 2: Define your own healthcare task

Advanced Tutorial 3: Adopt customized model into pyhealth

Advanced Tutorial 4: Load your own processed data into pyhealth and try out our ML models

5. Datasets

We provide the processing files for the following open EHR datasets:

Dataset	Module	Year	Information
MIMIC-III	pyhealth.datasets.MIMIC3Dataset	2016	MIMIC-III Clinical Database
MIMIC-IV	pyhealth.datasets.MIMIC4Dataset	2020	MIMIC-IV Clinical Database
eICU	pyhealth.datasets.eICUDataset	2018	eICU Collaborative Research Database
OMOP	pyhealth.datasets.OMOPDataset		OMOP-CDM schema based dataset

6. Machine/Deep Learning Models and Benchmarks

Model Name	Type	Module	Year	Reference
Convolutional Neural Network (CNN)	deep learning	pyhealth.models.CNN	1989	Handwritten Digit Recognition with a Back-Propagation Network
Recurrent Neural Nets (RNN)	deep learning	pyhealth.models.RNN	2011	Recurrent neural network based language model
Transformer	deep learning	pyhealth.models.Transformer	2017	Attention Is All You Need
RETAIN	deep learning	pyhealth.models.RETAIN	2016	RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism
GAMENet	deep learning	pyhealth.models.GAMENet	2019	GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination
MICRON	deep learning	pyhealth.models.MICRON	2021	Change Matters: Medication Change Prediction with Recurrent Residual Networks
SafeDrug	deep learning	pyhealth.models.SafeDrug	2021	SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations

Check the interactive map on benchmark EHR predictive tasks.

7. Citing PyHealth

@software{pyhealth2022github,
    author = {Chaoqi Yang and Zhenbang Wu and Patrick Jiang and Jimeng Sun},
    title = {{PyHealth}: A Deep Learning Toolkit for Healthcare Predictive Modeling},
    url = {https://github.com/sunlabuiuc/PyHealth},
    year = {2022},
}

Project details

Release history

Version	Release date	Notes
1.1.6	Feb 24, 2024
1.1.5	Feb 24, 2024
1.1.5a0	Feb 24, 2024	pre-release
1.1.4	May 31, 2023
1.1.3	Jan 24, 2023	this version
1.1.2	Dec 14, 2022
1.1.1	Nov 16, 2022
1.1	Nov 16, 2022
1.0a2	Oct 23, 2022	pre-release
1.0a1	Oct 23, 2022	pre-release
1.0a0	Oct 23, 2022	pre-release
0.0.6	Jan 11, 2021
0.0.5	Nov 9, 2020
0.0.4	Aug 26, 2020
0.0.3	Aug 13, 2020
0.0.2	Aug 6, 2020
0.0.1	Aug 3, 2020
0.0.0	Aug 3, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyhealth-1.1.3.tar.gz (82.3 kB)

Uploaded Jan 24, 2023 Source

Built Distribution

pyhealth-1.1.3-py2.py3-none-any.whl (113.8 kB)

Uploaded Jan 24, 2023 Python 2 Python 3

Hashes for pyhealth-1.1.3.tar.gz

Algorithm	Hash digest
SHA256	`c5239f0aef91a357984eca3cb3adbfeec7c331611b6527d99cf10e306894d10c`
MD5	`108449a2c2864723935488dd2528dc62`
BLAKE2b-256	`0bf9b81343b783f0a92840a19691ca92e3105cd7434282062c8a4bf8d25a9a7c`

Hashes for pyhealth-1.1.3-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`4c47b29a1cc08ce89669451d8c4609180a7871b71bb3435a9ba25451e5c2cde1`
MD5	`e0380be9f842cb7595b32a20dbfd5750`
BLAKE2b-256	`e69b569a89061af35f21d23e7d21e025de881fa2c65963a33fc71afcd31e1a80`