bert-deid

Remove identifiers from data using BERT

Code to fine-tune BERT on a medical note de-identification task.

Install

  • (Recommended) Create a conda environment called deid
    • conda env create -f environment.yml
  • Install the package locally with pip
    • pip install bert-deid
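
Putting these together, a typical install from a fresh clone might look like the following (a sketch; it assumes conda is available and that environment.yml names the environment deid):

conda env create -f environment.yml
conda activate deid
pip install bert-deid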

Download

To download the model, bert-deid provides a helper command:

# the download command reads the MODEL_DIR environment variable;
# by default, the model is downloaded to bert_deid_model in the current directory
export MODEL_DIR="bert_deid_model"
bert_deid download
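
If MODEL_DIR points somewhere other than the default, the same variable can be read back later when loading the model in Python; a minimal sketch (the fallback mirrors the default directory name above):

import os

# reuse whatever directory the download helper wrote the model to,
# falling back to the default name used above
model_path = os.environ.get('MODEL_DIR', 'bert_deid_model')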

Usage

The model can be imported and used directly within Python.

from bert_deid.model import Transformer

# load in a trained model
model_path = 'bert_deid_model'
deid_model = Transformer(model_path)

with open('tests/example_note.txt', 'r') as fp:
    text = fp.read()

# apply the model, replacing identified entities with '___'
print(deid_model.apply(text, repl='___'))

# we can also get the original predictions
preds = deid_model.predict(text)

# print out the identified entities
for pred in preds:
    # each prediction is (probability, label, start offset, stop offset)
    prob, label, start, stop = pred

    # print the flagged text span, its label, and the model's probability
    print(f'{text[start:stop]:15s} {label} ({prob:0.3f})')
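
The same apply call can be wrapped in a loop to de-identify a whole directory of notes; a minimal sketch (the notes/ and deid_notes/ directory names are hypothetical):

from pathlib import Path

from bert_deid.model import Transformer

deid_model = Transformer('bert_deid_model')

# hypothetical input and output directories of plain-text notes
in_dir = Path('notes')
out_dir = Path('deid_notes')
out_dir.mkdir(exist_ok=True)

for note_path in in_dir.glob('*.txt'):
    text = note_path.read_text()
    # replace each detected identifier with '___'
    deid_text = deid_model.apply(text, repl='___')
    (out_dir / note_path.name).write_text(deid_text)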

