Functions for detecting anomalies in tabular datasets using Mixed Graphical Models.

adadmire

Functions for detecting anomalies in molecular data sets using Mixed Graphical Models.

Installation

Enter the following command in a shell such as bash, zsh or PowerShell:

pip install -U adadmire
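
To confirm that the installation succeeded, one option is to query the installed version through the standard library. The following is a minimal sketch; the helper name `installed_version` is illustrative and not part of adadmire.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether adadmire is available in the current environment.
version = installed_version("adadmire")
if version is None:
    print("adadmire is not installed")
else:
    print(f"adadmire {version}")
```

If the package is reported as missing, check that `pip` and `python` refer to the same environment.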

Usage

The usage examples in this section require you to download the data files from the folder data first. For a description of the contents of this folder, see section Data.

Example 1

from adadmire import loo_cv_cor, get_threshold_continuous, get_threshold_discrete
import numpy as np

# download data/Feist_et_al
# and load data
X = np.load('data/Feist_et_al/scaled_data_raw.npy') # continuous data
D = np.load('data/Feist_et_al/pheno.npy') # discrete data
levels = np.load('data/Feist_et_al/levels.npy') # levels of discrete variables
# define lambda sequence: lam_zero scaled by powers of two
lam_zero = np.sqrt(np.log(X.shape[1] + D.shape[1]/2)/X.shape[0])
lam_seq = np.array([-1.75, -2.0, -2.25])
lam = lam_zero * np.power(2.0, lam_seq)
# perform cross validation
prob_hat, B_m, lam_opt,  x_hat, d_hat = loo_cv_cor(X,D,levels,lam)
# determine continuous threshold
# returns: X corrected for detected anomalies, threshold, number of detected anomalies (n_ano_cont) and their positions
X_cor, threshold_cont, n_ano_cont, position_cont = get_threshold_continuous(X, x_hat, B_m)
print(n_ano_cont) # 46 detected continuous anomalies
# determine discrete threshold
# returns: number of detected anomalies (n_ano_disc), threshold and their positions
n_ano_disc, threshold_disc, position_disc = get_threshold_discrete(D, levels, d_hat)
print(n_ano_disc) # 0 detected discrete anomalies
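
The lambda sequence in Example 1 is a base value `lam_zero` multiplied by three powers of two. The sketch below reproduces that arithmetic with the standard library only, so the values can be inspected without loading the data. The function name `lambda_grid` is illustrative. The sample and metabolite counts (100 and 49) come from the Data section, while `n_discrete=2` is an assumed stand-in for `D.shape[1]`, which depends on how `pheno.npy` is encoded.

```python
import math


def lambda_grid(n_samples, n_continuous, n_discrete, exponents=(-1.75, -2.0, -2.25)):
    """Mirror the lambda computation from Example 1 in plain Python.

    n_continuous stands in for X.shape[1], n_discrete for D.shape[1],
    and n_samples for X.shape[0].
    """
    lam_zero = math.sqrt(math.log(n_continuous + n_discrete / 2) / n_samples)
    return lam_zero, [lam_zero * 2.0 ** e for e in exponents]


# n_discrete=2 is an assumption made for illustration only.
lam_zero, lam = lambda_grid(n_samples=100, n_continuous=49, n_discrete=2)
for exponent, value in zip((-1.75, -2.0, -2.25), lam):
    print(f"lam_zero * 2^{exponent} = {value:.4f}")
```

Because all exponents are negative, every candidate is smaller than `lam_zero`, and the candidates decrease along the sequence.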

Example 2

from adadmire import loo_cv_cor, get_threshold_continuous, get_threshold_discrete, place_anomalies_continuous
import numpy as np

# download data/Higuera_et_al
# and load data
X = np.load('data/Higuera_et_al/scaled_data_raw.npy') # continuous data
D = np.load('data/Higuera_et_al/pheno.npy') # discrete data
levels = np.load('data/Higuera_et_al/levels.npy') # levels of discrete variables

# use original data set and create a simulation by introducing artificial anomalies
# n_ano: number of anomalies to introduce
# epsilon: "strength" of the introduced anomalies
X_ano = place_anomalies_continuous(X, n_ano=1360, epsilon=1.2)

# now detect anomalies using ADMIRE
lam_zero = np.sqrt(np.log(X.shape[1] + D.shape[1]/2)/X.shape[0])
lam_seq = np.array([-1.75, -2.0, -2.25])
lam = lam_zero * np.power(2.0, lam_seq)
prob_hat, B_m, lam_opt, x_hat, d_hat = loo_cv_cor(X_ano, D, levels, lam)
X_cor, threshold_cont, n_ano_cont, position_cont = get_threshold_continuous(X_ano, x_hat, B_m)
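
In a simulation like Example 2, a natural follow-up is to compare the detected positions with the positions where anomalies were introduced. The sketch below is one way to do that and is not part of adadmire: `changed_positions` and `precision_recall` are hypothetical helpers, and it assumes the detected positions can be expressed as (row, column) index pairs. The format of `position_cont` is not documented here, so it may need converting first.

```python
def changed_positions(original, modified, tol=1e-12):
    """Return the set of (row, column) pairs where two equally shaped 2-D tables differ."""
    return {
        (i, j)
        for i, (row_a, row_b) in enumerate(zip(original, modified))
        for j, (a, b) in enumerate(zip(row_a, row_b))
        if abs(a - b) > tol
    }


def precision_recall(true_positions, detected_positions):
    """Compute precision and recall of detected positions against the true ones."""
    true_set, detected_set = set(true_positions), set(detected_positions)
    hits = len(true_set & detected_set)
    precision = hits / len(detected_set) if detected_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    return precision, recall


# Toy data standing in for X and X_ano (values are made up for illustration).
X_toy = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
X_toy_ano = [[0.1, 5.0, 0.3], [0.4, 0.5, -4.0]]

injected = changed_positions(X_toy, X_toy_ano)  # cells (0, 1) and (1, 2)
detected = {(0, 1), (1, 0)}  # hypothetical detector output

precision, recall = precision_recall(injected, detected)
print(precision, recall)  # 0.5 0.5
```

The same helpers accept NumPy arrays, since they only iterate over rows and entries.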

Data

In the directory data you can find two subdirectories:

  • Feist_et_al: contains the data set as described in Feist et al, 2018 and Buck et al, 2023.
    • data_raw.xlsx: raw, unscaled data, contains measurements of 100 samples and 49 metabolites
    • scaled_data_raw.npy: numpy file containing scaled version of data_raw.xlsx
    • pheno_with_simulations.xlsx: pheno data corresponding to data_raw.xlsx, also contains cell stimulations
    • pheno.npy: numpy file corresponding to pheno_with_simulations.xlsx (only contains variables batch and myc)
    • levels.npy: numpy file containing the levels of the discrete variables in pheno.npy
  • Higuera_et_al: contains a downsampled data set from Higuera et al, 2015 as described in Buck et al, 2023.
    • data_raw.xlsx: raw, unscaled data, contains measurements of 400 samples and 68 proteins (downsampled from Higuera et al, 2015)
    • scaled_data_raw.npy: numpy file containing scaled version of data_raw.xlsx
    • pheno_.xlsx: pheno data corresponding to data_raw.xlsx
    • pheno.npy: numpy file corresponding to pheno.xlsx
    • levels.npy: numpy file containing the levels of the discrete variables in pheno.npy

Contribute

If you have questions or feature requests, or find a bug in adadmire, please create a corresponding issue at gitlab.spang-lab.de/bul38390/admire/issues.

In case you want to write code for this package, see Contribute for details.

References

Feist et al, 2018

Feist, Maren, et al. "Cooperative STAT/NF-κB signaling regulates lymphoma metabolic reprogramming and aberrant GOT2 expression." Nature Communications, 2018.

Higuera et al, 2015

Higuera, Clara, et al. "Self-organizing feature maps identify proteins critical to learning in a mouse model of Down syndrome." PLOS ONE, 2015.

Buck et al, 2023

Buck, Lena, et al. "Anomaly detection in mixed high dimensional molecular data." 2023.
