Skip to main content

Boolean matrix factorization on RNA expression data

Project description

EM_BMF

Robust Boolean matrix factorization via EM_BMF The code is completely process-oriented. Sorry for contaminating your name space.

Dependency: (I think it will work as long as Annaconda on Python3 is installed)

numpy -- 1.11.3

scipy -- 1.1.0

numba -- 0.40.0

Example usage:

import numpy as np
from boolem import boolem

def synthesis(shape, latent_size, P, noise_p=0.0):
    '''
    In this synthesis, the probability of X was sampled from the joint probability of the latent factors.
    P is the parameter as Beta(1/(1-p),2) for generating the probability in latent factors.
    '''

    a = np.zeros((shape[0], latent_size))
    b = np.zeros((latent_size, shape[1]))
    X = np.zeros(shape)
    for l in range(latent_size):
        a[:,l] = np.random.binomial(1, P[l], shape[0])
        b[l,:] = np.random.binomial(1, P[l], shape[1])
        X += np.outer(a[:,l],b[l,:]) 
    X[X>1] = 1
    flip = np.random.binomial(1, noise_p, X.shape)
    X_noisy = np.abs(X-flip)
    return X_noisy, X, a, b

# Generate a Boolean matrice with heterogeneous Boolean factors and uniform noise.   
X_noisy, X, a, b = synthesis((1000, 1000), 4, np.random.uniform(0.2,0.5,4), noise_p=0.2)

# Feed the model with noisy matrix. 
# Latent_size: the dimension of latent Boolean factors. 
# alpha: the alpha for the beta prior. Default is recommended.
# beta: the beta for the beta prior. Default is recommended.
# mask: the matrix with the same shape as X. 0 means the correponding element in X is missing.
# max_iter: the maximum iteration for gradient-based optimization
model = boolem(np.int8(X_noisy), latent_size=5, alpha=0.95, beta=0.95, mask=np.ones(X.shape, dtype=np.int8), max_iter=200)
model.run()

# After running factorization, the model will contain several new attributes as the output:
# model.U: the latent factor with the shape (X.shape[0], latent_size)
# model.Z: the latent facotr with the shape (latent_size, X.shape[1])
# model.X_hat: reconstructed Boolean matrix from U and Z. Note that values in X_hat is continuous within [0,1]
print('Reconstruction error:', np.abs((model.X_hat>0.5)-X).mean())

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

boolem-0.0.5.tar.gz (4.0 kB view hashes)

Uploaded Source

Built Distribution

boolem-0.0.5-py3-none-any.whl (6.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page