The package was renamed from Biocarta (v0.2.27) to Biocartograph because of an unintentional name clash.
Biocartograph
Creating Cartographic Representations of Biological Data
Installation
pip install biocartograph
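After installing, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of Biocartograph.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the pip install succeeded, otherwise None.
print(installed_version('biocartograph'))
```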
Example code
import pandas as pd
import numpy as np

if __name__ == '__main__' :
    from biocartograph.quantification import full_mapping
    #
    # READ THE ANALYTE TABLE : ANALYTES AS ROWS , SAMPLES AS COLUMNS
    adf = pd.read_csv('analytes.tsv',sep='\t',index_col=0)
    #
    # WE DO NOT WANT TO KEEP POTENTIALLY BAD ENTRIES
    # ROWS AND COLUMNS WITH ZERO STANDARD DEVIATION ARE DROPPED
    adf = adf.iloc[ np.inf != np.abs( 1.0/np.std(adf.values,1) ) ,
                    np.inf != np.abs( 1.0/np.std(adf.values,0) ) ].copy()
    #
    # READING IN SAMPLE INFORMATION
    # THIS IS NEEDED FOR THE ALIGNED PCA TO WORK
    jdf = pd.read_csv('journal.tsv',sep='\t',index_col=0)
    # KEEP THE JOURNAL COLUMNS IN THE SAME ORDER AS THE ANALYTE SAMPLES
    jdf = jdf.loc[:,adf.columns.values]
    #
    alignment_label , sample_label = 'Disease' , None
    add_labels = ['Cell-line']
    #
    cmd = 'max'
    # WRITE FILES AND MAKE NOISE
    bVerbose = True
    # CREATE AN OPTIMIZED REPRESENTATION
    bExtreme = True
    # WE MIGHT WANT SOME SPECIFIC INTERSECTIONS OF THE HIERARCHY
    n_clusters = [20,40,60,80,100]
    # USE ALL INFORMATION
    n_components = None
    umap_dimension = 2
    n_neighbors = 20
    local_connectivity = 20.
    transform_seed = 42
    #
    print ( adf , jdf )
    #
    # distance_type = 'correlation,spearman,absolute' # DONT USE THIS
    distance_type = 'covariation' # BECOMES CO-EXPRESSION BASED
    #
    results = full_mapping ( adf , jdf ,
                bVerbose = bVerbose ,
                bExtreme = bExtreme ,
                n_clusters = n_clusters ,
                n_components = n_components ,
                distance_type = distance_type ,
                umap_dimension = umap_dimension ,
                umap_n_neighbors = n_neighbors ,
                umap_local_connectivity = local_connectivity ,
                umap_seed = transform_seed ,
                hierarchy_cmd = cmd ,
                add_labels = add_labels ,
                alignment_label = alignment_label ,
                sample_label = sample_label )
    #
    # THE RESULTS HOLD THE MAPS AND HIERARCHIES IN THIS ORDER
    map_analytes = results[0]
    map_samples = results[1]
    hierarchy_analytes = results[2]
    hierarchy_samples = results[3]
Alternatively, call full_mapping with mostly default values:
import pandas as pd
import numpy as np

if __name__ == '__main__' :
    from biocartograph.quantification import full_mapping
    #
    adf = pd.read_csv('analytes.tsv',sep='\t',index_col=0)
    #
    adf = adf.iloc[ np.inf != np.abs( 1.0/np.std(adf.values,1) ) ,
                    np.inf != np.abs( 1.0/np.std(adf.values,0) ) ].copy()
    jdf = pd.read_csv('journal.tsv',sep='\t',index_col=0)
    jdf = jdf.loc[:,adf.columns.values]
    #
    alignment_label , sample_label = 'Disease' , None
    add_labels = ['Cell-line']
    #
    results = full_mapping ( adf , jdf ,
                bVerbose = True ,
                n_clusters = [40,80,120] ,
                add_labels = add_labels ,
                alignment_label = alignment_label )
    #
    map_analytes = results[0]
    map_samples = results[1]
    hierarchy_analytes = results[2]
    hierarchy_samples = results[3]
Plotting the information in the analyte map yields the following figure: Cancer Disease Example
You can also run an alternative algorithm, in which the UMAP coordinates are used directly for clustering, by setting bUseUmap = True :
    results = full_mapping ( adf , jdf ,
                bVerbose = True ,
                bUseUmap = True ,
                n_clusters = [40,80,120] ,
                add_labels = add_labels ,
                alignment_label = alignment_label )
with the following results.
Download the zip, unpack it and open the HTML index:
chromium index.html
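The example code above prepares its inputs in two steps before calling full_mapping: it drops analyte rows and sample columns with zero standard deviation, and it reorders the journal columns to match the remaining samples. The sketch below reproduces only that preparation on toy data, so it runs without Biocartograph or any input files. The helper name `drop_constant_rows_and_columns`, the toy values and the assumption that the journal stores labels as rows and samples as columns (implied by `jdf.loc[:,adf.columns.values]`) are illustrative and not taken from the library.

```python
import numpy as np
import pandas as pd


def drop_constant_rows_and_columns(df):
    """Drop rows and columns whose standard deviation is exactly zero.

    This is equivalent to the `np.inf != np.abs(1.0/np.std(...))` test in the
    example code, written without the division by zero.
    """
    values = df.to_numpy(dtype=float)
    keep_rows = np.std(values, axis=1) != 0
    keep_cols = np.std(values, axis=0) != 0
    return df.iloc[keep_rows, keep_cols].copy()


# Toy analyte table: analytes as rows, samples as columns.
adf = pd.DataFrame(
    [[1.0, 2.0, 5.0, 3.0],
     [5.0, 5.0, 5.0, 5.0],   # constant row, will be dropped
     [0.5, 1.5, 5.0, 2.5]],
    index=['gene_a', 'gene_b', 'gene_c'],
    columns=['s1', 's2', 's3', 's4'])   # s3 is constant, will be dropped

# Toy journal: labels as rows, samples as columns, in a different order
# and with one sample (s5) that has no analyte data.
jdf = pd.DataFrame(
    [['healthy', 'healthy', 'tumor', 'tumor', 'healthy'],
     ['A', 'B', 'C', 'D', 'E']],
    index=['Disease', 'Cell-line'],
    columns=['s4', 's5', 's1', 's3', 's2'])

adf = drop_constant_rows_and_columns(adf)
# Align the journal to the surviving samples, in the analyte column order.
jdf = jdf.loc[:, adf.columns.values]

print(adf)
print(jdf)
```

Both masks are computed on the unfiltered table, as in the original expression, so a row is judged on all of its columns, including ones that are later removed.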
Other generated solutions
The clustering visualisations were created using Biocartograph and hvplot:
Which groupings correspond to the biomarker variance that describes them? Here are two visualisations of that:
Diseases : cancers, biocartograph gfa Reactome enrichments, biocartograph gfa cluster enrichments, biocartograph treemap cluster 61
Tissues : tissues
Single Cells : single cells, biocartograph gfa enrichment, biocartograph treemap cluster 47
Blood Cells : blood cells, biocartograph gfa enrichment, biocartograph treemap cluster 2
TCGA-BRCA : calculated using Biocartograph and a TCGA-derived data set, with the results for Breast Cancer mRNA-seq
Hashes for biocartograph-0.4.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a24b440957036b3a47cf54b7604b3c277472493b18df748bcaa34d87b004b670
MD5 | ee81b94e0f75919c99b9cb4e274d4346
BLAKE2b-256 | 122d0a8ebc0d53ce293bfe56ce00ea7e646b9ae0035172a87a58db58dcff0364
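A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library `hashlib` module; the helper names and the assumption that the wheel sits in the current directory are illustrative only.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the file digest with a published hex digest, ignoring case."""
    return file_sha256(path) == expected_hex.strip().lower()


# SHA256 digest copied from the table above.
EXPECTED_SHA256 = 'a24b440957036b3a47cf54b7604b3c277472493b18df748bcaa34d87b004b670'

# Example call, assuming the wheel was downloaded to the current directory:
# matches_published_digest('biocartograph-0.4.2-py3-none-any.whl', EXPECTED_SHA256)
```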