Deep learning similarity measure for comparing MS/MS spectra.

Project description

ms2deepscore

ms2deepscore provides a Siamese neural network that is trained to predict molecular structural similarities (Tanimoto scores) from pairs of mass spectrometry spectra.

The library provides intuitive classes to prepare data, train a Siamese model, and compute similarities between pairs of spectra.

In addition to predicting a structural similarity, MS2DeepScore can also use Monte-Carlo dropout to assess the model uncertainty.
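To make the prediction target concrete, here is a minimal sketch of a Tanimoto score between two molecular fingerprints. The fingerprints are hypothetical and are represented simply as sets of "on" bit indices; the function name `tanimoto` and the handling of two empty fingerprints are choices made for this illustration, not part of ms2deepscore.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical fingerprints, given as the indices of the bits that are set.
fp_1 = {1, 4, 7, 9}
fp_2 = {1, 4, 8, 9, 12}
print(tanimoto(fp_1, fp_2))  # 3 shared bits / 6 distinct bits = 0.5
```

A score of 1.0 means identical fingerprints and 0.0 means no shared bits; the model is trained to predict this value from the spectra alone.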

Reference

If you use MS2DeepScore for your research, please cite the following:

"MS2DeepScore - a novel deep learning similarity measure for mass fragmentation spectrum comparisons" Florian Huber, Sven van der Burg, Justin J.J. van der Hooft, Lars Ridder, bioRxiv 2021, doi: https://doi.org/10.1101/2021.04.18.440324

Setup

Requirements

Python 3.7 or higher

Installation

Simply install using pip: pip install ms2deepscore

Prepare environment

We recommend creating an Anaconda environment with

conda create --name ms2deepscore python=3.8
conda activate ms2deepscore
pip install ms2deepscore

Alternatively, simply install in the environment of your choice with pip install ms2deepscore.

Or, to also include the full matchms functionality:

conda create --name ms2deepscore python=3.8
conda activate ms2deepscore
conda install --channel bioconda --channel conda-forge matchms
pip install ms2deepscore

Quick start: How to prepare data, train a model, and compute similarities.

See notebooks/MS2DeepScore_tutorial.ipynb for a more extensive fully-working example on test data.

There are two ways to use MS2DeepScore to compute spectral similarities. You can train a new model on a dataset of your choice; that dataset should preferably contain enough spectra to learn relevant features, say > 10,000 spectra of sufficiently diverse types. The second way is much simpler: use a model that was pretrained on a large dataset.

1) Use a pretrained model to compute spectral similarities

We provide a model that was trained on > 100,000 MS/MS spectra from GNPS and can be downloaded from Zenodo. To then compute the similarities between spectra of your choice, you can run something like:

from matchms import calculate_scores
from matchms.importing import load_from_msp
from ms2deepscore import MS2DeepScore
from ms2deepscore.models import load_model

# Import data (load_from_msp yields spectra one by one, so collect them in lists)
references = list(load_from_msp("my_reference_spectra.msp"))
queries = list(load_from_msp("my_query_spectra.msp"))

# Load pretrained model
model = load_model("MS2DeepScore_allGNPSpositive_10k_500_500_200.hdf5")

similarity_measure = MS2DeepScore(model)
# Calculate scores and get matchms.Scores object
scores = calculate_scores(references, queries, similarity_measure)

If you want to calculate all-vs-all spectral similarities, e.g. to build a network, then you can run:

scores = calculate_scores(references, references, similarity_measure, is_symmetric=True)
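The benefit of declaring an all-vs-all comparison symmetric can be sketched in plain Python: only the upper triangle of the score matrix needs to be computed, and each value is mirrored. This is an illustration of the idea only, not how matchms implements `is_symmetric`; the function `all_vs_all` and the toy similarity are hypothetical.

```python
def all_vs_all(items, similarity, is_symmetric=False):
    """Return (score matrix, number of similarity calls) for an all-vs-all run."""
    n = len(items)
    scores = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        # With a symmetric measure, skip pairs already covered by (j, i).
        start = i if is_symmetric else 0
        for j in range(start, n):
            scores[i][j] = similarity(items[i], items[j])
            calls += 1
            if is_symmetric:
                scores[j][i] = scores[i][j]
    return scores, calls


def toy_similarity(a, b):
    """Hypothetical symmetric similarity between two numbers."""
    return 1.0 / (1.0 + abs(a - b))


items = [1.0, 2.0, 4.0]
full, full_calls = all_vs_all(items, toy_similarity)
half, half_calls = all_vs_all(items, toy_similarity, is_symmetric=True)
print(full_calls, half_calls)  # 9 6
```

Both runs produce the same matrix, but the symmetric run needs n(n+1)/2 instead of n² similarity evaluations.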

To use Monte-Carlo dropout to also get an uncertainty measure with each score, run the following:

from matchms import calculate_scores
from matchms.importing import load_from_msp
from ms2deepscore import MS2DeepScoreMonteCarlo
from ms2deepscore.models import load_model

# Import data (load_from_msp yields spectra one by one, so collect them in lists)
references = list(load_from_msp("my_reference_spectra.msp"))
queries = list(load_from_msp("my_query_spectra.msp"))

# Load pretrained model
model = load_model("MS2DeepScore_allGNPSpositive_10k_500_500_200.hdf5")

similarity_measure = MS2DeepScoreMonteCarlo(model, n_ensembles=10)
# Calculate scores and get matchms.Scores object
scores = calculate_scores(references, queries, similarity_measure)

In that scenario, scores["score"] contains the similarity scores (median of the ensemble of 10x10 scores) and scores["uncertainty"] gives an uncertainty estimate (interquartile range of the ensemble of 10x10 scores).
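How an ensemble of scores is reduced to one score and one uncertainty value can be sketched with the standard library (Python 3.8 or higher for `statistics.quantiles`). The short list below is a hypothetical stand-in for the 10x10 ensemble scores of a single spectrum pair, and the quantile method is a choice made for this sketch; it may differ from what ms2deepscore uses internally.

```python
import statistics


def summarize_ensemble(ensemble_scores):
    """Return (median, interquartile range) of a list of ensemble scores."""
    q1, _, q3 = statistics.quantiles(ensemble_scores, n=4, method="inclusive")
    return statistics.median(ensemble_scores), q3 - q1


# Hypothetical scores for one spectrum pair from repeated dropout passes.
ensemble_scores = [0.70, 0.72, 0.75, 0.78, 0.80]
score, uncertainty = summarize_ensemble(ensemble_scores)
print(score, round(uncertainty, 3))
```

A narrow interquartile range indicates that the dropout passes agree with each other; a wide range flags a prediction that should be treated with caution.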

2) Train your own MS2DeepScore model

Data preparation

Bin the spectra using ms2deepscore.SpectrumBinner. In this binned form, the spectra can be fed to the model.

from ms2deepscore import SpectrumBinner
# spectrums: a list of matchms Spectrum objects (not defined in this snippet)
spectrum_binner = SpectrumBinner(1000, mz_min=10.0, mz_max=1000.0, peak_scaling=0.5)
binned_spectrums = spectrum_binner.fit_transform(spectrums)
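What binning does to a single spectrum can be sketched as follows. This is a simplified illustration, not the SpectrumBinner implementation: the function `bin_spectrum` is hypothetical, it assumes that peak_scaling acts as an exponent on the intensities, and keeping only the highest peak per bin is a choice made for this sketch.

```python
def bin_spectrum(mz_values, intensities, number_of_bins=1000,
                 mz_min=10.0, mz_max=1000.0, peak_scaling=0.5):
    """Map peaks to equal-width m/z bins; returns {bin index: scaled intensity}."""
    bin_width = (mz_max - mz_min) / number_of_bins
    binned = {}
    for mz, intensity in zip(mz_values, intensities):
        if not mz_min <= mz < mz_max:
            continue  # peaks outside the m/z window are dropped
        index = int((mz - mz_min) / bin_width)
        scaled = intensity ** peak_scaling  # assumed meaning of peak_scaling
        # Keep the highest peak when several peaks fall into the same bin.
        binned[index] = max(binned.get(index, 0.0), scaled)
    return binned


# Hypothetical spectrum: two close peaks, one isolated peak, one out of range.
binned = bin_spectrum([100.1, 100.5, 250.5, 1200.0], [0.25, 1.0, 0.04, 0.5])
print(sorted(binned))  # [91, 242]
```

The result is a sparse, fixed-length representation: every spectrum maps onto the same set of bin indices, which is what allows it to be used as a network input.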

Create a data generator that will generate batches of training examples. Each training example consists of a pair of binned spectra and the corresponding reference similarity score.

from ms2deepscore.data_generators import DataGeneratorAllSpectrums
# tanimoto_scores_df: precomputed reference Tanimoto scores (not defined in this snippet)
dimension = len(spectrum_binner.known_bins)
data_generator = DataGeneratorAllSpectrums(binned_spectrums, tanimoto_scores_df,
                                           dim=dimension)
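The structure of a training batch, pairs of binned spectra with a reference score each, can be sketched in plain Python. This is not how DataGeneratorAllSpectrums works internally; the function `pair_batches`, the spectrum IDs, and the dictionary layouts are all hypothetical and only illustrate the shape of the data.

```python
import random


def pair_batches(binned_spectra, reference_scores, batch_size=2, seed=0):
    """Yield batches of ((spectrum_a, spectrum_b), score) training examples.

    binned_spectra: dict mapping spectrum ID -> binned vector
    reference_scores: dict mapping (id_a, id_b) -> reference similarity
    """
    rng = random.Random(seed)
    pairs = sorted(reference_scores)
    rng.shuffle(pairs)  # present the pairs in random order
    for start in range(0, len(pairs), batch_size):
        batch = []
        for id_a, id_b in pairs[start:start + batch_size]:
            inputs = (binned_spectra[id_a], binned_spectra[id_b])
            batch.append((inputs, reference_scores[(id_a, id_b)]))
        yield batch


# Hypothetical binned spectra and reference Tanimoto scores.
binned_spectra = {"s1": [0.0, 1.0, 0.5], "s2": [0.2, 0.9, 0.0], "s3": [1.0, 0.0, 0.0]}
reference_scores = {("s1", "s2"): 0.8, ("s1", "s3"): 0.1, ("s2", "s3"): 0.2}
batches = list(pair_batches(binned_spectra, reference_scores))
print([len(batch) for batch in batches])  # [2, 1]
```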

Train a model

Initialize and train a SiameseModel. It consists of a dense 'base' network that produces an embedding for each of the 2 inputs. The 'head' model computes the cosine similarity between the embeddings.

from tensorflow import keras
from ms2deepscore.models import SiameseModel
model = SiameseModel(spectrum_binner, base_dims=(200, 200, 200), embedding_dim=200,
                     dropout_rate=0.2)
model.compile(loss='mse', optimizer=keras.optimizers.Adam(learning_rate=0.001))
model.fit(data_generator,
          validation_data=data_generator,
          epochs=2)
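The base/head split described above can be illustrated without any deep learning framework. In this sketch a single dense layer with ReLU stands in for the base network and is applied with the same weights to both inputs; the head is a cosine similarity. The weights, inputs, and function names are hypothetical, and the real SiameseModel uses several layers plus dropout.

```python
import math


def base_network(vector, weights):
    """One dense layer with ReLU; the same weights are used for both inputs."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]


def cosine_similarity(a, b):
    """Cosine similarity of two embeddings (0.0 if either has zero length)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def siamese_score(spectrum_a, spectrum_b, weights):
    """Embed both inputs with the shared base network, then compare them."""
    return cosine_similarity(base_network(spectrum_a, weights),
                             base_network(spectrum_b, weights))


# Hypothetical weights (2 embedding dimensions, 3 input bins) and inputs.
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
x = [1.0, 0.0, 2.0]
y = [0.0, 1.0, 1.0]
print(siamese_score(x, y, weights))
```

Because both inputs pass through the same weights, the score does not depend on the order of the two spectra.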

Predict similarity scores

Calculate the similarity for a pair of spectra

from ms2deepscore import MS2DeepScore
similarity_measure = MS2DeepScore(model)
score = similarity_measure.pair(spectrums[0], spectrums[1])

Contributing

We welcome contributions to the development of ms2deepscore! Have a look at the contribution guidelines.

Project details

Release history

2.7.2: Feb 25, 2026
2.7.1: Feb 5, 2026
2.7.0: Jan 28, 2026
2.6.0: Dec 5, 2025
2.5.5: Nov 6, 2025
2.5.4: Aug 19, 2025
2.5.3: Jul 14, 2025
2.5.2: May 26, 2025
2.5.1: Feb 12, 2025
2.5.0: Dec 20, 2024
2.4.0: Nov 8, 2024
2.3.0: Oct 30, 2024
2.2.0: Oct 17, 2024
2.1.0: Oct 7, 2024
2.0.0: Mar 21, 2024
1.0.0: Mar 12, 2024
0.5.0: Aug 18, 2023
0.4.0: Apr 25, 2023
0.3.1: Jan 6, 2023
0.3.0.1: Mar 30, 2023
0.3.0: Dec 1, 2022
0.2.3: Mar 9, 2022
0.2.2 (this version): Aug 19, 2021
0.2.1: Jul 20, 2021
0.2.0: Apr 12, 2021
0.1.3: Mar 9, 2021
0.1.2: Mar 5, 2021
0.1.1: Feb 9, 2021
0.1.0: Feb 8, 2021

Download files

Source Distribution

ms2deepscore-0.2.2.tar.gz (34.3 kB)

Uploaded Aug 19, 2021 Source

Built Distribution

ms2deepscore-0.2.2-py3-none-any.whl (42.5 kB)

Uploaded Aug 19, 2021 Python 3

File details

Details for the file ms2deepscore-0.2.2.tar.gz.

File metadata

Download URL: ms2deepscore-0.2.2.tar.gz
Upload date: Aug 19, 2021
Size: 34.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.9.6

File hashes

Hashes for ms2deepscore-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`6fe053748f865f856a59d79682ad9c149d490cf48958b8099efbd64fcabd5e4a`
MD5	`fa73585381ccdfbda321750476ae0651`
BLAKE2b-256	`462b2dbcb55ba0e5a76c6617b06faa5199abf106254a95499469d20d906786a8`

File details

Details for the file ms2deepscore-0.2.2-py3-none-any.whl.

File metadata

Download URL: ms2deepscore-0.2.2-py3-none-any.whl
Upload date: Aug 19, 2021
Size: 42.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.9.6

File hashes

Hashes for ms2deepscore-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`33ff7853d2e9c5fd57efb57480238cf557ce27b7892ce8492eeab616b0cdaddf`
MD5	`7ca811590abbe740eeb1c5450bf9d6a2`
BLAKE2b-256	`d0204f0bec692f58860eff0ef96c21cd513a06ec0c42f3a105ac5c791127fbf7`
