MDSubSampler: Molecular Dynamics SubSampler
MDSubSampler is a Python library and toolkit for a posteriori subsampling of data from multiple molecular dynamics trajectories for further analysis. It implements uniform, random, stratified, bootstrapping and targeted sampling, with the aim of preserving the original distribution of relevant geometrical properties in the sample.
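To make the difference between some of these strategies concrete, here is a minimal standard-library sketch that picks frame indices uniformly, at random, and by strata of a property value. It is an illustration only and does not use or reproduce the MDSubSampler API; the function names `uniform_sample`, `random_sample` and `stratified_sample` are hypothetical, and the stratified variant assumes equal-population strata obtained by sorting frames on the property.

```python
import random


def uniform_sample(n_frames, size):
    """Pick `size` evenly spaced frame indices (assumes size <= n_frames)."""
    step = n_frames / size
    return [int(i * step) for i in range(size)]


def random_sample(n_frames, size, seed=None):
    """Pick `size` distinct frame indices at random, returned in time order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_frames), size))


def stratified_sample(values, size, n_strata, seed=None):
    """Sort frames by property value, cut them into `n_strata` equal-population
    strata, and draw the same number of frames from each stratum."""
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda i: values[i])
    chunk = len(order) / n_strata
    per_stratum = size // n_strata
    picked = []
    for s in range(n_strata):
        stratum = order[int(s * chunk):int((s + 1) * chunk)]
        picked.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return sorted(picked)


if __name__ == "__main__":
    # Hypothetical property values for a 100-frame trajectory.
    prop = [0.1 * i for i in range(100)]
    print(uniform_sample(len(prop), 10))
    print(random_sample(len(prop), 10, seed=0))
    print(stratified_sample(prop, 20, n_strata=4, seed=0))
```

Uniform sampling keeps the time spacing regular, while the stratified variant guarantees that every region of the property range is represented in the sample.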
Prerequisites
This project requires Python (version 3.9.1 or later). To make sure you have the right version available on your machine, try running the following command.
$ python --version
Python 3.9.1
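The same check can be done from inside Python, which is handy in a setup script or notebook. This is a small sketch, not part of MDSubSampler; the helper name `python_is_supported` is hypothetical and the minimum version is taken from the requirement stated above.

```python
import sys

MINIMUM = (3, 9, 1)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```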
Getting Started
These instructions will get you a copy of the project up and running on your local machine for analysis and development purposes.
Installation
BEFORE YOU INSTALL: please read the prerequisites
To install and set up the library, run:
$ pip install MDSubSampler
Usage
Workflow
Input:
- Molecular Dynamics trajectory
- Geometric property
- Atom selection [optional - default is "name CA"]
- Reference structure [optional]
- Sample size or range of sizes
- Dissimilarity measure [optional - default is "Bhattacharyya"]
Output:
- .dat file with calculated property for full trajectory (user input)
- .dat file(s) with calculated property for one or all sample sizes input
- .xtc file(s) with sample trajectory for one or all sample sizes
- .npy file(s) with sample trajectory for one or all sample sizes
- .npy training set for ML purposes for sample trajectory (optional)
- .npy testing set for ML purposes for sample trajectory (optional)
- .png file with overlapped property distribution of reference and sample
- .json file report with important statistics from the analysis
- .txt log file with essential analysis steps and information
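The default dissimilarity measure named in the input list, the Bhattacharyya distance, compares the property distribution of the full trajectory with that of a sample. The sketch below uses the textbook definition for two discrete distributions, D_B = -ln(sum_i sqrt(p_i * q_i)), with the standard library only. It is not the library's implementation, which may bin or normalise differently; the helpers `histogram` and `bhattacharyya_distance` are hypothetical names, and `histogram` assumes every value lies within [lo, hi].

```python
import math


def histogram(values, bins, lo, hi):
    """Normalised histogram of `values` over `bins` equal-width bins in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so that v == hi falls in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability distributions."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0:
        return math.inf  # the two distributions do not overlap at all
    # Guard against rounding pushing the coefficient slightly above 1.
    return -math.log(min(coefficient, 1.0))


if __name__ == "__main__":
    # Hypothetical property values: full trajectory and every 10th frame.
    full = [0.01 * i for i in range(1000)]
    sample = full[::10]
    p = histogram(full, bins=20, lo=0.0, hi=10.0)
    q = histogram(sample, bins=20, lo=0.0, hi=10.0)
    print(bhattacharyya_distance(p, q))
```

A distance close to zero indicates that the sample reproduces the binned distribution of the full trajectory well; larger values indicate a poorer match.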
Scenarios
To run scenario 1, 2 or 3, download your protein trajectory and topology files (.xtc and .gro) to the data folder, then run the following (shown here for scenario 1):
$ python mdss/scenarios/scenario_1.py data/<YourTrajectoryFile>.xtc data/<YourTopologyFile>.gro <YourPrefix>
Parser
If you prefer the command line, you can run the code from the terminal and choose the parser arguments yourself. To see all options and choices, run:
$ python mdss/run.py --help
Once you have made a selection of arguments, your command can look like the following example:
$ python mdss/run.py \
--traj "data/<YourTrajectoryFile>.xtc" \
--top "data/<YourTopologyFile>.gro" \
--prefix "<YourPrefix>" \
--output-folder "data/<YourResultsFolder>" \
--property='DistanceBetweenAtoms' \
--atom-selection='G55,P127' \
--sampler='BootstrappingSampler' \
--n-iterations=50 \
--size=<SampleSize> \
--dissimilarity='Bhattacharyya'
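When the same command has to be launched for several sample sizes, it can be convenient to assemble it in Python. The sketch below only builds the argument list from the flags shown in the example above and prints it; the helper `build_mdss_command` is hypothetical, the file names are placeholders, and whether a given flag combination is valid should be checked against `python mdss/run.py --help`.

```python
import shlex
import sys


def build_mdss_command(traj, top, prefix, output_folder, prop, atom_selection,
                       sampler, size, n_iterations=None,
                       dissimilarity="Bhattacharyya"):
    """Assemble the argument list for mdss/run.py from the flags in the example."""
    cmd = [
        sys.executable, "mdss/run.py",
        "--traj", traj,
        "--top", top,
        "--prefix", prefix,
        "--output-folder", output_folder,
        f"--property={prop}",
        f"--atom-selection={atom_selection}",
        f"--sampler={sampler}",
    ]
    if n_iterations is not None:
        cmd.append(f"--n-iterations={n_iterations}")
    cmd.append(f"--size={size}")
    cmd.append(f"--dissimilarity={dissimilarity}")
    return cmd


if __name__ == "__main__":
    # Placeholder paths and sizes; replace them with your own.
    for size in (100, 500):
        cmd = build_mdss_command(
            traj="data/my_traj.xtc",
            top="data/my_top.gro",
            prefix=f"run_{size}",
            output_folder="data/results",
            prop="DistanceBetweenAtoms",
            atom_selection="G55,P127",
            sampler="BootstrappingSampler",
            size=size,
            n_iterations=50,
        )
        print(shlex.join(cmd))
        # To actually launch the run from the repository root:
        # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with selections that contain spaces or commas.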
Development
Start by either downloading the tarball from https://github.com/alepandini/MDSubSampler or cloning the repository to your local machine:
$ git clone git@github.com:alepandini/MDSubSampler.git
$ cd MDSubSampler
Next, download and install Poetry from https://python-poetry.org/docs/#installation
Finally, run the following:
$ poetry install
$ poetry build
$ poetry shell
You can now start developing the library.
Authors
- Namir Oues - namiroues
- Alessandro Pandini - alepandini
License
The library is licensed under GPL-3.0.