
Python package for chemical machine learning.

Project description

Multidisciplinary Artificial Intelligence with Chemical Abstraction (MAICA)

This is a Python package for machine learning with chemical data. It provides various pre-processing modules for chemical data such as engineering conditions, chemical formulas, and molecular structures. It also includes several wrapper classes and functions for chemical machine learning. The package is built on scikit-learn and PyTorch.

Installation

Before installing MAICA, several required packages should be installed in your environment. We highly recommend using Anaconda to build your Python environment for MAICA.

  1. Install the cheminformatics package RDKit. RDKit is available through Anaconda. You can install RDKit using the following command in the Anaconda prompt.
conda install -c rdkit rdkit
  2. Install the deep learning framework PyTorch. If you want to train your machine learning models on a GPU, CUDA >= 11.1 must be installed on your machine. With CUDA 11.1, you can install PyTorch using the following command.
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge
  3. Install the graph-based deep learning framework PyTorch Geometric. It is required to build machine learning models that predict target values from molecular and crystal structures. You can install PyTorch Geometric using the following command.
conda install pytorch-geometric -c rusty1s -c conda-forge
  4. Install the required packages listed in requirements.txt on GitHub. After downloading the requirements file, you can install all of them using the following command.
conda install --file requirements.txt
  5. (Optional) If your operating system is Windows, install Graphviz to visualize interpretable information from machine learning algorithms. You can install Graphviz using the following command.
conda install -c conda-forge python-graphviz
  6. Finally, install MAICA in your Python environment with the following command.
pip install maica
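After the steps above, a quick way to confirm that everything is importable is to ask Python's import system to locate each package. This is a minimal sketch; the import names in REQUIRED_MODULES are assumptions based on each project's usual import name (for example, PyTorch Geometric imports as torch_geometric, scikit-learn as sklearn, and MAICA is assumed to import as maica), so adjust the list to your setup.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
REQUIRED_MODULES = ["rdkit", "torch", "torch_geometric", "sklearn", "maica"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in the current environment."""
    # find_spec returns None when a top-level module cannot be located;
    # it does not execute the module, so the check is cheap.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Only top-level names are checked, so the helper never triggers a heavy import such as loading CUDA libraries.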

Examples

Follow the instructions in PyTorch Installation to install the PyTorch package in your environment.

Installation of PyTorch Geometric

Installation of RDKit

Package, module name

Many projects use the same name for the package and the module, and you can certainly do that. This example package, however, uses different names for the package and its module: example_pypi_package and examplepy.

Open the example_pypi_package folder in Visual Studio Code and press Ctrl + Shift + F (Windows / Linux) or Cmd + Shift + F (macOS) to find all occurrences of both names, then replace them with your own package and module names. Also remember to rename the folder src/examplepy.

Roughly speaking, the package name is what users type in pip install <PACKAGENAME>, and the module name is what they type in import <MODULENAME>. Both names should consist of lowercase basic letters (a-z). They may contain underscores (_) if you really need them. Hyphen-minus (-) should not be used.

You'll also need to make sure that the URL "https://pypi.org/project/example-pypi-package/" (with example-pypi-package replaced by your package name, every _ becoming -) is not already taken.
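The naming advice above can be captured in a small helper. This is a minimal sketch: is_recommended_name and pypi_project_url are hypothetical names, and the pattern encodes this README's recommendations (lowercase a-z, optional digits and underscores, no leading digit, no hyphen-minus), not an official PyPI rule. It only builds the URL; you still need to open it to see whether the name is taken.

```python
import re

# This README's recommendation: start with a lowercase letter, then
# lowercase letters, digits, or underscores only (no hyphen-minus).
_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_recommended_name(name: str) -> bool:
    """Check a package/module name against the naming advice in this README."""
    return _NAME_RE.fullmatch(name) is not None


def pypi_project_url(package_name: str) -> str:
    """Build the pypi.org project URL to check, with every _ turned into -."""
    return "https://pypi.org/project/" + package_name.replace("_", "-") + "/"


print(pypi_project_url("example_pypi_package"))
```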

Details on naming convention

Underscores (_) can be used but such use is discouraged. Numbers can be used if the name does not start with a number, but such use is also discouraged.

Names that start with a number or contain a hyphen-minus (-) should not be used: although technically legal, such names cause a lot of trouble, because users cannot write them in an import statement and have to use importlib instead.
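To illustrate the problem, the sketch below creates a throwaway module file named my-module.py (a hypothetical name) in a temporary directory. Writing import my-module would be a syntax error, so the only way to load it is importlib.import_module with the name passed as a string.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway module whose file name contains a hyphen-minus.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "my-module.py").write_text('GREETING = "hello"\n')

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the import system sees the new file

# The name is just a string here, so the hyphen is not a syntax error.
my_module = importlib.import_module("my-module")
sys.path.remove(tmp_dir)  # the module stays cached in sys.modules

print(my_module.GREETING)  # prints: hello
```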

Don't be fooled by the URL "pypi.org/project/example-pypi-package/" and the name "example-pypi-package" on pypi.org. pypi.org and pip convert every _ to - and show the latter on the website and in pip commands, but the real name still uses _, and that is the name users should use when importing the package.

You can also use namespace packages if you need sub-packages.
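Implicit namespace packages (PEP 420, Python 3.3+) let several directories share one top-level name as long as that directory has no __init__.py. A minimal sketch, assuming a hypothetical top-level name mynamespace: two temporary directories, standing in for two separately installed distributions, each contribute one sub-module. How you declare such packages in setup.py is not covered here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two roots each provide a "mynamespace" directory WITHOUT __init__.py,
# which makes "mynamespace" an implicit namespace package.
root_a = Path(tempfile.mkdtemp())
root_b = Path(tempfile.mkdtemp())
(root_a / "mynamespace").mkdir()
(root_b / "mynamespace").mkdir()
(root_a / "mynamespace" / "part_a.py").write_text('NAME = "a"\n')
(root_b / "mynamespace" / "part_b.py").write_text('NAME = "b"\n')

sys.path[:0] = [str(root_a), str(root_b)]
importlib.invalidate_caches()

# Both sub-modules import under the same top-level name.
part_a = importlib.import_module("mynamespace.part_a")
part_b = importlib.import_module("mynamespace.part_b")

print(part_a.NAME, part_b.NAME)  # prints: a b
print(list(sys.modules["mynamespace"].__path__))  # lists both directories
```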

Other changes

Make necessary changes in setup.py.

The package's version number __version__ is in src/examplepy/__init__.py. You may want to change that.
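If you want setup.py to pick up this number instead of repeating it, one common approach is to read __init__.py and extract __version__ with a regular expression. This is a minimal sketch; read_version is a hypothetical helper, and the template's own setup.py may handle the version differently.

```python
import re

# Matches a line such as: __version__ = "0.1.0"  (single or double quotes)
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version(init_source: str) -> str:
    """Extract the __version__ string from the text of an __init__.py file."""
    match = _VERSION_RE.search(init_source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Hypothetical usage in setup.py, so the version lives in one place:
# from pathlib import Path
# version = read_version(Path("src/examplepy/__init__.py").read_text(encoding="utf-8"))
```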

The example package is designed to be compatible with Python 3.6, 3.7, 3.8, and 3.9, and is tested against these versions. If you need to change the version range, change:

  • classifiers, python_requires in setup.py
  • envlist in tox.ini
  • matrix: python: in .github/workflows/test.yml
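These three places have to be kept in sync by hand. As a sketch of what each should contain for one list of versions, the values can be derived as below. The helper names are hypothetical; the classifier strings follow the standard "Programming Language :: Python :: X.Y" trove format, the envlist uses tox's pyXY names (as in tox -e py39), and the GitHub Actions matrix would simply be the SUPPORTED list itself.

```python
# One list of supported versions from which the settings above can be derived.
SUPPORTED = ["3.6", "3.7", "3.8", "3.9"]


def classifiers(versions):
    """Trove classifiers for the `classifiers` argument in setup.py."""
    return [f"Programming Language :: Python :: {v}" for v in versions]


def python_requires(versions):
    """A lower bound for `python_requires` in setup.py."""
    # Compare numerically so that "3.10" sorts after "3.9".
    lowest = min(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
    return f">={lowest}"


def tox_envlist(versions):
    """Comma-separated environment names for `envlist` in tox.ini, e.g. py36."""
    return ",".join("py" + v.replace(".", "") for v in versions)


print(python_requires(SUPPORTED), tox_envlist(SUPPORTED))
```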

If you plan to upload to TestPyPI, a playground version of PyPI for testing purposes, change twine upload --repository pypi dist/* to twine upload --repository testpypi dist/* in the file .github/workflows/release.yml.

Development

pip

pip is a Python package manager. You already have pip if you use Python 3.4 or later, which includes it by default. Read this to know how to check whether pip is installed. Read this if you need to install it.
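From a terminal, python -m pip --version is the usual check. The same check can be done from Python by asking whether the pip module can be located. A minimal sketch; pip_available is a hypothetical helper name, and the ensurepip hint may not work on Linux distributions that disable ensurepip.

```python
import importlib.util


def pip_available() -> bool:
    """Return True if the pip module can be found for the running interpreter."""
    # find_spec locates a top-level module without importing it.
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    if pip_available():
        print("pip is installed; `python -m pip --version` shows which one.")
    else:
        print("pip was not found; `python -m ensurepip --upgrade` can usually install it.")
```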

Use VS Code

Visual Studio Code is a widely used code editor, and our example package is configured to work with it.

Install VS Code extension "Python".

"Python" VS Code extension will suggest you install pylint. Also, the example package is configured to use pytest with VS Code + Python extensions, so, install pylint and pytest:

pip install pylint pytest

(You will likely be prompted to install them. If so, you don't need to type and run the command yourself.)

The content of .vscode/.env is PYTHONPATH=/;src/;${PYTHONPATH}, which works on Windows. If you use Linux or macOS, change it to PYTHONPATH=/:src/:${PYTHONPATH} (replacing ; with :). If PYTHONPATH is not set properly, you'll see linting errors in test files and pytest won't be able to run the tests/test_*.py files correctly.
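The only difference between the two variants is the path separator, which Python exposes as os.pathsep (";" on Windows, ":" on Linux and macOS). A minimal sketch that builds the right line for the current system; pythonpath_env_line is a hypothetical helper name.

```python
import os


def pythonpath_env_line(sep: str = os.pathsep) -> str:
    """Build the PYTHONPATH line for the .env file using the given separator."""
    # The doubled braces produce a literal ${PYTHONPATH} in the output.
    return f"PYTHONPATH=/{sep}src/{sep}${{PYTHONPATH}}"


# Prints the variant that matches the operating system running this script.
print(pythonpath_env_line())
```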

Close and reopen VS Code. You can now click the lab flask icon in the left menu and run all tests there with pytest. pytest is arguably more convenient than the standard unittest framework, and it can also run unittest-style tests, so you can keep using import unittest in your test files.

The example package also has a .editorconfig file. You may install the VS Code extension "EditorConfig for VS Code", which uses this file. With the current configuration, EditorConfig automatically uses spaces for indentation (4 for .py files, 2 for others), sets UTF-8 encoding and LF line endings, trims trailing whitespace in non-Markdown files, etc.

In VS Code, go to File -> Preferences -> Settings, type "Python Formatting Provider" in the search box, and choose one of the three Python code formatters (autopep8, black, or yapf); you'll be prompted to install it. The shortcut for formatting a code file is Shift + Alt + F (Windows), Shift + Option (Alt) + F (macOS), or Ctrl + Shift + I (Linux).

Write your package

In the src/examplepy/ folder (examplepy should already have been replaced by your module name), rename module1.py and write your code in it. Add more module .py files if you need them.

Write your tests

In the tests/ folder, rename test_module1.py (keeping the test_*.py naming pattern) and write your unit test code (with unittest) in it. Add more test_*.py files if you need them.
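A test file in the tests/ folder might look like the sketch below. The function add_one is hypothetical and defined inline only to keep the example self-contained; in a real test file you would import it from your module instead (for example from examplepy.module1 import add_one). pytest collects unittest.TestCase classes like this one without any changes.

```python
import unittest


# Hypothetical stand-in for a function from src/examplepy/module1.py.
def add_one(number: int) -> int:
    return number + 1


class TestAddOne(unittest.TestCase):
    """unittest-style tests; both pytest and unittest can run them."""

    def test_positive(self):
        self.assertEqual(add_one(1), 2)

    def test_negative(self):
        self.assertEqual(add_one(-1), 0)
```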

The testing tool `tox` is used in the automation with GitHub Actions CI/CD. If you want to use `tox` locally, see the "Use tox locally" section.

Use tox locally

Install tox and run it:

pip install tox
tox

In our configuration, tox checks the source distribution using check-manifest (which requires that you have at least run git init and git add . in your repo), runs setuptools' check, and runs the unit tests using pytest. You don't need to install check-manifest and pytest yourself; tox installs them in a separate environment.

The automated tests run against several Python versions, but on your machine you might have only one version of Python. If that is Python 3.9, run:

tox -e py39
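For other versions the pattern is the same: py followed by the major and minor version digits. A minimal sketch that prints the matching command for the interpreter running it, assuming your tox.ini envlist uses these default pyXY names; current_tox_env is a hypothetical helper name.

```python
import sys


def current_tox_env(version_info=sys.version_info) -> str:
    """tox's default environment name for an interpreter, e.g. py39 for Python 3.9."""
    return f"py{version_info[0]}{version_info[1]}"


# Prints the command to test only against the interpreter running this script.
print(f"tox -e {current_tox_env()}")
```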

If you add more files to the root directory (example_pypi_package/), you'll need to add them to the check-manifest --ignore list in tox.ini.

Thanks to GitHub Actions' automated process, you don't need to generate distribution files locally. If you still want to, follow the "Generate distribution files" section.

Generate distribution files

Install tools

Install or upgrade setuptools and wheel:

python -m pip install --user --upgrade setuptools wheel

(If python3 is the command on your machine, change python to python3 in the above command, or add a line alias python=python3 to ~/.bashrc or ~/.bash_aliases file if you use bash on Linux)

Generate dist

From the example_pypi_package directory, run the following command to generate a source distribution (sdist) and a wheel (bdist_wheel) in the dist folder:

python setup.py sdist bdist_wheel

Install locally

Optionally, you can install dist version of your package locally before uploading to PyPI or TestPyPI:

pip install dist/example_pypi_package-0.1.0.tar.gz

(You may need to uninstall an existing version first:

pip uninstall example_pypi_package

Several installed copies with the same name may exist, so run pip uninstall repeatedly until pip reports that the package is not installed.)

Upload to PyPI

Register on PyPI and get token

Register an account on PyPI, go to Account settings § API tokens, and click "Add API token". The token is shown only once, so copy it somewhere safe. If you lose it, delete the old token and add a new one.

(Register a TestPyPI account if you are uploading to TestPyPI)

Set secret in GitHub repo

On the page of your newly created or existing GitHub repo, click Settings -> Secrets -> New repository secret. The Name should be PYPI_API_TOKEN and the Value should be your PyPI token (which starts with pypi-).

Push or release

The example package has automated tests and upload (publishing) already set up with GitHub Actions:

  • Every time you git push your master or main branch, the package is automatically tested against the desired Python versions with GitHub Actions.
  • Every time a new release (either the initial version or an updated version) is created, the package is automatically uploaded to PyPI with GitHub Actions.

View it on pypi.org

After your package is published on PyPI, go to https://pypi.org/project/example-pypi-package/ (_ becomes -). Copy the command on the page and run it to download and install your package from PyPI (or from test.pypi.org if you use that).

If you publish the package to PyPI manually, follow the steps below.

Install Twine

Install or upgrade Twine:

python -m pip install --user --upgrade twine

Create a .pypirc file in your $HOME (~) directory with the following content:

[pypi]
username = __token__
password = <PyPI token>

(Use [testpypi] instead of [pypi] if you are uploading to TestPyPI)

Replace <PyPI token> with your real PyPI token (which starts with pypi-).

(If you don't create $HOME/.pypirc manually, you will be prompted for a username (which should be __token__) and a password (which should be your PyPI token) when you run Twine.)
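If you prefer to create this file from a script, the standard library's configparser can write the same INI layout. This is a minimal sketch: write_pypirc is a hypothetical helper, it overwrites any existing file at the given path, and the pypi- prefix check simply mirrors the token format described above.

```python
import configparser
import os
from pathlib import Path


def write_pypirc(path: Path, token: str, repository: str = "pypi") -> None:
    """Write a minimal .pypirc with a single [pypi] or [testpypi] section."""
    if not token.startswith("pypi-"):
        raise ValueError("PyPI API tokens start with 'pypi-'")
    # interpolation=None so that any '%' in the value is written literally.
    config = configparser.ConfigParser(interpolation=None)
    config[repository] = {"username": "__token__", "password": token}
    with open(path, "w", encoding="utf-8") as handle:
        config.write(handle)
    # The file holds a secret, so restrict it to the current user
    # (this has limited effect on Windows).
    os.chmod(path, 0o600)


# Usage (the token value is a placeholder):
# write_pypirc(Path.home() / ".pypirc", "pypi-XXXXXXXX")
```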

Upload

Run Twine to upload all of the archives under dist folder:

python -m twine upload --repository pypi dist/*

(use testpypi instead of pypi if you are uploading to TestPyPI)
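If you drive the upload from a Python script, note that subprocess does not go through a shell by default, so the dist/* pattern is best expanded in Python. A minimal sketch; twine_upload_command is a hypothetical helper that only builds the argument list for the command above.

```python
import glob
import os
import sys


def twine_upload_command(repository: str = "pypi", dist_dir: str = "dist") -> list:
    """Build the arguments for `python -m twine upload --repository <repo> dist/*`."""
    # Expand dist/* here because no shell is involved when using subprocess.
    files = sorted(glob.glob(os.path.join(dist_dir, "*")))
    if not files:
        raise FileNotFoundError(f"nothing to upload in {dist_dir!r}")
    return [sys.executable, "-m", "twine", "upload", "--repository", repository, *files]


# Usage: subprocess.run(twine_upload_command("testpypi"), check=True)
```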

Update

When you have finished developing a newer version of your package, do the following:

Modify the version number __version__ in src/examplepy/__init__.py.
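For a plain MAJOR.MINOR.PATCH version string, this edit can be scripted. A minimal sketch, assuming a three-part numeric version and a single __version__ = "..." line in __init__.py; bump_patch and set_version are hypothetical helpers.

```python
import re


def bump_patch(version: str) -> str:
    """Increment the last part of a MAJOR.MINOR.PATCH version, e.g. 0.1.0 -> 0.1.1."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def set_version(init_source: str, new_version: str) -> str:
    """Return the text of __init__.py with its __version__ assignment replaced."""
    new_source, count = re.subn(
        r"""^__version__\s*=\s*['"][^'"]*['"]""",
        f'__version__ = "{new_version}"',
        init_source,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one __version__ assignment")
    return new_source
```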

Delete all old versions in dist.
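This matters because twine upload dist/* uploads every file in the folder, and PyPI refuses a file name that has already been uploaded. A minimal sketch of the cleanup; clean_dist is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def clean_dist(dist_dir: str = "dist") -> None:
    """Remove everything in the dist folder so only freshly built files get uploaded."""
    path = Path(dist_dir)
    shutil.rmtree(path, ignore_errors=True)  # no error if dist does not exist yet
    path.mkdir(exist_ok=True)  # leave an empty dist folder behind
```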

Run the following command again to regenerate dist:

python setup.py sdist bdist_wheel

Run the following command again to upload dist:

python -m twine upload --repository pypi dist/*

(use testpypi instead of pypi if needed)


Download files


Source Distribution

maica-0.1.2.tar.gz (35.0 kB)

Uploaded Source

Built Distribution

maica-0.1.2-py3-none-any.whl (53.3 kB)

Uploaded Python 3
