A small example package

python ml skeleton project

Generic skeleton for a machine learning project with Python, Hydra, pytest, Sphinx, GitHub Actions, etc., with dummy functionality. It is mostly oriented toward geospatial projects.

Why this project?

The goal of this project is to present a standard architecture for a Python repository/package, including a full CI/CD pipeline to document, test, and deploy your project using the standard methods of 2022. It can be used as a starting point for any project without reinventing the wheel.

The code itself is of no interest!

The code in this project is purely a placeholder: it performs simple arithmetic operations such as addition and subtraction. The next iteration will make the operations more interesting by using a multi-layer perceptron, and will try to add a complete example of a Hydra configuration.
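To give an idea of how small the functionality is, here is a minimal sketch of an addition and a subtraction split across a core layer and an API layer. The function names, signatures, and the validation step are assumptions made for illustration; they are not copied from the pmps source.

```python
"""Illustrative sketch only: every name here is hypothetical, not the actual pmps code."""
from numbers import Real


def _core_add(a: Real, b: Real) -> Real:
    """Core layer: the bare computation, with no input checking."""
    return a + b


def _core_subtract(a: Real, b: Real) -> Real:
    """Core layer: the bare computation, with no input checking."""
    return a - b


def _check_real(*values: object) -> None:
    """Reject anything that is not a real number (int, float, Fraction, ...)."""
    for value in values:
        if not isinstance(value, Real):
            raise TypeError(f"expected a real number, got {type(value).__name__}")


def add(a: Real, b: Real) -> Real:
    """API layer: validate the inputs, then delegate to the core layer."""
    _check_real(a, b)
    return _core_add(a, b)


def subtract(a: Real, b: Real) -> Real:
    """API layer: validate the inputs, then delegate to the core layer."""
    _check_real(a, b)
    return _core_subtract(a, b)
```

Keeping validation in the API layer and the computation in the core layer is one possible reading of the core/api split described in the architecture overview; other divisions of responsibility are equally valid.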

In the near future, it will serve as a worked example of a standard ML pipeline for experimentation and production.

Installation

Install requirements

Because GDAL dependencies are present, it is preferable to install the dependencies via conda before installing the package:

  git clone https://github.com/samysung/python_ml_project_skeleton
  cd python_ml_project_skeleton/packaging
  conda env create -f package_env.yml

From pip:

pip install pmps
pip install pmps==vx.x  # or, for a specific version

Other installation options

From source:

python setup.py install

From source with symbolic links:

pip install -e .

From source using pip:

pip install git+https://github.com/samysung/python_ml_project_skeleton
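Whichever installation route is used, it can be useful to confirm from Python that the distribution is actually visible in the active environment. The sketch below relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pmps")
    if found is None:
        print("pmps is not installed in this environment")
    else:
        print(f"pmps {found} is installed")
```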

Project Architecture

├── CHANGELOG.rst
├── .codecov.yml
├── deploy
│   └── dockerfile
├── docs
│   ├── add.rst
│   ├── build.sh
│   ├── changelog.rst
│   ├── conf.py
│   ├── deploy.sh
│   ├── index.rst
│   ├── Makefile
│   ├── readme_link.md
│   └── _static
│       └── img
├── .github
│   └── workflows
│       ├── publish.yml
│       ├── test_code.yml
│       ├── test_docs.yml
│       ├── test_packaging.yml
│       └── test_publish.yml
├── .gitignore
├── LICENSE
├── packaging
│   ├── doc_env.yml
│   ├── doc_requirements.txt
│   ├── package_env.yml
│   ├── requirements.txt
│   ├── test_env.yml
│   └── test_requirements.txt
├── pmps
│   ├── api
│   │   ├── add.py
│   │   ├── __init__.py
│   │   └── subtract.py
│   ├── core
│   │   ├── add.py
│   │   ├── __init__.py
│   │   └── subtract.py
│   └── __init__.py
├── .pre-commit-config.yaml
├── .pylintrc
├── README.md
├── readthedocs.yml
├── setup.cfg
├── setup.py
├── tests
│   ├── api
│   │   ├── __init__.py
│   │   ├── test_add.py
│   │   └── test_subtract.py
│   └── __init__.py
└── VERSION

Architecture component overview

| Component | Path | Description |
| --- | --- | --- |
| Python Package | pmps/ | Where the Python executable code lives. It is your root package, as it is the first directory to contain an __init__.py, and its name is generally the one you choose for your published package (the one built and published on a forge such as PyPI or conda). Don't forget to add an __init__.py module to every subpackage to declare it as a Python package. NB: separating core and api into different subpackages is a design choice, not a standard; it comes from the Java world, and many Python projects prefer declaring private Python modules instead. |
| Documentation | docs/ | The source code of your documentation: conf.py is where you configure your Sphinx documentation, and _static/ holds your additional static files (images, text, icons, video, etc.). The documentation is built under docs/_build/html, but this can be changed in the Makefile. |
| Tests Package | tests/ | Where you organize the test code for your executable code. Your unit tests (pytest is the library used) should at least test what you expose to your clients. You can add static analysis of your test code with tools such as mypy and flake8. Use the pytest-cov plugin to produce test coverage reports. |
| Python Env | packaging/ | Place for your conda environment files and requirements files. |
| Deployment | deploy/ | Place for Dockerfiles or any other deployment solution. |
| CI/CD workflows | .github/ | GitHub workflow configuration files (details below). |
| CD (documentation publishing) | readthedocs.yml | Configuration of the documentation publication on Read the Docs (see the Read the Docs documentation). |
| CI (test coverage publishing) | .codecov.yml | Configuration of the coverage report publication on Codecov (see the Codecov documentation). |
| CI (static analysis publishing) | .pre-commit-config.yaml | Configuration of the pre-commit hooks (see the pre-commit documentation). |
| CD (packaging) | setup.cfg and setup.py | Configuration files for packaging on PyPI, locally, etc. (see the Python documentation). |
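The reminder about declaring every subpackage with an __init__.py can be checked mechanically. The following is a minimal sketch, assuming regular (non-namespace) packages; the function name find_dirs_missing_init is hypothetical and not part of this project.

```python
from pathlib import Path
from typing import List


def find_dirs_missing_init(package_root: Path) -> List[Path]:
    """List directories under package_root that hold .py modules but no __init__.py."""
    subdirectories = sorted(p for p in package_root.rglob("*") if p.is_dir())
    missing: List[Path] = []
    for directory in [package_root, *subdirectories]:
        # A directory counts as a package candidate if it directly contains modules.
        has_modules = any(
            child.is_file() and child.suffix == ".py" for child in directory.iterdir()
        )
        if has_modules and not (directory / "__init__.py").exists():
            missing.append(directory)
    return missing


if __name__ == "__main__":
    # "pmps" is the package directory from the tree above; adjust the path as needed.
    for path in find_dirs_missing_init(Path("pmps")):
        print(f"missing __init__.py in {path}")
```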

CI/CD pipeline

The first and essential goal is to have a skeleton that can be quickly adapted to many kinds of projects, with a strong emphasis on continuous integration and continuous deployment. Here is a schematic view of the CI/CD pipeline targeted at open source Python projects, largely inspired by other well-known projects:

[Figure: CI/CD diagram]

Github Workflows

Test code workflow (.github/workflows/test_code.yml):

Used to run unit tests (and functional tests, if implemented) on pull request events or pushes to the main branch. It publishes coverage results to codecov.io. It uses the packaging/test_env.yml conda environment file, the GitHub cache action, and codecov/codecov-action.
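As an illustration of the kind of unit tests this workflow runs, a pytest-style test module could look like the sketch below. The add function is defined inline as a stand-in so the example is self-contained; in the real project the tests would import it from the package, and the test names are assumptions rather than the content of tests/api/test_add.py.

```python
import math


def add(a, b):
    """Stand-in for the function under test (hypothetical, defined inline)."""
    return a + b


def test_add_integers():
    assert add(2, 3) == 5


def test_add_floats():
    # Compare floats with a tolerance rather than strict equality.
    assert math.isclose(add(0.1, 0.2), 0.3)


def test_add_is_commutative():
    for a, b in [(1, 2), (-4, 7), (0.5, 2)]:
        assert add(a, b) == add(b, a)
```

pytest collects functions whose names start with test_, so no extra registration is needed. With the pytest-cov plugin installed, running pytest with --cov=pmps produces the coverage report that is then uploaded to Codecov.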

Test docs workflow (.github/workflows/test_docs.yml):

Used to test the build of the Sphinx documentation. Runs on pull request events or pushes to the main branch. It uses the packaging/doc_env.yml conda environment file and the GitHub cache action.

Publish workflow (.github/workflows/publish.yml):

Used to publish the package on PyPI when a new tagged version or release is published. It uses the packaging/package_env.yml conda environment file, the GitHub cache action, the GitHub download and upload artifact actions, and gh-action-pypi-publish.

Test publish workflow (.github/workflows/test_publish.yml):

Same workflow as above, but on a test branch and the test.pypi forge, for testing improvements to the deployment recipes.

Test packaging workflow (.github/workflows/test_packaging.yml):

Workflow triggered by a cron event (see crontab-guru), every n hours. Used to test that the package has been published and that the latest version is working.
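One way to check from Python which version is currently published is to query the PyPI JSON API, which exposes the latest release under info.version. This is a sketch of that idea only, not the content of the actual workflow; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Public PyPI JSON endpoint for a project.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def latest_version_from_payload(payload: dict) -> str:
    """Extract the latest published version from a PyPI JSON API payload."""
    return payload["info"]["version"]


def fetch_latest_version(name: str, timeout: float = 10.0) -> str:
    """Download the project metadata from PyPI and return its latest version."""
    with urlopen(PYPI_JSON_URL.format(name=name), timeout=timeout) as response:
        payload = json.load(response)
    return latest_version_from_payload(payload)


if __name__ == "__main__":
    # Requires network access.
    print(fetch_latest_version("pmps"))
```

Separating the parsing from the network call keeps the parsing testable offline. A scheduled job could then install the reported version in a clean environment and run a smoke test against it.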

GitHub workflows based on GitHub webhooks or GitHub Apps

Some workflow tasks are handled by third-party applications, such as the Read the Docs publication or the online pre-commit static analysis.

Pre-commit

The pre-commit action is launched via a GitHub App (pre-commit.ci) on every commit pushed to the remote. It is configured via the .pre-commit-config.yaml file.

Read-the-docs publication

Read the Docs publishes a new documentation version via a webhook subscribed to push and commit events. You can configure the type of push triggering the process in the readthedocs.org configuration section. See the Read the Docs documentation for more detail.
