
Fast and customizable framework for automatic ML model creation (AutoML)

Project description

LightAutoML (LAMA) - automatic model creation framework


LightAutoML (LAMA) is a project from the Sberbank AI Lab AutoML group: a framework for automatic classification and regression model creation.

Tasks currently available to solve:

  • binary classification
  • multiclass classification
  • regression

Currently we work with datasets where each row is an object with its specific features and a target. Multi-table datasets and sequences are under construction :)
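The expected data layout can be illustrated with a plain pandas DataFrame; the column names below are purely illustrative, not required by LAMA:

```python
import pandas as pd

# Each row is one object; columns are its features plus the target.
train = pd.DataFrame({
    "age": [25, 47, 33, 52],
    "income": [30000.0, 82000.0, 45000.0, 61000.0],
    "city": ["Moscow", "Kazan", "Moscow", "Omsk"],
    "target": [0, 1, 0, 1],  # binary classification target
})

print(train.shape)  # one object per row: (4, 4)
```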

Note: for automatic creation of interpretable models we use the AutoWoE library, also made by our group.

Authors: Ryzhkov Alexander, Vakhrushev Anton, Simakov Dmitrii, Bunakov Vasilii, Damdinov Rinchin, Shvets Pavel, Kirilin Alexander


Installation

Installation via pip from PyPI

To install the LAMA framework on your machine:

pip install lightautoml

Installation from sources with virtual environment creation

If you want a dedicated virtual environment for LAMA, install the python3-venv system package and run the following command, which creates a lama_venv virtual environment with LAMA installed inside:

bash build_package.sh
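If you prefer to set the environment up by hand, a rough manual equivalent looks like this (the env name lama_venv comes from the text above; the exact steps performed by build_package.sh may differ):

```shell
# Assumed manual equivalent of build_package.sh:
python3 -m venv lama_venv          # requires the python3-venv system package
source lama_venv/bin/activate
pip install --upgrade pip
pip install lightautoml
```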

To check this variant of installation and run all the demo scripts, use the command below:

bash test_package.sh

Docs generation

To generate documentation for the LAMA framework, use the command below (it relies on the virtual environment created during installation from sources):

bash build_docs.sh

The built official documentation for LightAutoML is available here.


Usage examples

To learn how to work with LightAutoML, we provide several tutorials:

  1. Tutorial_1. Create your own pipeline.ipynb - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization, etc.
  2. Tutorial_2. AutoML pipeline preset.ipynb - shows how to use LightAutoML presets (both the standalone and time-utilization variants) to solve ML tasks on tabular data. Using presets, you can solve binary classification, multiclass classification and regression tasks by changing the first argument of Task.
  3. Tutorial_3. Multiclass task.ipynb - shows how to build an ML pipeline for a multiclass task by hand.

Each tutorial includes a step that enables the Profiler and ends with a Profiler run, which measures the time distribution of each function call and presents it in an interactive HTML report: the report shows the total run time at its top and an interactive call tree with the percentage of total time spent in each subtree.

Important 1: the profiler is not needed in production (it increases run time and memory consumption), so please do not turn it on - it is off by default.

Important 2: to inspect the report after the run, comment out the last line of the demo, which deletes the report.

For more examples, the tests folder contains different LAMA usage scenarios:

  1. demo0.py - building an ML pipeline from blocks and running fit + predict on the pipeline itself
  2. demo1.py - creation of several ML pipelines (using an importance-based cutoff feature selector) to build 2-level stacking with the AutoML class
  3. demo2.py - creation of several ML pipelines (using an iterative feature selection algorithm) to build 2-level stacking with the AutoML class
  4. demo3.py - creation of several ML pipelines (using a combination of cutoff and iterative feature selection algorithms) to build 2-level stacking with the AutoML class
  5. demo4.py - creation of classification and regression tasks for AutoML with loss and evaluation metric setup
  6. demo5.py - 2-level stacking with the AutoML class with different algorithms on the first level, including LGBM, Linear and LinearL1
  7. demo6.py - AutoML with nested CV usage
  8. demo7.py - AutoML preset usage for tabular datasets (predefined AutoML pipeline structure and a simple interface for users, without building from blocks)
  9. demo8.py - creating pipelines from blocks to build AutoML, solving a multiclass classification task
  10. demo9.py - AutoML time-utilization preset usage for tabular datasets (predefined AutoML pipeline structure and a simple interface for users, without building from blocks)
  11. demo10.py - creating pipelines from blocks (including CatBoost) to build AutoML, solving a multiclass classification task
  12. demo11.py - AutoML NLP preset usage for tabular datasets with text columns
  13. demo12.py - AutoML tabular preset usage with a custom validation scheme and multiprocessed inference

Questions / Issues / Suggestions

Write a message to us:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

LightAutoML-0.2.6.tar.gz (161.1 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

LightAutoML-0.2.6-py3-none-any.whl (230.1 kB)

Uploaded Python 3

File details

Details for the file LightAutoML-0.2.6.tar.gz.

File metadata

  • Download URL: LightAutoML-0.2.6.tar.gz
  • Upload date:
  • Size: 161.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.4 CPython/3.6.9 Linux/5.4.0-1026-azure

File hashes

Hashes for LightAutoML-0.2.6.tar.gz:

  • SHA256: 49d633a6b7f5c3c8fb8ef17bc46f66433426882bf92cfd1c8c82b33dd85d9825
  • MD5: ce82b7b5caeafb80063e1abf053171b3
  • BLAKE2b-256: 7a4ebdd42ecedd4ed78f86974925a9360c93b5338dc42c8f75f6c8a41c673ca6

See more details on using hashes here.

File details

Details for the file LightAutoML-0.2.6-py3-none-any.whl.

File metadata

  • Download URL: LightAutoML-0.2.6-py3-none-any.whl
  • Upload date:
  • Size: 230.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.4 CPython/3.6.9 Linux/5.4.0-1026-azure

File hashes

Hashes for LightAutoML-0.2.6-py3-none-any.whl:

  • SHA256: 7635b9c5e3e11bdb812ee4566c3f909be7009afc88e526444b4c9f1f73b12a20
  • MD5: 5be7af589ed401e45a2ddc7790420d31
  • BLAKE2b-256: 24af060104452ed32fb8182d51ce47688b078b1a3f6ed12a1d9e3e9cbd1b4697

See more details on using hashes here.
