a collection of lego bricks for scikit-learn pipelines

scikit-lego

We love scikit-learn, but we very often find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a single package that holds them to a shared standard of code quality and testing. This project is a collaboration between multiple companies in the Netherlands. Note that we are not formally affiliated with the scikit-learn project.

Installation

Install scikit-lego via pip with:

pip install scikit-lego

Alternatively, to edit and contribute, fork/clone the repository and run:

pip install -e ".[dev]"
python setup.py develop

Documentation

The documentation can be found here.

Usage

from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

...

mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])

...
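The elided parts of the snippet above are left as they are. As a minimal, self-contained sketch of the same pattern, the example below shows how a custom "brick" slots into a scikit-learn Pipeline. It assumes only scikit-learn and NumPy are installed. NoiseAdder is a hypothetical stand-in written for this illustration, not the library's RandomAdder (it adds noise on every transform call), and LogisticRegression stands in for GMMClassifier; the toy data is made up.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class NoiseAdder(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: adds Gaussian noise to every feature."""

    def __init__(self, noise=0.1, random_state=0):
        self.noise = noise
        self.random_state = random_state

    def fit(self, X, y=None):
        # Nothing to learn; returning self keeps the scikit-learn contract.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.RandomState(self.random_state)
        return X + rng.normal(0.0, self.noise, size=X.shape)


# Made-up toy data: two well-separated 2D clusters, 50 points each.
rng = np.random.RandomState(42)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(50, 2)),
    rng.normal(2.0, 1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", NoiseAdder(noise=0.1)),
    ("model", LogisticRegression()),
])
mod.fit(X, y)
predictions = mod.predict(X)
```

Any object that implements fit and transform can be used as an intermediate step this way, which is why the sklego classes can be mixed freely with scikit-learn's own.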

Features

Here's a list of features that this library currently offers:

  • sklego.preprocessing.PatsyTransformer applies a patsy formula
  • sklego.preprocessing.RandomAdder adds randomness in training
  • sklego.preprocessing.PandasTypeSelector selects columns based on pandas type
  • sklego.preprocessing.ColumnSelector selects columns based on column name
  • sklego.preprocessing.ColumnCapper limits extreme values of the model features
  • sklego.preprocessing.OrthogonalTransformer makes all features linearly independent
  • sklego.dummy.RandomRegressor benchmark that predicts random values
  • sklego.naive_bayes.GaussianMixtureNB classifies by training a 1D GMM per column per class
  • sklego.mixture.GMMClassifier classifies by training a GMM per class
  • sklego.mixture.GMMOutlierDetector detects outliers based on a trained GMM
  • sklego.pandas_utils.log_step a simple logger-decorator for pandas pipeline steps
  • sklego.pandas_utils.add_lags adds lag values of certain columns in pandas
  • sklego.pipeline.DebugPipeline adds debug information to make debugging easier
  • sklego.meta.DecayEstimator adds decay to the sample_weight that the model accepts
  • sklego.meta.GroupedEstimator can split the data into runs and run a model on each
  • sklego.meta.EstimatorTransformer adds a model output as a feature
  • sklego.metrics.correlation_score calculates correlation between model output and feature
  • sklego.metrics.p_percent_score proxy for model fairness with regard to a sensitive attribute
  • sklego.datasets.load_chicken loads in the joyful ChickWeight dataset
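
To make one of the entries above concrete, here is a rough sketch of the idea behind a p-percent fairness proxy: compare the rate of positive predictions inside and outside a sensitive group and report the smaller ratio. This is one common definition written in plain Python for illustration. The function name p_percent, its arguments and its edge-case handling are assumptions, and sklego.metrics.p_percent_score may differ in signature and behaviour.

```python
def p_percent(predictions, sensitive):
    """Illustrative p-percent proxy (not the sklego implementation).

    predictions: iterable of 0/1 model outputs
    sensitive:   iterable of 0/1 flags, 1 meaning the row is in the sensitive group
    """
    in_group = [p for p, s in zip(predictions, sensitive) if s == 1]
    out_group = [p for p, s in zip(predictions, sensitive) if s == 0]
    if not in_group or not out_group:
        raise ValueError("both groups need at least one sample")

    # Share of positive predictions within each group.
    rate_in = sum(in_group) / len(in_group)
    rate_out = sum(out_group) / len(out_group)

    # Assumed edge cases: no positives anywhere counts as equal treatment,
    # positives in only one group counts as maximally unequal.
    if rate_in == 0 and rate_out == 0:
        return 1.0
    if rate_in == 0 or rate_out == 0:
        return 0.0
    return min(rate_in / rate_out, rate_out / rate_in)


# Made-up example: 3 of 4 positives inside the group, 1 of 4 outside.
score = p_percent([1, 1, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 0, 0, 0, 0])
```

A score of 1 means both groups receive positive predictions at the same rate; values near 0 mean one group is favoured heavily.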

New Features

We want to be rather open in what we accept, but we do require three things before a feature is added to the project:

  1. any new feature contributes towards a demonstrable real-world use case
  2. any new feature passes standard unit tests (we have a few for transformers and predictors)
  3. the feature has been discussed in the issue list beforehand


Download files

Source Distribution

scikit-lego-0.1.8.tar.gz (31.4 kB)

Uploaded Source

Built Distribution

scikit_lego-0.1.8-py2.py3-none-any.whl (46.2 kB)

Uploaded Python 2, Python 3

File details

Details for the file scikit-lego-0.1.8.tar.gz.

File metadata

  • Download URL: scikit-lego-0.1.8.tar.gz
  • Upload date:
  • Size: 31.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6

File hashes

Hashes for scikit-lego-0.1.8.tar.gz
Algorithm Hash digest
SHA256 7330628a25c33fb3955db072bd60545455db6ed2bc46648572b9dce453b76bec
MD5 89afb3da77904967d330c6b57f2b62a8
BLAKE2b-256 843e27bde2c4a19d3a8663e5265093dea9d1540f591b4292128ef6a6f8d2157b

File details

Details for the file scikit_lego-0.1.8-py2.py3-none-any.whl.

File metadata

  • Download URL: scikit_lego-0.1.8-py2.py3-none-any.whl
  • Upload date:
  • Size: 46.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6

File hashes

Hashes for scikit_lego-0.1.8-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 fda68b1d2fbcbfdca163862c3dcb44358c0e52c1ad67871c9f450aa20a5cfa08
MD5 327d774497472be3d85f3555e7b89b07
BLAKE2b-256 0bc3764a9062fd1f3375e33c648a174af66021d005bce781d847a14b13819dbd
