a collection of lego bricks for scikit-learn pipelines
scikit-lego
We love scikit-learn, but we often find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a single package that is tested and held to a consistent standard of code quality. This project is a collaboration between multiple companies in the Netherlands. Note that we are not formally affiliated with the scikit-learn project.
Installation
Install scikit-lego via pip with:
pip install scikit-lego
Alternatively, to edit and contribute you can fork/clone and run:
pip install -e ".[dev]"
python setup.py develop
Documentation
The documentation can be found here.
Usage
from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
...
mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])
...
Features
Here's a list of features that this library currently offers:
- sklego.preprocessing.PatsyTransformer: applies a patsy formula
- sklego.preprocessing.RandomAdder: adds randomness in training
- sklego.preprocessing.PandasTypeSelector: selects columns based on pandas type
- sklego.preprocessing.ColumnSelector: selects columns based on column name
- sklego.preprocessing.ColumnCapper: limits extreme values of the model features
- sklego.preprocessing.OrthogonalTransformer: makes all features linearly independent
- sklego.dummy.RandomRegressor: benchmark that predicts random values
- sklego.naive_bayes.GaussianMixtureNB: classifies by training a 1D GMM per column per class
- sklego.mixture.GMMClassifier: classifies by training a GMM per class
- sklego.mixture.GMMOutlierDetector: detects outliers based on a trained GMM
- sklego.pandas_utils.log_step: a simple logger-decorator for pandas pipeline steps
- sklego.pandas_utils.add_lags: adds lag values of certain columns in pandas
- sklego.pipeline.DebugPipeline: adds debug information to make debugging easier
- sklego.meta.DecayEstimator: adds decay to the sample_weight that the model accepts
- sklego.meta.GroupedEstimator: can split the data into runs and run a model on each
- sklego.meta.EstimatorTransformer: adds a model output as a feature
- sklego.metrics.correlation_score: calculates correlation between model output and feature
- sklego.metrics.p_percent_score: proxy for model fairness with regards to sensitive attribute
- sklego.datasets.load_chicken: loads in the joyful chickweight dataset
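To give a feel for what a "logger-decorator for pipeline steps" such as sklego.pandas_utils.log_step might look like, here is a minimal sketch using only the standard library. It is not the library's implementation: the names `log_step_sketch`, `drop_missing_weights` and the logger name are hypothetical, and the choice to log the row count and run time is an assumption about what such a decorator would usefully report.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline_steps")  # hypothetical logger name


def log_step_sketch(func):
    """Log the name, output row count and run time of a pipeline step."""

    @functools.wraps(func)  # keep the wrapped step's name and docstring
    def wrapper(data, *args, **kwargs):
        start = time.perf_counter()
        result = func(data, *args, **kwargs)
        elapsed = time.perf_counter() - start
        # len() works for lists and would also work for a pandas DataFrame
        logger.info("[%s] rows=%d time=%.4fs", func.__name__, len(result), elapsed)
        return result

    return wrapper


@log_step_sketch
def drop_missing_weights(rows):
    # Stand-in for a pandas step; here `rows` is a plain list of dicts.
    return [row for row in rows if row["weight"] is not None]


rows = [{"chick": 1, "weight": 42}, {"chick": 2, "weight": None}]
cleaned = drop_missing_weights(rows)
```

Because the wrapper only relies on `len()` of the result, a step decorated this way could also be chained with `DataFrame.pipe` in a pandas workflow.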
New Features
We want to be rather open in what we accept, but we do require three things before a feature is added to the project:
- any new feature contributes towards a demonstrable real-world use case
- any new feature passes standard unit tests (we have a few for transformers and predictors)
- the feature has been discussed in the issue list beforehand
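As an illustration of the kind of standard checks a new transformer could be expected to pass, here is a small sketch. The project's actual test suite is not shown here, so everything in it is an assumption: `ColumnCapperSketch` is a hypothetical toy transformer, and the three checks simply encode common scikit-learn conventions (`fit` returns the estimator itself, `transform` preserves the shape of the input, and the input is not modified in place).

```python
import copy


class ColumnCapperSketch:
    """Hypothetical toy transformer that clips every value to [lower, upper]."""

    def __init__(self, lower=0.0, upper=1.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        # Nothing to learn in this toy example; return self by convention.
        return self

    def transform(self, X):
        return [[min(max(v, self.lower), self.upper) for v in row] for row in X]


def check_fit_returns_self(transformer, X):
    assert transformer.fit(X) is transformer


def check_shape_preserved(transformer, X):
    out = transformer.fit(X).transform(X)
    assert len(out) == len(X)
    assert all(len(a) == len(b) for a, b in zip(out, X))


def check_input_not_mutated(transformer, X):
    before = copy.deepcopy(X)
    transformer.fit(X).transform(X)
    assert X == before


X = [[-3.0, 0.5], [7.0, 0.2]]
for check in (check_fit_returns_self, check_shape_preserved, check_input_not_mutated):
    check(ColumnCapperSketch(), X)

capped = ColumnCapperSketch().fit(X).transform(X)
```

Writing the checks as functions that take any transformer makes it easy to run the same set against every new feature.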
File details
Details for the file scikit-lego-0.1.8.tar.gz.
File metadata
- Download URL: scikit-lego-0.1.8.tar.gz
- Upload date:
- Size: 31.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7330628a25c33fb3955db072bd60545455db6ed2bc46648572b9dce453b76bec |
| MD5 | 89afb3da77904967d330c6b57f2b62a8 |
| BLAKE2b-256 | 843e27bde2c4a19d3a8663e5265093dea9d1540f591b4292128ef6a6f8d2157b |
File details
Details for the file scikit_lego-0.1.8-py2.py3-none-any.whl.
File metadata
- Download URL: scikit_lego-0.1.8-py2.py3-none-any.whl
- Upload date:
- Size: 46.2 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fda68b1d2fbcbfdca163862c3dcb44358c0e52c1ad67871c9f450aa20a5cfa08 |
| MD5 | 327d774497472be3d85f3555e7b89b07 |
| BLAKE2b-256 | 0bc3764a9062fd1f3375e33c648a174af66021d005bce781d847a14b13819dbd |