Skip to main content

Very Basic package make some usefull log transformation for scikit-learn

Project description

image License: GPL v3 Python Repo Size PEP8 Poetry Coverage Tests Statics Doc Pypi GitHub commit activity

Scikit-transformers : Scikit-learn + Custom transformers

About

Basic package to enable usefull transformers in scikit-learn pipelines.

First transformer implemented is a LogTransformer, which is a simple wrapper around the numpy log function.

Installation

Using regular pip and venv tools :

python3 -m venv .venv
source .venv/bin/activate
pip install scikit-transformers

Usage

For a very basic usage :

import pandas as pd

from sktransf import LogTransformer

df = pd.DataFrame(
    { "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      "b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
)

logger = LogTransformer()
logger.fit_transform(df)
df_transf = logger.transform(df)

Using common transformers :

import pandas as pd

from sktransf import LogTransformer, DropUniqueColumnTransformer, BoolColumnTransformer

df = pd.DataFrame(
    { "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      "b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
)

df_bool = BoolColumnTransformer().fit_transform(df)
df_unique = DropUniqueColumnTransformer().fit_transform(df)
df_logged = LogTransformer().fit_transform(df)

Using a pipeline :

import pandas as pd
from sklearn.pipeline import Pipeline

from sktransf import LogTransformer, DropUniqueColumnTransformer, BoolColumnTransformer

pipe = Pipeline([
    ('bool', BoolColumnTransformer()),
    ('unique', DropUniqueColumnTransformer()),
    ('log', LogTransformer())
])

df = pd.DataFrame(
    { "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      "b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
)

df_transf = pipe.fit_transform(df)

Using a pipeline with a scikit-learn model :

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

from sktransf import LogTransformer, DropUniqueColumnTransformer, BoolColumnTransformer

pipe = Pipeline([
    ('bool', BoolColumnTransformer()),
    ('unique', DropUniqueColumnTransformer()),
    ('log', LogTransformer()),
    ('model', LinearRegression())
])

X = pd.DataFrame(
    { "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      "b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
)

y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

pipe.fit(X, y)

y_pred = pipe.predict(X)

Documentation

For more specific information, please refer to the notebooks:

A complete documentation is be available on the github page.

Changelog, Releases and Roadmap

Please refer to the changelog page for more information.

Contributing

Pull requests are welcome.

For major changes, please open an issue first to discuss what you would like to change.

For more information, please refer to the contributing page.

License

GPLv3

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scikit_transformers-0.2.1.tar.gz (17.3 kB view hashes)

Uploaded Source

Built Distribution

scikit_transformers-0.2.1-py3-none-any.whl (18.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page