Feature Pipelines for Keras preprocessing layers.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

EasyFlow: Keras Feature Preprocessing Pipelines

About EasyFlow
Motivation
Installation
Example
Tutorials

About EasyFlow

EasyFlow provides an interface similar to SKLearn's Pipeline API, with easy-to-use feature preprocessing pipelines for building a full training and inference pipeline natively in Keras. All pipelines are implemented as Keras layers.

Motivation

There is a need for a Keras interface that mimics the SKLearn pipeline APIs, such as Pipeline, FeatureUnion and ColumnTransformer, but implemented natively as Keras layers. The usual design pattern, especially for tabular data, is to do the preprocessing with SKLearn first and then feed the data to a Keras model. With EasyFlow you don't need to leave the TensorFlow/Keras ecosystem to build custom pipelines, and your preprocessing pipeline becomes part of your model architecture.

The main interfaces are:

FeaturePreprocessor: This layer applies the feature preprocessing steps and returns a separate layer for each step supplied. This gives the user more flexibility when a more advanced network architecture is needed, for example a Wide and Deep network.
FeatureUnion: This layer is similar to FeaturePreprocessor, with an extra step that concatenates all layers into a single layer.
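
The difference between the two interfaces can be illustrated with a small plain-Python sketch. It is a conceptual stand-in only, not EasyFlow's implementation: the helper names `apply_steps` and `apply_union`, the toy transformations and the sample row are all assumptions made for illustration, and returning a dict keyed by step name is one possible way to expose "a separate output per step".

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
# A step mirrors the (name, transformer, columns) triples used in the example below.
Step = Tuple[str, Callable[[List[float]], List[float]], Sequence[str]]


def apply_steps(row: Row, steps: Sequence[Step]) -> Dict[str, List[float]]:
    """FeaturePreprocessor-like behaviour: one separate output per named step."""
    return {name: fn([row[col] for col in cols]) for name, fn, cols in steps}


def apply_union(row: Row, steps: Sequence[Step]) -> List[float]:
    """FeatureUnion-like behaviour: the same outputs, concatenated in step order."""
    outputs = apply_steps(row, steps)
    return [value for name, _, _ in steps for value in outputs[name]]


# Hypothetical steps and data, chosen only to make the output easy to read.
steps: List[Step] = [
    ("scaled", lambda xs: [x / 100.0 for x in xs], ["age", "chol"]),
    ("flags", lambda xs: [float(x > 0) for x in xs], ["sex"]),
]
row = {"age": 50.0, "chol": 200.0, "sex": 1.0}

separate = apply_steps(row, steps)   # {"scaled": [0.5, 2.0], "flags": [1.0]}
combined = apply_union(row, steps)   # [0.5, 2.0, 1.0]
```

Separate outputs let each branch feed a different part of a network (as in a Wide and Deep model), while the concatenated form is convenient when a single dense stack consumes all features.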

Installation

pip install easy-tensorflow

Example

Let's look at a quick example:

import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Normalization, StringLookup, IntegerLookup

# EasyFlow imports
from easyflow.data import TensorflowDataMapper
from easyflow.preprocessing import FeatureUnion
from easyflow.preprocessing import (
    FeatureInputLayer,
    StringToIntegerLookup,
)

Read in data and map as tf.data.Dataset

Use the TensorflowDataMapper class to map a pandas DataFrame to a tf.data.Dataset.

file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
dataframe = pd.read_csv(file_url)
labels = dataframe.pop("target")

batch_size = 32
dataset_mapper = TensorflowDataMapper() 
dataset = dataset_mapper.map(dataframe, labels)
train_data_set, val_data_set = dataset_mapper.split_data_set(dataset)
train_data_set = train_data_set.batch(batch_size)
val_data_set = val_data_set.batch(batch_size)

Set constants

NUMERICAL_FEATURES = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope']
CATEGORICAL_FEATURES = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'ca']
# thal is represented as a string
STRING_CATEGORICAL_FEATURES = ['thal']

dtype_mapper = {
    "age": tf.float32,
    "sex": tf.float32,
    "cp": tf.float32,
    "trestbps": tf.float32,
    "chol": tf.float32,
    "fbs": tf.float32,
    "restecg": tf.float32,
    "thalach": tf.float32,
    "exang": tf.float32,
    "oldpeak": tf.float32,
    "slope": tf.float32,
    "ca": tf.float32,
    "thal": tf.string,
}
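
Because the dtype mapping repeats the feature names already listed in the constants, it can also be derived from them. The following is a minimal sketch under stated assumptions: `build_dtype_mapper` is a hypothetical helper, not part of EasyFlow, and the string placeholders `"float32"` and `"string"` are used only to keep the snippet free of dependencies.

```python
from typing import Any, Dict, Sequence

NUMERICAL_FEATURES = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope']
CATEGORICAL_FEATURES = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'ca']
STRING_CATEGORICAL_FEATURES = ['thal']


def build_dtype_mapper(
    numeric_like: Sequence[str],
    string_like: Sequence[str],
    float_dtype: Any = "float32",   # placeholder; see the note below
    string_dtype: Any = "string",   # placeholder; see the note below
) -> Dict[str, Any]:
    """Hypothetical helper: map every feature name to its input dtype."""
    mapper = {name: float_dtype for name in numeric_like}
    mapper.update({name: string_dtype for name in string_like})
    return mapper


dtype_mapper = build_dtype_mapper(
    NUMERICAL_FEATURES + CATEGORICAL_FEATURES,
    STRING_CATEGORICAL_FEATURES,
)
```

To reproduce the mapping shown above exactly, pass `float_dtype=tf.float32` and `string_dtype=tf.string`; whether FeatureInputLayer also accepts dtype names as strings is not confirmed here.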

Set up the preprocessing layer using FeatureUnion

This is the main part where EasyFlow fits in. We can now set up a feature preprocessing pipeline as a Keras layer with only a few lines of code.

feature_preprocessor_list = [
    ('numeric_encoder', Normalization(), NUMERICAL_FEATURES),
    ('categorical_encoder', IntegerLookup(output_mode='multi_hot'), CATEGORICAL_FEATURES),
    ('string_encoder', StringToIntegerLookup(), STRING_CATEGORICAL_FEATURES)
]

preprocessor = FeatureUnion(feature_preprocessor_list)
preprocessor.adapt(train_data_set)

feature_layer_inputs = FeatureInputLayer(dtype_mapper)
preprocessing_layer = preprocessor(feature_layer_inputs)
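
The `adapt` call above is where the preprocessing layers learn their statistics from the training data before they are applied. The sketch below illustrates that adapt-then-apply idea in plain Python. It is not the Keras or EasyFlow implementation: `TinyNormalizer` and `TinyLookup` are hypothetical classes, the sample values are made up, and the tie-breaking rule in the vocabulary is a choice made for this sketch.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class TinyNormalizer:
    """Learns mean and variance in adapt(), then standardises values when called."""

    def adapt(self, values: Sequence[float]) -> None:
        n = len(values)
        self.mean = sum(values) / n
        # Population variance of the training values.
        self.variance = sum((v - self.mean) ** 2 for v in values) / n

    def __call__(self, values: Sequence[float]) -> List[float]:
        std = max(math.sqrt(self.variance), 1e-7)  # guard against zero variance
        return [(v - self.mean) / std for v in values]


class TinyLookup:
    """Learns a vocabulary in adapt(); index 0 is reserved for unseen values."""

    def adapt(self, values: Sequence[str]) -> None:
        counts = Counter(values)
        # Most frequent first; ties broken alphabetically (sketch-specific choice).
        ordered = sorted(counts, key=lambda v: (-counts[v], v))
        self.index: Dict[str, int] = {v: i + 1 for i, v in enumerate(ordered)}

    def __call__(self, values: Sequence[str]) -> List[int]:
        return [self.index.get(v, 0) for v in values]


normalizer = TinyNormalizer()
normalizer.adapt([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # training values only
scaled = normalizer([5.0, 9.0])  # [0.0, 2.0]

lookup = TinyLookup()
lookup.adapt(["normal", "fixed", "normal", "reversible", "normal", "fixed"])
encoded = lookup(["fixed", "normal", "unknown"])  # [2, 1, 0]
```

Adapting on the training split only keeps validation data from leaking into the learned statistics, and because the statistics live inside the layers they travel with the saved model to inference.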

Set up network

# setup simple network
x = tf.keras.layers.Dense(128, activation="relu")(preprocessing_layer)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=feature_layer_inputs, outputs=outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy'), tf.keras.metrics.AUC(name='auc')])

Fit model

history = model.fit(train_data_set, validation_data=val_data_set, epochs=10)

Tutorials

Migrate an Sklearn training Pipeline to Tensorflow Keras:

This notebook looks at ways to migrate an Sklearn training pipeline to TensorFlow Keras. There may be several reasons for moving from Sklearn to TensorFlow.

Single Input Multiple Output Preprocessor:

This example showcases how to apply different transformations and preprocessing steps to the same feature, that is, a single-input, multiple-output feature transformation scenario.
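
As a rough illustration of the single-input, multiple-output idea, the plain-Python sketch below sends one feature through two different transformations. It is an assumption-laden stand-in rather than the notebook's code: the bucket boundaries, the scaling rule and the step names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Tuple

AGE_BOUNDARIES = [40.0, 50.0, 60.0]  # hypothetical bucket edges


def scale_age(age: float) -> float:
    """Toy scaling: bring the raw value into a small numeric range."""
    return age / 100.0


def bucket_age(age: float) -> int:
    """Toy bucketing: a value equal to an edge falls into the upper bucket."""
    return bisect_right(AGE_BOUNDARIES, age)


# Both steps read the same input column, so one input yields two outputs.
steps: List[Tuple[str, Callable[[float], float], str]] = [
    ("age_scaled", scale_age, "age"),
    ("age_bucket", bucket_age, "age"),
]

row = {"age": 50.0}
outputs: Dict[str, float] = {name: fn(row[col]) for name, fn, col in steps}
# outputs == {"age_scaled": 0.5, "age_bucket": 2}
```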

Preprocessing module quick intro:

The easyflow.preprocessing module contains functionality similar to what Sklearn offers with its Pipeline, FeatureUnion and ColumnTransformer. This is a quick introduction.

Release history

Version	Release date
1.5.0 (this version)	Apr 15, 2024
1.4.3	Apr 26, 2023
1.4.2	Oct 22, 2022
1.4.1	Jul 28, 2022
1.4.0	Apr 18, 2022
1.3.1	Apr 6, 2022
1.3.0	Apr 3, 2022
1.2.1	Feb 10, 2022
1.2.0	Jan 19, 2022
1.1.3	Nov 7, 2021
1.1.2	Aug 19, 2021
1.1.1	Aug 8, 2021
1.1.0	Jul 31, 2021
1.0.0	Jul 1, 2021
0.1.9	Apr 13, 2021
0.1.8	Apr 10, 2021
0.1.7	Apr 10, 2021
0.1.4	Apr 9, 2021

Download files

Source Distribution

easy-tensorflow-1.5.0.tar.gz (12.9 kB)

Uploaded Apr 15, 2024 Source

Built Distribution

easy_tensorflow-1.5.0-py3-none-any.whl (15.0 kB)

Uploaded Apr 15, 2024 Python 3

Hashes for easy-tensorflow-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`bab16058513736bf446444c98be5dd80d459f2a626eddd13121945f6db9c5ef0`
MD5	`b6e12a5fa5303b0301684ac319d25cbe`
BLAKE2b-256	`7bbac2693355087aa7913abcfd6bd21fd63696410e088c9410777d62df596151`

Hashes for easy_tensorflow-1.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9730a3ed9ca1d8e0d5d9a501b257d1977baa1caa99e1bf94e82205498916c0fb`
MD5	`d3936a519552974fede65afe80823f43`
BLAKE2b-256	`15d685d71f35f190f6c3cabe32fcdf61c725f7e87c7b3ac7976dbcff98464804`