
Easy Tensorflow:

An interface containing easy TensorFlow model building blocks and feature encoding pipelines.

Project file structure:

├── easyflow
│   ├── __init__.py
│   ├── data
│   │   ├── __init__.py
│   │   └── mapper.py
│   ├── feature_encoders
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── categorical_encoders.py
│   │   ├── numerical_encoders.py
│   │   └── pipeline.py
│   ├── preprocessing
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── custom.py
│   │   └── pipeline.py
│   └── tests
│       ├── __init__.py
│       ├── test_data
│       │   └── heart.csv
│       ├── test_feature_encoders.py
│       └── test_preprocessing.py
├── notebooks
│   ├── feature_column_example.ipynb
│   └── preprocessing_example.ipynb
├── CHANGELOG.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── requirements.txt
└── setup.py

To install the package:

pip install easy-tensorflow
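Once installed, the modules used in the examples below should import cleanly, which makes for a quick smoke test:

import easyflow
from easyflow.data import TensorflowDataMapper
from easyflow.preprocessing import FeatureUnion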

Example 1: Preprocessing pipeline using FeatureUnion

The easyflow.preprocessing module contains functionality similar to sklearn's Pipeline, FeatureUnion and ColumnTransformer. A full example is also available in the notebooks folder.

import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import Normalization, StringLookup, IntegerLookup

# local imports
from easyflow.data import TensorflowDataMapper
from easyflow.preprocessing import FeatureUnion

Read in data and map as tf.data.Dataset

Use the TensorflowDataMapper class to map a pandas DataFrame to a tf.data.Dataset.

file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
dataframe = pd.read_csv(file_url)
labels = dataframe.pop("target")

batch_size = 32
dataset_mapper = TensorflowDataMapper() 
dataset = dataset_mapper.map(dataframe, labels)
train_data_set, val_data_set = dataset_mapper.split_data_set(dataset)
train_data_set = train_data_set.batch(batch_size)
val_data_set = val_data_set.batch(batch_size)
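To sanity-check the mapping, pull a single batch with the standard tf.data API. The element structure — a dict of feature tensors plus a label tensor — is assumed here from the way the model consumes the dataset below:

# peek at one batch; features is a dict keyed by column name
features, target = next(iter(train_data_set))
print({name: tensor.shape for name, tensor in features.items()})
print(target.shape)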

Set constants

NUMERICAL_FEATURES = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope']
CATEGORICAL_FEATURES = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'ca']
# thal is represented as a string
STRING_CATEGORICAL_FEATURES = ['thal']

Set up the preprocessing layer using FeatureUnion

feature_encoder_list = [
    ('numeric_encoder', Normalization(), NUMERICAL_FEATURES),
    ('categorical_encoder', IntegerLookup(output_mode='binary'), CATEGORICAL_FEATURES),
    # for the thal feature we first need a StringLookup followed by an IntegerLookup layer
    ('string_encoder', [StringLookup(), IntegerLookup(output_mode='binary')], STRING_CATEGORICAL_FEATURES),
]

encoder = FeatureUnion(feature_encoder_list)
all_feature_inputs, preprocessing_layer = encoder.encode(dataset)
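encode returns the Keras model inputs together with a single concatenated preprocessing layer. Its total width depends on the vocabularies inferred from the dataset, and you can inspect it before wiring up the network:

print(preprocessing_layer.shape)  # (None, total_encoded_feature_width)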

Set up network

# set up a simple network on top of the preprocessing layer
x = tf.keras.layers.Dense(128, activation="relu")(preprocessing_layer)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=all_feature_inputs, outputs=outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy'), tf.keras.metrics.AUC(name='auc')])

Fit model

history = model.fit(train_data_set, validation_data=val_data_set, epochs=10)
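After training, evaluation uses the standard Keras API; the returned values follow the metric order given in compile above:

loss, accuracy, auc = model.evaluate(val_data_set)
print(f"val accuracy: {accuracy:.3f}, val auc: {auc:.3f}")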

Example 2: Model building pipeline using the easyflow feature_encoders module

This module is a fusion between Keras layers and TensorFlow feature columns.

FeatureColumnTransformer and FeatureUnionTransformer are the main interfaces and serve as feature transformation pipelines.

Wrapper classes exist for the following feature_columns:

  • CategoricalFeatureEncoder
  • EmbeddingFeatureEncoder
  • EmbeddingCrossingFeatureEncoder
  • CategoryCrossingFeatureEncoder
  • NumericalFeatureEncoder
  • BucketizedFeatureEncoder

To create a custom encoder, or one for which no wrapper class exists, there are two base interfaces to use (see the sketch after this list):

  • BaseFeatureColumnEncoder
  • BaseCategoricalFeatureColumnEncoder
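For illustration, below is a minimal sketch of a custom numeric encoder. The import path and the encode contract (a method taking the dataset and a list of feature names and returning feature columns) are assumptions made for this sketch, not the documented interface — check base.py for the actual signature:

import tensorflow as tf
from easyflow.feature_encoders import BaseFeatureColumnEncoder

class LogNumericalFeatureEncoder(BaseFeatureColumnEncoder):
    """Numeric columns passed through a log1p normalizer.

    NOTE: the encode signature below is assumed for illustration;
    see base.py for the real contract.
    """
    def encode(self, dataset, features):
        return [
            tf.feature_column.numeric_column(name, normalizer_fn=tf.math.log1p)
            for name in features
        ]

The full Example 2 walkthrough follows.
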
import pandas as pd
import tensorflow as tf

# local imports
from easyflow.data import TensorflowDataMapper
from easyflow.feature_encoders import FeatureColumnTransformer, FeatureUnionTransformer
from easyflow.feature_encoders import NumericalFeatureEncoder, EmbeddingFeatureEncoder, CategoricalFeatureEncoder

Load data

CSV_HEADER = [
    "age",
    "workclass",
    "fnlwgt",
    "education",
    "education_num",
    "marital_status",
    "occupation",
    "relationship",
    "race",
    "gender",
    "capital_gain",
    "capital_loss",
    "hours_per_week",
    "native_country",
    "income_bracket",
]

data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"


data_frame = pd.read_csv(data_url, header=None, names=CSV_HEADER)
labels = data_frame.pop("income_bracket")
# note the leading space: string values in adult.data are space-prefixed
labels_binary = 1.0 * (labels == " >50K")
data_frame.to_csv('adult_features.csv', index=False)
labels_binary.to_csv('adult_labels.csv', index=False)

Map data frame to tf.data.Dataset

batch_size = 256
dataset_mapper = TensorflowDataMapper() 
dataset = dataset_mapper.map(data_frame, labels_binary)

train_data_set, val_data_set = dataset_mapper.split_data_set(dataset)
train_data_set = train_data_set.batch(batch_size)
val_data_set = val_data_set.batch(batch_size)

Set up the feature encoding list

NUMERIC_FEATURE_NAMES = [
    "age",
    "education_num",
    "capital_gain",
    "capital_loss",
    "hours_per_week",
]

CATEGORICAL_FEATURES_NAMES = [
    "workclass",
    "marital_status",
    "relationship",
    "race",
    "gender"]

EMBEDDING_FEATURES_NAMES = [
    'education',
    'occupation',
    'native_country',
]

feature_encoder_list = [
    ('numerical_features', NumericalFeatureEncoder(), NUMERIC_FEATURE_NAMES),
    ('categorical_features', CategoricalFeatureEncoder(), CATEGORICAL_FEATURES_NAMES),
    # the embedding features are used twice: as dense embeddings for the deep
    # branch and as one-hot encodings for the wide branch of the model below
    ('embedding_features_deep', EmbeddingFeatureEncoder(dimension=10), EMBEDDING_FEATURES_NAMES),
    ('embedding_features_wide', CategoricalFeatureEncoder(), EMBEDDING_FEATURES_NAMES),
]
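For intuition, the encoders above roughly correspond to the following raw tf.feature_column calls. This is a sketch under the assumption that the wrappers build standard feature columns; the vocabulary list here is hypothetical:

vocab = ['Bachelors', 'HS-grad', 'Masters']  # hypothetical vocabulary
education_cat = tf.feature_column.categorical_column_with_vocabulary_list('education', vocab)
# deep branch: dense 10-dimensional embedding
education_deep = tf.feature_column.embedding_column(education_cat, dimension=10)
# wide branch: one-hot indicator
education_wide = tf.feature_column.indicator_column(education_cat)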

Set up the feature layer and feature encoders

There are two main column transformer classes, namely FeatureColumnTransformer and FeatureUnionTransformer. For this example we are going to build a Wide and Deep model architecture, so we will be using FeatureColumnTransformer since it gives us more flexibility: it keeps each encoder group separate, whereas FeatureUnionTransformer concatenates all the features into a single input layer.

feature_layer_inputs, feature_layer = FeatureColumnTransformer(feature_encoder_list).transform(train_data_set)
deep = tf.keras.layers.concatenate([feature_layer['numerical_features'],
                                    feature_layer['categorical_features'],
                                    feature_layer['embedding_features_deep']])

wide = feature_layer['embedding_features_wide']

Set up Wide and Deep model architecture

deep = tf.keras.layers.BatchNormalization()(deep)

for nodes in [128, 64, 32]:
    deep = tf.keras.layers.Dense(nodes, activation='relu')(deep)
    deep = tf.keras.layers.Dropout(0.5)(deep)

# combine wide and deep layers
wide_and_deep = tf.keras.layers.concatenate([deep, wide])
output = tf.keras.layers.Dense(1, activation='sigmoid')(wide_and_deep)
model = tf.keras.Model(inputs=list(feature_layer_inputs.values()), outputs=output)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(label_smoothing=0.0),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy'), tf.keras.metrics.AUC(name='auc')])

Fit model

model.fit(train_data_set, validation_data=val_data_set, epochs=10)
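Because the feature encoding is part of the model graph, the trained model can be saved and reloaded to score raw feature dictionaries directly. A minimal sketch using the standard Keras SavedModel API (the path name is arbitrary; some TF versions may require custom_objects when reloading models built from feature columns):

model.save('wide_and_deep_model')
reloaded_model = tf.keras.models.load_model('wide_and_deep_model')
reloaded_model.evaluate(val_data_set)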
