Skip to main content

Can show the delta of two pandas dataframes

Project description

deltaframe

Show and log the delta between two Pandas dataframes.

Install

pip install deltaframe

How to use deltaframe

First, lets create two dataframes (e.g. transaction date from consecutive days).

df_old = pd.DataFrame({
    'date':['2013-11-24','2013-11-24','2013-11-24','2013-11-24'],
    'id':['001','002','003','004'],
    'quantity':[22,8,7,10],
    'color':['Yellow','Orange','Red','Yellow'],
})
df_new = pd.DataFrame({
    'date':['2013-11-24','2013-11-25','2013-11-24','2013-11-24'],
    'id':['001','002', '004', '005'],
    'quantity':[22,6,5,10],
    'color':['Yellow','Orange','Red','Pink'],
})
df_old
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color
0 2013-11-24 001 22 Yellow
1 2013-11-24 002 8 Orange
2 2013-11-24 003 7 Red
3 2013-11-24 004 10 Yellow
df_new
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color
0 2013-11-24 001 22 Yellow
1 2013-11-25 002 6 Orange
2 2013-11-24 004 5 Red
3 2013-11-24 005 10 Pink

Show the delta

Let's look at the main function get_delta first.

get_delta(df_old=df_old, df_new=df_new, unique_id="id", sort_by="date")
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
4 2013-11-24 005 10.0 Pink added
4 2013-11-24 003 7.0 Red removed
5 2013-11-24 004 5.0 Red modified
4 2013-11-25 002 6.0 Orange modified

Show added, removed and modified rows.

It's also possible to just get information about added, removed or modified rows as shown in the following:

Show added rows with get_added.

added_rows = get_added(df_old=df_old, df_new=df_new, unique_id="id")
added_rows
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
4 2013-11-24 005 10.0 Pink added

What about removed rows (in df_old but not any longer in df_new) ?

get_removed

removed_rows = get_removed(df_old=df_old, df_new=df_new, unique_id="id")
removed_rows
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
4 2013-11-24 003 7.0 Red removed

Awesome, finally we check for the modified rows (also showing added rows) with get_modified.

modified_rows = get_modified(df_old=df_old, df_new=df_new, unique_id="id")
modified_rows
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
4 2013-11-25 002 6 Orange modified
5 2013-11-24 004 5 Red modified
6 2013-11-24 005 10 Pink modified

If we don't want to show added rows as modified, we can pass the added_rows dataframe created above.

modified_rows = get_modified(df_old=df_old, df_new=df_new, unique_id="id", added_rows=added_rows)
modified_rows
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
4 2013-11-25 002 6 Orange modified
5 2013-11-24 004 5 Red modified

Logging the delta

Finally, it's also possible to log the delta (e.g. transactions over time).

Initially there is no log file so we set df_log=None.

df_log = log_delta(df_log=None, df_old=df_old, df_new=df_new, unique_id="id")
df_log
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
0 2013-11-24 001 22 Yellow added
1 2013-11-24 002 8 Orange added
2 2013-11-24 003 7 Red added
3 2013-11-24 004 10 Yellow added

When there's an existing log file we happily pass it to our logging function...

df_log = log_delta(df_log=df_log, df_old=df_old, df_new=df_new, unique_id="id")
df_log
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
0 2013-11-24 001 22.0 Yellow added
1 2013-11-24 002 8.0 Orange added
2 2013-11-24 003 7.0 Red added
3 2013-11-24 004 10.0 Yellow added
4 2013-11-25 002 6.0 Orange modified
5 2013-11-24 004 5.0 Red modified
6 2013-11-24 005 10.0 Pink added
7 2013-11-24 003 7.0 Red removed

Finally, if we want to sort our log file by a particular column.

df_log = log_delta(df_log=df_log, df_old=df_old, df_new=df_new, unique_id="id", sort_by=["date"])
df_log
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
date id quantity color transaction
0 2013-11-24 001 22.0 Yellow added
1 2013-11-24 002 8.0 Orange added
2 2013-11-24 003 7.0 Red added
3 2013-11-24 004 10.0 Yellow added
5 2013-11-24 004 5.0 Red modified
6 2013-11-24 005 10.0 Pink added
7 2013-11-24 005 10.0 Pink added
8 2013-11-24 003 7.0 Red removed
4 2013-11-25 002 6.0 Orange modified

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deltaframe-0.0.1.tar.gz (12.5 kB view hashes)

Uploaded Source

Built Distribution

deltaframe-0.0.1-py3-none-any.whl (8.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page