Can show the delta of two pandas dataframes
Project description
deltaframe
Show and log the delta between two Pandas dataframes.
Install
pip install deltaframe
How to use deltaframe
First, lets create two dataframes (e.g. transaction date from consecutive days).
df_old = pd.DataFrame({
'date':['2013-11-24','2013-11-24','2013-11-24','2013-11-24'],
'id':['001','002','003','004'],
'quantity':[22,8,7,10],
'color':['Yellow','Orange','Red','Yellow'],
})
df_new = pd.DataFrame({
'date':['2013-11-24','2013-11-25','2013-11-24','2013-11-24'],
'id':['001','002', '004', '005'],
'quantity':[22,6,5,10],
'color':['Yellow','Orange','Red','Pink'],
})
df_old
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | |
---|---|---|---|---|
0 | 2013-11-24 | 001 | 22 | Yellow |
1 | 2013-11-24 | 002 | 8 | Orange |
2 | 2013-11-24 | 003 | 7 | Red |
3 | 2013-11-24 | 004 | 10 | Yellow |
df_new
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | |
---|---|---|---|---|
0 | 2013-11-24 | 001 | 22 | Yellow |
1 | 2013-11-25 | 002 | 6 | Orange |
2 | 2013-11-24 | 004 | 5 | Red |
3 | 2013-11-24 | 005 | 10 | Pink |
Show the delta
Let's look at the main function get_delta
first.
get_delta(df_old=df_old, df_new=df_new, unique_id="id", sort_by="date")
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
4 | 2013-11-24 | 005 | 10.0 | Pink | added |
4 | 2013-11-24 | 003 | 7.0 | Red | removed |
5 | 2013-11-24 | 004 | 5.0 | Red | modified |
4 | 2013-11-25 | 002 | 6.0 | Orange | modified |
Show added, removed and modified rows.
It's also possible to just get information about added, removed or modified rows as shown in the following:
Show added rows with get_added
.
added_rows = get_added(df_old=df_old, df_new=df_new, unique_id="id")
added_rows
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
4 | 2013-11-24 | 005 | 10.0 | Pink | added |
What about removed rows (in df_old but not any longer in df_new) ?
get_removed
removed_rows = get_removed(df_old=df_old, df_new=df_new, unique_id="id")
removed_rows
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
4 | 2013-11-24 | 003 | 7.0 | Red | removed |
Awesome, finally we check for the modified rows (also showing added rows) with get_modified
.
modified_rows = get_modified(df_old=df_old, df_new=df_new, unique_id="id")
modified_rows
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
4 | 2013-11-25 | 002 | 6 | Orange | modified |
5 | 2013-11-24 | 004 | 5 | Red | modified |
6 | 2013-11-24 | 005 | 10 | Pink | modified |
If we don't want to show added rows as modified, we can pass the added_rows dataframe created above.
modified_rows = get_modified(df_old=df_old, df_new=df_new, unique_id="id", added_rows=added_rows)
modified_rows
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
4 | 2013-11-25 | 002 | 6 | Orange | modified |
5 | 2013-11-24 | 004 | 5 | Red | modified |
Logging the delta
Finally, it's also possible to log the delta (e.g. transactions over time).
Initially there is no log file so we set df_log=None
.
df_log = log_delta(df_log=None, df_old=df_old, df_new=df_new, unique_id="id")
df_log
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
0 | 2013-11-24 | 001 | 22 | Yellow | added |
1 | 2013-11-24 | 002 | 8 | Orange | added |
2 | 2013-11-24 | 003 | 7 | Red | added |
3 | 2013-11-24 | 004 | 10 | Yellow | added |
When there's an existing log file we happily pass it to our logging function...
df_log = log_delta(df_log=df_log, df_old=df_old, df_new=df_new, unique_id="id")
df_log
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
0 | 2013-11-24 | 001 | 22.0 | Yellow | added |
1 | 2013-11-24 | 002 | 8.0 | Orange | added |
2 | 2013-11-24 | 003 | 7.0 | Red | added |
3 | 2013-11-24 | 004 | 10.0 | Yellow | added |
4 | 2013-11-25 | 002 | 6.0 | Orange | modified |
5 | 2013-11-24 | 004 | 5.0 | Red | modified |
6 | 2013-11-24 | 005 | 10.0 | Pink | added |
7 | 2013-11-24 | 003 | 7.0 | Red | removed |
Finally, if we want to sort our log file by a particular column.
df_log = log_delta(df_log=df_log, df_old=df_old, df_new=df_new, unique_id="id", sort_by=["date"])
df_log
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
date | id | quantity | color | transaction | |
---|---|---|---|---|---|
0 | 2013-11-24 | 001 | 22.0 | Yellow | added |
1 | 2013-11-24 | 002 | 8.0 | Orange | added |
2 | 2013-11-24 | 003 | 7.0 | Red | added |
3 | 2013-11-24 | 004 | 10.0 | Yellow | added |
5 | 2013-11-24 | 004 | 5.0 | Red | modified |
6 | 2013-11-24 | 005 | 10.0 | Pink | added |
7 | 2013-11-24 | 005 | 10.0 | Pink | added |
8 | 2013-11-24 | 003 | 7.0 | Red | removed |
4 | 2013-11-25 | 002 | 6.0 | Orange | modified |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for deltaframe-0.0.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c5582d2a102f1d14c62ca1c0ee2b00efc64a02b7f1a86d5fc911d664253513e2 |
|
MD5 | e4a8d377e14737d4346cb649da87f0f4 |
|
BLAKE2b-256 | d3ae343519dba75cec00b805f2f635fae8efb4ac84f9d45731dd2668877d6000 |