Used to compare 2 Pandas DFs
pdcompare
Used to compare two pandas DataFrame objects to see how they changed.
pip install pdcompare
Requirements
The DataFrames must share the same index to compare correctly. An error is raised if the index data types do not match, and a warning is issued if the index names differ.
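If you want to catch these problems before handing the frames to pdcompare, a pre-flight check along these lines may help. This is a minimal sketch in plain pandas; `check_indexes` is a hypothetical helper, not part of pdcompare, and the exception type and messages are assumptions.

```python
import warnings

import pandas as pd


def check_indexes(df1: pd.DataFrame, df2: pd.DataFrame) -> None:
    """Hypothetical pre-check mirroring the requirements described above."""
    # Mismatched index dtypes make label matching unreliable, so stop early.
    if df1.index.dtype != df2.index.dtype:
        raise TypeError(
            f"Index dtypes differ: {df1.index.dtype} vs {df2.index.dtype}"
        )
    # Different index names are suspicious but not fatal, so only warn.
    if list(df1.index.names) != list(df2.index.names):
        warnings.warn(
            f"Index names differ: {list(df1.index.names)} vs {list(df2.index.names)}"
        )
```

Calling `check_indexes(df1, df2)` before building the comparison gives you a chance to fix the index (for example with `set_index` or `astype`) first.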
STEPS
Initialize a Compare object and call its compare() method:
from pdcompare import Compare
compare_object = Compare(df1,df2)
compare_object.compare()
To get a dictionary of the resulting comparison data, call:
compare_object.output()
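Putting the steps together, an end-to-end run might look like the sketch below. The two DataFrames and the `run_comparison` wrapper are made up for illustration; only `Compare`, `compare()` and `output()` come from the steps above.

```python
import pandas as pd

# Two small, hypothetical snapshots that share the same index name and dtype.
df1 = pd.DataFrame(
    {"price": [10, 20, 30]},
    index=pd.Index(["a", "b", "c"], name="id"),
)
df2 = pd.DataFrame(
    {"price": [10, 25, 40]},
    index=pd.Index(["a", "b", "d"], name="id"),
)


def run_comparison(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Hypothetical wrapper: run pdcompare and return its output dictionary."""
    # Imported lazily so this module loads even before `pip install pdcompare`.
    from pdcompare import Compare

    compare_object = Compare(old, new)
    compare_object.compare()
    return compare_object.output()
```

With pdcompare installed, `result = run_comparison(df1, df2)` should give you the dictionary described under Output Details, so `result["CHANGED"]` would be the DataFrame of changed values.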
Output Details
Calling the .output() method returns a dictionary with the following keys and associated values:
KEY | VALUE | VALUE Data Type |
---|---|---|
SUMMARY | high-level overview of differences | pd.DataFrame |
ADDED | list of all index values that were added | pd.Series |
ADDED_cols | list of all columns that were added | pd.Series |
REMOVED | list of all index values that were removed | pd.Series |
REMOVED_cols | list of all columns that were removed | pd.Series |
CHANGED | (see below for details) | pd.DataFrame |
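To make the ADDED and REMOVED entries more concrete, here is a rough pandas-only sketch of how such values could be derived by hand. It is an illustration of the idea, not pdcompare's actual implementation; `index_changes` is a hypothetical name, and the real output may differ in ordering or naming.

```python
import pandas as pd


def index_changes(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Hypothetical helper: label-level differences between two frames."""
    return {
        # Index labels present only in the new frame.
        "ADDED": pd.Series(new.index.difference(old.index)),
        # Column labels present only in the new frame.
        "ADDED_cols": pd.Series(new.columns.difference(old.columns)),
        # Index labels present only in the old frame.
        "REMOVED": pd.Series(old.index.difference(new.index)),
        # Column labels present only in the old frame.
        "REMOVED_cols": pd.Series(old.columns.difference(new.columns)),
    }
```

`Index.difference` is a set operation, so this sketch assumes index and column labels are unique.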
CHANGED output data
This data has the following columns
Column Header | Data |
---|---|
ID | Index value by which we tracked the alterations |
COLUMN | Column in which the value changed for that index |
from | Value of specified column & index in the first table (old) |
to | Value of specified column & index in the second table (new) |
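The shape of this table can be reproduced by hand with plain pandas, which may help when reading the CHANGED output. The sketch below is an assumption about how such a table could be built, not the library's own code; `changed_cells` is a hypothetical helper and it assumes unique index labels.

```python
import pandas as pd


def changed_cells(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per (index, column) whose value changed."""
    # Only rows and columns present in both frames can be "changed".
    shared_idx = old.index.intersection(new.index)
    shared_cols = old.columns.intersection(new.columns)
    a = old.loc[shared_idx, shared_cols]
    b = new.loc[shared_idx, shared_cols]

    rows = []
    for col in shared_cols:
        # Treat a pair of missing values as unchanged.
        same = (a[col] == b[col]) | (a[col].isna() & b[col].isna())
        differs = ~same
        for idx in differs[differs].index:
            rows.append(
                {
                    "ID": idx,
                    "COLUMN": col,
                    "from": a.at[idx, col],  # value in the first (old) table
                    "to": b.at[idx, col],  # value in the second (new) table
                }
            )
    return pd.DataFrame(rows, columns=["ID", "COLUMN", "from", "to"])
```

Each resulting row reads as "for this ID, this COLUMN went from one value to another", which is the same reading as the table above.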
Examples
ScreamingFrog Crawl Comparison (SEO)
This is a great tool to compare crawls from different dates. Simply export the CSV files from ScreamingFrog. Then run this Google Colab notebook to create a Report in Google Sheets.
ScreamingFrog Crawl Compare in Colab
By default, the code that connects to Google Sheets and handles the formatting is hidden, but feel free to peek behind the curtain to see how it was done. You can display the first block of code by clicking the drop-down triangle on the far left side of the block.
Weed Price Comparison
For a simple example to get acquainted quickly, use this one. Thanks to Vicki for pointing me in the direction of these small datasets, and thanks to Frank BI for supplying the free datasets. I took Frank's weed price data from 2004 and compared it to the 2005 data across the 50 states. The example can be found in this repo's example folder.
Thanks for using my code