Examine snapshots in web archives such as the Internet Archive's Wayback Machine
memento-cli
A command-line tool for interacting with web archives that support Memento (RFC 7089), such as the Internet Archive's Wayback Machine.
For more background on why this tool was created see: https://inkdroid.org/2023/09/14/memento-bisect/
Usage
List Snapshots
To list all the available snapshots (or Mementos) of the page captured in a given snapshot, use the list command:
$ memento list https://web.archive.org/web/20230407140923/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2017-12-29 05:40:51 https://web.archive.org/web/20171229054051/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-03 20:03:00 https://web.archive.org/web/20180103200300/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-04 06:39:58 https://web.archive.org/web/20180104063958/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-06 16:08:07 https://web.archive.org/web/20180106160807/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 06:10:07 https://web.archive.org/web/20180112061007/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 17:40:16 https://web.archive.org/web/20180112174016/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 18:40:34 https://web.archive.org/web/20180112184034/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 19:11:48 https://web.archive.org/web/20180112191148/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:05:57 https://web.archive.org/web/20180120190557/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:19:20 https://web.archive.org/web/20180120191920/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
...
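In the output above, the date in the first column matches the 14-digit timestamp embedded in each snapshot URL. The following is a minimal sketch of recovering that date from a URL yourself. The `snapshot_datetime` helper and its regular expression are illustrative assumptions, not part of memento-cli, and they assume the archive uses the Wayback-style `YYYYMMDDhhmmss` path segment.

```python
import re
from datetime import datetime

# Assumption: the snapshot URL contains a 14-digit YYYYMMDDhhmmss path segment,
# optionally followed by a two-letter modifier such as "mp_".
TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def snapshot_datetime(snapshot_url: str) -> datetime:
    """Return the capture time encoded in a Wayback-style snapshot URL."""
    match = TIMESTAMP.search(snapshot_url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp in {snapshot_url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


url = (
    "https://web.archive.org/web/20171229054051/"
    "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
)
print(snapshot_datetime(url))  # 2017-12-29 05:40:51
```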
Since memento works with any archive that supports RFC 7089, you can use it to list versions in other web archives as well:
$ memento list https://www.webarchive.org.uk/wayback/archive/20130501020401/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-05-01 02:03:57 https://www.webarchive.org.uk/wayback/archive/20130501020357mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-05-01 02:04:01 https://www.webarchive.org.uk/wayback/archive/20130501020401mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-07-29 12:58:03 https://www.webarchive.org.uk/wayback/archive/20130729125803mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-07-29 12:58:06 https://www.webarchive.org.uk/wayback/archive/20130729125806mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2021-01-22 06:38:21 https://www.webarchive.org.uk/wayback/archive/20210122063821mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2022-03-14 16:36:16 https://www.webarchive.org.uk/wayback/archive/20220314163616mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
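RFC 7089 archives describe the snapshots of a page in a TimeMap, commonly serialized as `application/link-format`. The sketch below shows one way such a document could be turned into a dated list like the ones above. It is not taken from memento-cli: `parse_timemap`, the regular expressions, and the hand-written `SAMPLE` text (with the hypothetical host `archive.example.org`) are all assumptions made for illustration.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry: <url> followed by ;name="value" parameters.
ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Return (datetime, url) pairs for every memento entry, oldest first."""
    mementos = []
    for url, raw_params in ENTRY.findall(text):
        params = dict(PARAM.findall(raw_params))
        # rel can hold several values, e.g. "first memento".
        rels = params.get("rel", "").split()
        if "memento" in rels and "datetime" in params:
            mementos.append((parsedate_to_datetime(params["datetime"]), url))
    return sorted(mementos)


# Hand-written sample in the style of the RFC 7089 examples, not real archive output.
SAMPLE = '''<http://a.example.org/>; rel="original",
<http://archive.example.org/timemap/link/http://a.example.org/>; rel="self"; type="application/link-format",
<http://archive.example.org/web/20010911203610/http://a.example.org/>; rel="memento"; datetime="Tue, 11 Sep 2001 20:36:10 GMT",
<http://archive.example.org/web/20000620180259/http://a.example.org/>; rel="first memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT"
'''

for when, url in parse_timemap(SAMPLE):
    print(when.strftime("%Y-%m-%d %H:%M:%S"), url)
```

Matching whole entries with a regular expression, rather than splitting on commas, avoids breaking on the comma inside each `datetime` value.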
Searching for Changes (Bisect)
Let's suppose you know that the Twitter Hateful Conduct Policy used to have language about:
women, people of color, lesbian, gay, bisexual, transgender, queer, intersex, asexual individuals
You can see it in the Internet Archive's Wayback Machine in 2019, but you can't see it on the page in 2023. To identify when the change was introduced, you can bisect the version history to search for the version where the text went missing, using the two snapshots and the --text option. This performs a binary search over the snapshots between the two versions, looking for the text.
$ memento bisect --missing --text "women, people of color, lesbian, gay" \
https://web.archive.org/web/20190711134608/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy \
https://web.archive.org/web/20230621094005/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
The --text value can also be a regular expression. If you provide only one snapshot URL, memento uses it as the start of the search and the last snapshot in the archive as the end.
The bisect command uses a browser behind the scenes (using Selenium) in order to fully render the page. To find out when some text first appears (rather than goes missing), remove the --missing option from the command.
If you would prefer to examine the pages in between manually, leave off the --text option; memento will then show you the browser it is controlling and prompt you to continue.
If you would like to see the browser when using --text, use the --show-browser option.
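The binary search described in this section can be sketched as follows. This is an illustration of the general technique under stated assumptions, not memento-cli's actual code: `bisect_missing` and `fetch_text` are hypothetical names, the pages are a small in-memory dictionary rather than rendered browser output, and the search assumes the text is present in the first snapshot, absent from the last, and disappears only once.

```python
import re
from typing import Callable, Sequence


def bisect_missing(
    snapshots: Sequence[str],
    pattern: str,
    fetch_text: Callable[[str], str],
) -> str:
    """Return the first snapshot in which `pattern` no longer matches.

    Assumes the pattern matches snapshots[0], does not match snapshots[-1],
    and that the text disappears exactly once in between.
    """
    regex = re.compile(pattern)
    lo, hi = 0, len(snapshots) - 1  # invariant: present at lo, missing at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if regex.search(fetch_text(snapshots[mid])):
            lo = mid  # text still present, so the change happened later
        else:
            hi = mid  # text already gone, so the change happened here or earlier
    return snapshots[hi]


# Hypothetical stand-in for rendered page text, keyed by snapshot identifier.
PAGES = {
    "v1": "women, people of color, lesbian, gay",
    "v2": "women, people of color, lesbian, gay",
    "v3": "women, people of color, lesbian, gay",
    "v4": "women, people of color, lesbian, gay",
    "v5": "women, people of color, lesbian, gay",
    "v6": "policy text without the phrase",
    "v7": "policy text without the phrase",
    "v8": "policy text without the phrase",
}
SNAPSHOTS = sorted(PAGES)

print(bisect_missing(SNAPSHOTS, r"women, people of color", PAGES.__getitem__))  # v6
```

Because each step halves the remaining range, only a handful of pages need to be fetched even when an archive holds hundreds of snapshots. Swapping the two branches of the `if` would search for the point where text first appears instead.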
File details
Details for the file memento_cli-0.0.4.tar.gz.
File metadata
- Download URL: memento_cli-0.0.4.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.2 Darwin/23.1.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81b31f8df3f44ce449d83bb600435e34eb0376346cc62ed225c66c5d38d26bf0 |
| MD5 | 62f608405603b0c57f8de3e0facc4fe3 |
| BLAKE2b-256 | 22248680807a14cf66774b1301066ec261cb4190a3e3580139cec3e68449ef08 |
File details
Details for the file memento_cli-0.0.4-py3-none-any.whl.
File metadata
- Download URL: memento_cli-0.0.4-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.2 Darwin/23.1.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | adf7f2536c019832e4345a30d0ab469c39e401b4555bde5e1c84ccb6296a0eb0 |
| MD5 | 0d0dc58c03b3d06ad78d4ec868c11c38 |
| BLAKE2b-256 | 25a6d6bddc420cd7a808a7a03893d1ea38772bf79e3c02e639c27104cff91aaf |