Python tool for archiving web pages through Internet Archive Wayback Machine

Wayback Machine Saver

Getting Started

Prerequisites

Installation

It's recommended to use tools like pipx to install this command-line tool.

pipx install wayback-machine-saver

Usage

Save pages

Save the URLs listed in the input file to the Internet Archive Wayback Machine.

wayback_machine_saver save-pages FILENAME

Argument

  • FILENAME: path to the file containing the URLs to save

e.g.,

https://example.com
https://another-example.com

Options

  • --deliminator TEXT [default: "\n"]
  • --error-log-filename TEXT [default: save-pages-error-log-"timestamp".csv]
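
As a rough illustration of how an input file and the `--deliminator` option fit together, the sketch below splits a file's text on a delimiter and builds one save request URL per entry. It is not the tool's actual implementation: `read_urls` and `build_save_url` are hypothetical helper names, and the `https://web.archive.org/save/` prefix is the Wayback Machine's public save endpoint, which this README does not confirm the tool uses.

```python
from pathlib import Path

# Assumption: the public "Save Page Now" endpoint. The README does not say
# which endpoint the tool calls internally.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def read_urls(filename: str, delimiter: str = "\n") -> list[str]:
    """Split the file's text on the delimiter and drop blank entries."""
    text = Path(filename).read_text(encoding="utf-8")
    return [part.strip() for part in text.split(delimiter) if part.strip()]


def build_save_url(url: str) -> str:
    """Return the URL that would ask the Wayback Machine to archive `url`."""
    return SAVE_ENDPOINT + url
```

With the default delimiter, a file holding the two example URLs above yields a list of two entries; passing `","` as the delimiter handles a comma-separated file the same way.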

Get latest archive urls

After a URL has been saved, the Internet Archive Wayback Machine stores a snapshot of the page and assigns it a timestamp. You can reach the latest snapshot through http://web.archive.org/web/[Your URL], which redirects to http://web.archive.org/web/[timestamp]/[Your URL]. This command retrieves those redirected URLs.

wayback_machine_saver get-latest-archive-urls FILENAME

Argument

  • FILENAME: path to the file containing the URLs to retrieve

e.g.,

https://example.com
https://another-example.com

Options

  • --deliminator TEXT [default: "\n"]
  • --output-filename TEXT [default: retrieved-urls-"timestamp".csv]
  • --error-log-filename TEXT [default: get-url-error-log-"timestamp".csv]
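
The redirect described in this section can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the tool's code: the function names are hypothetical, `urllib` stands in for whatever HTTP client the tool uses, and the timeout is assumed to be in seconds.

```python
import re
import urllib.request
from typing import Optional

WAYBACK_PREFIX = "http://web.archive.org/web/"


def latest_archive_lookup_url(url: str) -> str:
    """Build the lookup URL that redirects to the latest snapshot."""
    return WAYBACK_PREFIX + url


def resolve_latest_archive_url(url: str, timeout: float = 10.0) -> str:
    """Follow the redirect and return the final, timestamped URL.

    This performs a real network request.
    """
    lookup = latest_archive_lookup_url(url)
    with urllib.request.urlopen(lookup, timeout=timeout) as response:
        return response.geturl()


def split_archive_url(archive_url: str) -> Optional[tuple[str, str]]:
    """Split a timestamped archive URL into (timestamp, original URL).

    Returns None when the URL does not have the expected shape.
    """
    match = re.match(r"^https?://web\.archive\.org/web/(\d+)/(.+)$", archive_url)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

`split_archive_url` is only a convenience for reading the timestamp back out of a redirected URL; the command itself is documented as returning the redirected URLs.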

Configuration

Wayback Machine Saver supports configuration through environment variables. Run export VARIABLE=VALUE before running the tool to change its behavior.

  • WAYBACK_MACHINE_SAVER_RETRY_TIMES
    • times to retry (default: 3)
  • HTTPX_TIMEOUT
    • timeout for all GET operations (default: 10)
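
To show how the two variables above might be consumed, here is a hedged sketch that reads them with the documented defaults and applies the retry count to an arbitrary operation. `load_settings` and `call_with_retries` are hypothetical names, the timeout is assumed to be a number of seconds, and "times to retry" is interpreted as one initial attempt plus that many retries; the tool itself may count differently.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def load_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        "retry_times": int(env.get("WAYBACK_MACHINE_SAVER_RETRY_TIMES", "3")),
        # Assumption: the value is a number of seconds.
        "timeout": float(env.get("HTTPX_TIMEOUT", "10")),
    }


def call_with_retries(operation: Callable[[], T], retry_times: int) -> T:
    """Run `operation`, retrying up to `retry_times` times after a failure.

    Assumption: one initial attempt plus `retry_times` retries.
    """
    last_error: Optional[Exception] = None
    for _ in range(retry_times + 1):
        try:
            return operation()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
    assert last_error is not None
    raise last_error
```

Passing a plain dictionary to `load_settings` instead of relying on `os.environ` keeps the function easy to test without touching the real environment.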

Contributing

See Contributing

Authors

Wei Lee weilee.rx@gmail.com

Created from Lee-W/cookiecutter-python-template version 0.9.0
