A Tool to Summarize Web Archive Holdings

Project description

MementoMap

A web archive profiling framework for expressing the holdings of an archive

$ ./main.py
usage: main.py [-h] {generate,compact,lookup,batchlookup} ...

positional arguments:
  {generate,compact,lookup,batchlookup}
    generate            Generate a MementoMap from a sorted file with the
                        first columns as SURT (e.g., CDX/CDXJ)
    compact             Compact a large MementoMap file into a small one
    lookup              Look for a SURT into a MementoMap
    batchlookup         Look for a list of SURTs into a MementoMap

optional arguments:
  -h, --help            show this help message and exit
$ ./main.py generate -h
usage: main.py generate [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
                        [--hdepth] [--pdepth]
                        infile outfile

positional arguments:
  infile      Input SURT/CDX/CDXJ (plain or GZip) file path or '-' for STDIN
  outfile     Output MementoMap file path

optional arguments:
  -h, --help  show this help message and exit
  --hcf       Host compaction factor (default: Inf)
  --pcf       Path compaction factor (default: Inf)
  --ha        Power law alpha parameter for host (default: 16.329)
  --pa        Power law alpha parameter for path (default: 24.546)
  --hk        Power law k parameter for host (default: 0.714)
  --pk        Power law k parameter for path (default: 1.429)
  --hdepth    Max host depth (default: 8)
  --pdepth    Max path depth (default: 9)
$ ./main.py compact -h
usage: main.py compact [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
                       [--hdepth] [--pdepth]
                       infile outfile

positional arguments:
  infile      Input MementoMap (plain or GZip) file path or '-' for STDIN
  outfile     Output MementoMap file path

optional arguments:
  -h, --help  show this help message and exit
  --hcf       Host compaction factor (default: 1.0)
  --pcf       Path compaction factor (default: 1.0)
  --ha        Power law alpha parameter for host (default: 16.329)
  --pa        Power law alpha parameter for path (default: 24.546)
  --hk        Power law k parameter for host (default: 0.714)
  --pk        Power law k parameter for path (default: 1.429)
  --hdepth    Max host depth (default: 8)
  --pdepth    Max path depth (default: 9)
$ ./main.py lookup -h
usage: main.py lookup [-h] mmap surt

positional arguments:
  mmap        MementoMap file path to look into
  surt        SURT to look for

optional arguments:
  -h, --help  show this help message and exit
$ ./main.py batchlookup -h
usage: main.py batchlookup [-h] mmap infile

positional arguments:
  mmap        MementoMap file path to look into
  infile      Input SURT (plain or GZip) file path or '-' for STDIN

optional arguments:
  -h, --help  show this help message and exit
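
The commands above all operate on SURT keys, so a small illustration of what such a key looks like may help. The following is a minimal sketch, not code from this project: the function names url_to_surt and truncate_surt are hypothetical, the conversion skips the canonicalization that real SURT tools perform (it only lowercases the host, drops the scheme and port, and reverses the host labels), and reading --hdepth and --pdepth as limits on the number of host and path segments kept in a key is an assumption based on the help text.

```python
from urllib.parse import urlsplit


def url_to_surt(url):
    """Build a simplified SURT-style key (hypothetical helper, not this project's API).

    The host labels are reversed and comma-separated, followed by ')' and the path.
    Real SURT canonicalization applies more rules than this sketch does.
    """
    # urlsplit only recognizes the host when the URL has a '//' authority marker.
    parts = urlsplit(url if "//" in url else "//" + url)
    host = (parts.hostname or "").lower()
    host_key = ",".join(reversed(host.split(".")))
    path = parts.path or "/"
    query = "?" + parts.query if parts.query else ""
    return f"{host_key}){path}{query}"


def truncate_surt(surt, max_host_depth=8, max_path_depth=9):
    """Keep at most the given number of host and path segments.

    Assumption: this mirrors the intent of --hdepth/--pdepth; the defaults
    are copied from the help output, the behavior is only a guess.
    """
    host_key, _, path = surt.partition(")")
    host_segments = host_key.split(",")[:max_host_depth]
    path_segments = [s for s in path.split("/") if s][:max_path_depth]
    return ",".join(host_segments) + ")/" + "/".join(path_segments)


if __name__ == "__main__":
    key = url_to_surt("https://www.Example.com/a/b?x=1")
    print(key)                        # com,example,www)/a/b?x=1
    print(truncate_surt(key, 2, 1))   # com,example)/a
```

Because the host comes first in reversed order, keys for the same domain sort next to each other, which is why the generate command expects a sorted input file.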

Citing Project

A publication related to this project appeared in the proceedings of JCDL 2019 (Read the PDF). Please cite it as below:

Sawood Alam, Michele C. Weigle, Michael L. Nelson, Fernando Melo, Daniel Bicho, Daniel Gomes. MementoMap Framework for Flexible and Adaptive Web Archive Profiling. In Proceedings of the 19th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2019, pp. 172-181, Urbana-Champaign, Illinois, USA, June 2019.

@inproceedings{jcdl-2019:alam:mementomap,
  author    = {Sawood Alam and
               Michele C. Weigle and
               Michael L. Nelson and
               Fernando Melo and
               Daniel Bicho and
               Daniel Gomes},
  title     = {{MementoMap} Framework for Flexible and Adaptive Web Archive Profiling},
  booktitle = {Proceedings of the 19th {ACM/IEEE-CS} Joint Conference on Digital Libraries},
  series    = {JCDL '19},
  year      = {2019},
  month     = {jun},
  location  = {Urbana-Champaign, Illinois, USA},
  pages     = {172--181},
  numpages  = {10},
  url       = {https://doi.org/10.1109/JCDL.2019.00033},
  doi       = {10.1109/JCDL.2019.00033},
  isbn      = {978-1-7281-1547-4},
  publisher = {{IEEE}}
}
