A Tool to Summarize Web Archive Holdings

Project description

MementoMap

A framework for web archive profiling that expresses a summary of an archive's holdings.

$ pip install mementomap
$ mementomap
usage: mementomap [-h] {generate,compact,lookup,batchlookup} ...

positional arguments:
  {generate,compact,lookup,batchlookup}
    generate            Generate a MementoMap from a sorted file with the
                        first columns as SURT (e.g., CDX/CDXJ)
    compact             Compact a large MementoMap file into a small one
    lookup              Look for a SURT into a MementoMap
    batchlookup         Look for a list of SURTs into a MementoMap

optional arguments:
  -h, --help            show this help message and exit
$ mementomap generate -h
usage: mementomap generate [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
                        [--hdepth] [--pdepth]
                        infile outfile

positional arguments:
  infile      Input SURT/CDX/CDXJ (plain or GZip) file path or '-' for STDIN
  outfile     Output MementoMap file path

optional arguments:
  -h, --help  show this help message and exit
  --hcf       Host compaction factor (default: Inf)
  --pcf       Path compaction factor (default: Inf)
  --ha        Power law alpha parameter for host (default: 16.329)
  --pa        Power law alpha parameter for path (default: 24.546)
  --hk        Power law k parameter for host (default: 0.714)
  --pk        Power law k parameter for path (default: 1.429)
  --hdepth    Max host depth (default: 8)
  --pdepth    Max path depth (default: 9)
$ mementomap compact -h
usage: mementomap compact [-h] [--hcf] [--pcf] [--ha] [--pa] [--hk] [--pk]
                       [--hdepth] [--pdepth]
                       infile outfile

positional arguments:
  infile      Input MementoMap (plain or GZip) file path or '-' for STDIN
  outfile     Output MementoMap file path

optional arguments:
  -h, --help  show this help message and exit
  --hcf       Host compaction factor (default: 1.0)
  --pcf       Path compaction factor (default: 1.0)
  --ha        Power law alpha parameter for host (default: 16.329)
  --pa        Power law alpha parameter for path (default: 24.546)
  --hk        Power law k parameter for host (default: 0.714)
  --pk        Power law k parameter for path (default: 1.429)
  --hdepth    Max host depth (default: 8)
  --pdepth    Max path depth (default: 9)
$ mementomap lookup -h
usage: mementomap lookup [-h] mmap surt

positional arguments:
  mmap        MementoMap file path to look into
  surt        SURT to look for

optional arguments:
  -h, --help  show this help message and exit
$ mementomap batchlookup -h
usage: mementomap batchlookup [-h] mmap infile

positional arguments:
  mmap        MementoMap file path to look into
  infile      Input SURT (plain or GZip) file path or '-' for STDIN

optional arguments:
  -h, --help  show this help message and exit
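
The `generate` command expects a sorted input whose first column is a SURT. When only a plain list of URLs is at hand, such an input might be prepared along the lines of the sketch below. This is an illustrative approximation, not part of MementoMap: the helper names `simple_surt` and `sorted_surt_lines` are hypothetical, and the conversion only lowercases the host, drops the scheme, port, and a leading `www.`, and reverses the host labels. Real SURT canonicalizers apply many more rules, so output from a CDX/CDXJ indexer is preferable when available.

```python
from urllib.parse import urlsplit


def simple_surt(url):
    """Return a simplified SURT key such as 'com,example)/path' for a URL.

    Assumption: only basic normalization is done here (lowercase host,
    no scheme, no port, no leading 'www.'); this is not a full canonicalizer.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Reverse the host labels so that related hosts sort next to each other.
    reversed_host = ",".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return reversed_host + ")" + path


def sorted_surt_lines(urls):
    """Convert URLs to simplified SURT keys and return them in sorted order."""
    return sorted(simple_surt(u) for u in urls)


def write_surt_file(urls, outfile):
    """Write one sorted SURT key per line to a hypothetical input file."""
    with open(outfile, "w", encoding="utf-8") as f:
        for line in sorted_surt_lines(urls):
            f.write(line + "\n")
```

A file written by `write_surt_file` could then be passed as `infile` to `mementomap generate`, with any output path as `outfile`.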

Citing Project

A publication related to this project appeared in the proceedings of JCDL 2019 (Read the PDF). Please cite it as below:

Sawood Alam, Michele C. Weigle, Michael L. Nelson, Fernando Melo, Daniel Bicho, Daniel Gomes. MementoMap Framework for Flexible and Adaptive Web Archive Profiling. In Proceedings of the 19th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2019, pp. 172-181, Urbana-Champaign, Illinois, USA, June 2019.

@inproceedings{jcdl-2019:alam:mementomap,
  author    = {Sawood Alam and
               Michele C. Weigle and
               Michael L. Nelson and
               Fernando Melo and
               Daniel Bicho and
               Daniel Gomes},
  title     = {{MementoMap} Framework for Flexible and Adaptive Web Archive Profiling},
  booktitle = {Proceedings of the 19th {ACM/IEEE-CS} Joint Conference on Digital Libraries},
  series    = {JCDL '19},
  year      = {2019},
  month     = {jun},
  location  = {Urbana-Champaign, Illinois, USA},
  pages     = {172--181},
  numpages  = {10},
  url       = {https://doi.org/10.1109/JCDL.2019.00033},
  doi       = {10.1109/JCDL.2019.00033},
  isbn      = {978-1-7281-1547-4},
  publisher = {{IEEE}}
}
