Software Heritage Content Indexer

swh-indexer

Tools to compute multiple indexes on SWH's raw contents:

  • content:
    • mimetype
    • ctags
    • language
    • fossology-license
    • metadata
  • revision:
    • metadata

An indexer is in charge of:

  • looking up objects
  • extracting information from those objects
  • storing that information in the swh-indexer db

There are multiple indexers working on different object types:

  • content indexer: works with content sha1 hashes
  • revision indexer: works with revision sha1 hashes
  • origin indexer: works with origin identifiers
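Content indexers key their results on content sha1 hashes. The minimal sketch below assumes that this sha1 is the plain SHA-1 digest of the raw file bytes and computes it with Python's standard hashlib. For contrast, it also computes the Git-style blob hash (called sha1_git in Software Heritage), which prepends a `blob <length>\0` header before hashing. The function names are illustrative and are not part of swh.indexer.

```python
import hashlib


def content_sha1(data: bytes) -> str:
    """Plain SHA-1 of the raw bytes (assumed to be the content indexers' key)."""
    return hashlib.sha1(data).hexdigest()


def content_sha1_git(data: bytes) -> str:
    """Git-style blob hash: SHA-1 over a b'blob <length>\\0' header plus the bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    raw = b""
    print(content_sha1(raw))      # da39a3ee5e6b4b0d3255bfef95601890afd80709
    print(content_sha1_git(raw))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The two digests differ even for the same bytes, so any lookup must use the same kind of hash the index was built with.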

Indexing procedure:

  • receive a batch of ids
  • retrieve the associated data depending on the object type
  • compute an index for each object
  • store the result in swh's storage
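The indexing procedure can be sketched as a small abstract base class. This is a hypothetical illustration, not the swh.indexer class hierarchy: `SketchIndexer`, the toy `LineCountIndexer`, and the in-memory dicts standing in for the object storage and the swh-indexer database are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Optional


class SketchIndexer(ABC):
    """Hypothetical base class mirroring the four-step procedure.

    Not the swh.indexer API: results are kept in a plain dict.
    """

    def __init__(self) -> None:
        # Stand-in for the swh-indexer database: object id -> computed index.
        self.results: Dict[str, Any] = {}

    @abstractmethod
    def retrieve(self, obj_id: str) -> Optional[bytes]:
        """Fetch the data associated with an id (None if unknown)."""

    @abstractmethod
    def compute(self, obj_id: str, data: bytes) -> Any:
        """Compute the index for one object."""

    def run(self, ids: Iterable[str]) -> List[str]:
        """Receive a batch of ids, index each one, store the results."""
        indexed = []
        for obj_id in ids:
            data = self.retrieve(obj_id)
            if data is None:
                continue  # unknown object: nothing to index
            self.results[obj_id] = self.compute(obj_id, data)
            indexed.append(obj_id)
        return indexed


class LineCountIndexer(SketchIndexer):
    """Toy content indexer whose 'index' is just the number of newlines."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        super().__init__()
        self.objects = objects  # stand-in for the object storage

    def retrieve(self, obj_id: str) -> Optional[bytes]:
        return self.objects.get(obj_id)

    def compute(self, obj_id: str, data: bytes) -> Any:
        return {"id": obj_id, "lines": data.count(b"\n")}


if __name__ == "__main__":
    indexer = LineCountIndexer({"a": b"x\ny\n", "b": b"z"})
    print(indexer.run(["a", "b", "missing"]))  # ['a', 'b']
    print(indexer.results)
```

Keeping the batch loop in the base class means each concrete indexer only has to say how to fetch its object type and what to compute.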

Current content indexers:

  • mimetype (queue swh_indexer_content_mimetype): detect the encoding and mimetype

  • language (queue swh_indexer_content_language): detect the programming language

  • ctags (queue swh_indexer_content_ctags): compute tag information

  • fossology-license (queue swh_indexer_fossology_license): compute the license

  • metadata: translate a metadata file into a translated_metadata dict
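As a rough illustration of translating a file into a translated_metadata dict, here is a sketch for an npm package.json. It assumes CodeMeta-style terms as the target vocabulary; the key mapping `NPM_TO_CODEMETA` and the function `translate_package_json` are hypothetical and much smaller than what a real mapping would cover.

```python
import json
from typing import Any, Dict

# Hypothetical, deliberately small mapping from npm package.json keys
# to CodeMeta-style terms.
NPM_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
    "license": "license",
    "keywords": "keywords",
    "homepage": "url",
}


def translate_package_json(raw: bytes) -> Dict[str, Any]:
    """Translate raw package.json bytes into a translated_metadata-like dict."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}  # not valid UTF-8 JSON: nothing to translate
    if not isinstance(data, dict):
        return {}  # a top-level list or scalar is not a package manifest
    translated: Dict[str, Any] = {}
    for npm_key, codemeta_key in NPM_TO_CODEMETA.items():
        if npm_key in data:
            translated[codemeta_key] = data[npm_key]
    # "repository" may be a plain string or an object with a "url" field.
    repo = data.get("repository")
    if isinstance(repo, dict):
        repo = repo.get("url")
    if isinstance(repo, str):
        translated["codeRepository"] = repo
    return translated


if __name__ == "__main__":
    sample = b'{"name": "demo", "version": "1.0.0", "scripts": {}}'
    print(translate_package_json(sample))
```

Keys with no mapping (such as `scripts`) are simply dropped, so the output only contains terms the target vocabulary understands.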

Current revision indexers:

  • metadata: detects files containing metadata and retrieves their translated_metadata from the content_metadata table in storage, or runs the content indexer to translate those files.
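The revision metadata indexer's "reuse or translate" logic might look like the following sketch. The set of metadata filenames, the dict standing in for the content_metadata table, and `index_revision` itself are assumptions for illustration. The translator is passed in as a parameter so that any content-level translation function can be plugged in.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical set of filenames treated as metadata files.
METADATA_FILENAMES = {"package.json", "codemeta.json", "PKG-INFO"}

Translator = Callable[[bytes], dict]


def index_revision(
    entries: List[Tuple[str, bytes]],
    content_metadata: Dict[str, dict],
    translate: Translator,
) -> List[dict]:
    """Return translated metadata for each metadata file in a revision's root.

    entries: (filename, raw bytes) pairs from the revision's root directory.
    content_metadata: stand-in for the content_metadata table, keyed by sha1.
    translate: content-level translation, run only when no cached result exists.
    """
    results = []
    for name, raw in entries:
        if name not in METADATA_FILENAMES:
            continue  # not a metadata file: skip it
        sha1 = hashlib.sha1(raw).hexdigest()
        if sha1 not in content_metadata:
            # Cache miss: run the content indexer and store its result.
            content_metadata[sha1] = translate(raw)
        results.append(content_metadata[sha1])
    return results


if __name__ == "__main__":
    store: Dict[str, dict] = {}
    files = [("README", b"hi"), ("package.json", b'{"a": 1}')]
    print(index_revision(files, store, lambda raw: {"size": len(raw)}))
```

Because results are keyed by content sha1, the same file shared by many revisions is translated once and then reused.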

Download files

Source Distribution

swh.indexer-2.4.1.tar.gz (143.8 kB)

Uploaded Source

Built Distribution

swh.indexer-2.4.1-py3-none-any.whl (189.1 kB)

Uploaded Python 3

File details

Details for the file swh.indexer-2.4.1.tar.gz.

File metadata

  • Download URL: swh.indexer-2.4.1.tar.gz
  • Upload date:
  • Size: 143.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.3

File hashes

Hashes for swh.indexer-2.4.1.tar.gz:

  • SHA256: b1637028169ff6d93f29b59ed80148c28c31c0adff9c58a8143d9db3243312d0
  • MD5: 914bb3e72eb07a45c575b6dc8faae056
  • BLAKE2b-256: b59d355a9f1cf74b953b4c82f78aeb7f66269df0e541657d1bd9af3ff6f882c1

File details

Details for the file swh.indexer-2.4.1-py3-none-any.whl.

File metadata

  • Download URL: swh.indexer-2.4.1-py3-none-any.whl
  • Upload date:
  • Size: 189.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.3

File hashes

Hashes for swh.indexer-2.4.1-py3-none-any.whl:

  • SHA256: b735c7c48a048c93d2308f4cd98c12c3d158f0b64ddc457df8fa95fb04fe6cc5
  • MD5: 7260d706dd23913ed451d144e9ea6d26
  • BLAKE2b-256: 67d18e8186f95b1e143a4d1e5f39a9084f562968b60f56f60060ac2a396f9633
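To check a downloaded file against the listed digests, the hashes can be recomputed with Python's standard hashlib. This is a minimal sketch: `file_digests` and `verify_sha256` are illustrative helpers, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib
import tempfile
from pathlib import Path

# Expected SHA256 copied from the "File hashes" listing for the sdist.
EXPECTED_SDIST_SHA256 = (
    "b1637028169ff6d93f29b59ed80148c28c31c0adff9c58a8143d9db3243312d0"
)


def file_digests(path: Path, chunk_size: int = 65536) -> dict:
    """Compute SHA256, MD5 and BLAKE2b-256 digests of a file in one streaming pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: Path, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return file_digests(path)["sha256"] == expected.lower()


if __name__ == "__main__":
    # Demo on a throwaway file; point verify_sha256 at the downloaded
    # swh.indexer-2.4.1.tar.gz with EXPECTED_SDIST_SHA256 to check it.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.bin"
        demo.write_bytes(b"abc")
        print(file_digests(demo))
```

Streaming in chunks keeps memory use constant regardless of archive size, and all three digests are computed from a single read of the file.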
