Software Heritage search service

Search service for the Software Heritage archive.

It is similar to swh-storage in what it contains, but provides different ways to query it. While swh-storage is mostly a key-value store that returns an object from a primary key, swh-search is focused on reverse indices, which allow finding objects that match some criteria, for example through full-text search.

It currently uses ElasticSearch and provides only origin search (by URL and metadata).
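
To make the contrast between a primary-key lookup and a reverse index concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not how swh-storage or swh-search are implemented; the sample data and every function name in it are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: primary key -> origin URL (illustration only).
origins = {
    "id-1": "https://github.com/example/search-tool",
    "id-2": "https://gitlab.com/example/storage-lib",
    "id-3": "https://github.com/example/storage-tool",
}


def get_by_key(key):
    """Key-value style lookup: one primary key in, one object out."""
    return origins[key]


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    tokens, current = [], ""
    for ch in text.lower():
        if ch.isalnum():
            current += ch
        elif current:
            tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


def build_reverse_index(objects):
    """Map each token to the set of keys whose URL contains it."""
    index = defaultdict(set)
    for key, url in objects.items():
        for token in tokenize(url):
            index[token].add(key)
    return index


def search(index, query):
    """Return the sorted keys whose URL contains every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


reverse_index = build_reverse_index(origins)
```

With this toy index, `get_by_key("id-1")` needs the key up front, whereas `search(reverse_index, "storage")` finds every origin whose URL mentions that word without knowing any key.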

Dependencies

  • Some Python tests for this module cannot run without a local ElasticSearch instance, so the ElasticSearch server executable must be installed on your machine (a running ElasticSearch server is not needed).

    • Debian-like host

      The elasticsearch package is required. As it is not part of debian-stable, an additional Debian repository must be configured.

    • Non Debian-like host

      The tests expect:

      • /usr/share/elasticsearch/jdk/bin/java to exist.

      • org.elasticsearch.bootstrap.Elasticsearch to be in java’s classpath.

  • Emscripten is required for generating the tree-sitter WASM module. Run the following commands to set it up:

    cd /opt && git clone https://github.com/emscripten-core/emsdk.git && cd emsdk && \
    ./emsdk install latest && ./emsdk activate latest
    PATH="${PATH}:/opt/emsdk/upstream/emscripten"

    Note: If emsdk is not found in the PATH, the tree-sitter CLI automatically pulls the emscripten/emsdk image from Docker Hub when make ts-build-wasm or make ts-build is used.
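
Before running the tests or the WASM build, it can help to check these prerequisites up front. The sketch below is one possible way to do so and is not part of swh-search. The Java path is the one listed above; the function name is hypothetical, and using the `emcc` executable as the marker for an Emscripten install on the PATH is an assumption. It does not check the Java classpath requirement.

```python
import os
import shutil

# Path taken from the dependency list above.
JAVA_PATH = "/usr/share/elasticsearch/jdk/bin/java"


def check_prerequisites(java_path=JAVA_PATH, emcc_name="emcc"):
    """Report which prerequisites appear to be present on this machine.

    Assumption: an `emcc` executable on the PATH indicates a usable
    Emscripten setup. The classpath requirement is not verified here.
    """
    return {
        "elasticsearch_java": os.path.isfile(java_path)
        and os.access(java_path, os.X_OK),
        "emscripten": shutil.which(emcc_name) is not None,
    }


if __name__ == "__main__":
    for name, present in check_prerequisites().items():
        print(f"{name}: {'found' if present else 'missing'}")
```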

Make targets

The following make targets can be run from the root directory of swh-search to build and/or execute swh-search under various configurations:

  • ts-install: Install node_modules and emscripten SDK required for TreeSitter

  • ts-generate: Generate parser files (C and JSON) from the grammar

  • ts-repl: Starts a web-based playground for the TreeSitter grammar. This is the recommended way to develop the TreeSitter grammar.

  • ts-dev: Parse the query_language/sample_query and print the corresponding syntax expression along with the start and end positions of all the nodes.

  • ts-dev sanitize=1: Same as ts-dev but without start and end position of the nodes. This format is expected by TreeSitter’s native test command. sanitize=1 cleans the output of ts-dev using sed to achieve the desired format.

  • ts-test: Executes TreeSitter’s native tests

  • ts-build-so: Generates swh_ql.so file from the previously generated parser using py-tree-sitter

  • ts-build-wasm: Generates swh_ql.wasm file from the previously generated parser using emscripten

  • ts-build: Executes both ts-build-so and ts-build-wasm
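
For scripted use, the targets above can be invoked from Python. The following is a minimal sketch, assuming `make` is installed and the working directory is the swh-search root. The helper names and the `dry_run` switch are hypothetical; only the target names and the `sanitize=1` variable come from the list above.

```python
import subprocess


def make_argv(target, **variables):
    """Build the argument list for `make <target> [name=value ...]`."""
    return ["make", target] + [f"{name}={value}" for name, value in variables.items()]


def run_target(target, cwd=".", dry_run=True, **variables):
    """Run a make target, or only return its command line when dry_run is set."""
    argv = make_argv(target, **variables)
    if not dry_run:
        # check=True raises CalledProcessError if make exits with an error.
        subprocess.run(argv, cwd=cwd, check=True)
    return argv


if __name__ == "__main__":
    # Print the commands without executing them.
    print(" ".join(run_target("ts-generate")))
    print(" ".join(run_target("ts-dev", sanitize=1)))
```

Keeping `dry_run=True` as the default means the script only prints what it would run until it is switched off explicitly.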

File details

Details for the file swh.search-0.16.6.tar.gz.

File metadata

  • Download URL: swh.search-0.16.6.tar.gz
  • Size: 84.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for swh.search-0.16.6.tar.gz:

  • SHA256: 6425a7f17b3afd8ee2c5d61d488386963316ee9f81df267e104d879104d2b6c7
  • MD5: b2583d5cb8a833b7ab15621b0e255bd0
  • BLAKE2b-256: 98e15e1b21fd607d5cd7e4a79e1911bf93459abac0e6de2238245cd4cf21c172
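
A downloaded archive can be compared against the SHA256 digest listed above. The sketch below uses only the Python standard library. The function names are hypothetical, and the file path passed in is assumed to be wherever the archive was saved.

```python
import hashlib

# SHA256 digest listed above for swh.search-0.16.6.tar.gz.
EXPECTED_SHA256 = "6425a7f17b3afd8ee2c5d61d488386963316ee9f81df267e104d879104d2b6c7"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("swh.search-0.16.6.tar.gz")` should return True only for an intact copy of that archive.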

File details

Details for the file swh.search-0.16.6-py3-none-any.whl.

File metadata

  • Download URL: swh.search-0.16.6-py3-none-any.whl
  • Size: 90.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for swh.search-0.16.6-py3-none-any.whl:

  • SHA256: 433d447e73376c19956c1dcdb807a097a4b38882c425755f894c00fc46196ae0
  • MD5: eda772b47070a849e35d4a30f47a146e
  • BLAKE2b-256: 03ead7c84ca136f48560b65ed7e348cf6b3e7aaa8ea0eea1bcadecea648399ad
