Software Heritage PyPI Loader

swh-loader-pypi

SWH PyPI loader's source code repository

What does the loader do?

The PyPI loader visits and loads a PyPI project [1].

Each visit will result in:

  • 1 snapshot (which targets n revisions, 1 per release artifact)
  • 1 revision per release artifact (each revision targets 1 directory: the uncompressed release artifact)

[1] https://pypi.org/help/#packages

First visit

Given a PyPI project (origin), the loader, for the first visit:

  • retrieves information for the given project (including releases)
  • then, for each associated release:
      • for each associated source distribution (type 'sdist') release artifact (possibly many per release):
          • retrieves the associated artifact archive (with checks)
          • uncompresses the archive locally
          • computes the hashes of the uncompressed directory
          • creates a revision (using the PKG-INFO metadata file) targeting that directory
  • finally, creates a snapshot targeting all seen revisions (uncompressed PyPI artifact and metadata).
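
The walk over releases and their 'sdist' artifacts can be sketched as a small function over the project's JSON metadata. This is a minimal sketch, not the loader's actual code: iter_sdist_artifacts is a hypothetical helper name, and the input shape is an assumption modelled on the PyPI JSON API, where "releases" maps each version to a list of files carrying a "packagetype" field.

```python
def iter_sdist_artifacts(metadata):
    """Yield (version, artifact) for every 'sdist' file in the metadata.

    `metadata` is the parsed JSON document of a project (assumed shape,
    modelled on the PyPI JSON API).
    """
    releases = metadata.get("releases", {})
    # Versions are sorted as plain strings, only to make the output stable.
    for version in sorted(releases):
        for artifact in releases[version]:
            # Only source distributions are of interest; wheels are ignored.
            if artifact.get("packagetype") == "sdist":
                yield version, artifact


# Hand-written sample, not real PyPI data.
sample = {
    "releases": {
        "0.1.0": [
            {"packagetype": "sdist", "filename": "demo-0.1.0.tar.gz"},
            {"packagetype": "bdist_wheel",
             "filename": "demo-0.1.0-py3-none-any.whl"},
        ],
        "0.2.0": [],  # a release without artifacts yields nothing
    }
}

found = [(v, a["filename"]) for v, a in iter_sdist_artifacts(sample)]
print(found)
```

Each yielded artifact would then go through the download, uncompress, hash and revision steps listed above.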

Next visit

The loader starts by checking if something changed since the last visit. If nothing changed, the visit's snapshot is left unchanged. The new visit targets the same snapshot.

If something changed, the already seen release artifacts are skipped. Only the new ones are loaded. In the end, the loader creates a new snapshot based on the previous one. Thus, the new snapshot targets both the old and new PyPI release artifacts.
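
The incremental behaviour can be illustrated with a minimal sketch under stated assumptions: a snapshot is modelled as a plain dict mapping an artifact filename to a revision identifier, and an artifact counts as "already seen" when its filename is in the previous snapshot. The data model and the names next_snapshot and fake_load are hypothetical, not the loader's API.

```python
def next_snapshot(previous, artifacts, load_artifact):
    """Return the snapshot for a new visit.

    previous: dict filename -> revision id (empty on a first visit)
    artifacts: iterable of dicts with a 'filename' key
    load_artifact: callable turning one artifact into a revision id
    """
    new_branches = {}
    for artifact in artifacts:
        name = artifact["filename"]
        if name in previous:
            continue  # already seen: skipped
        new_branches[name] = load_artifact(artifact)
    if not new_branches:
        return previous  # nothing changed: the visit reuses the snapshot
    merged = dict(previous)  # new snapshot based on the previous one
    merged.update(new_branches)
    return merged


previous = {"demo-0.1.0.tar.gz": "rev-1"}
artifacts = [{"filename": "demo-0.1.0.tar.gz"},
             {"filename": "demo-0.2.0.tar.gz"}]
loaded = []


def fake_load(artifact):
    # Stand-in for download + uncompress + revision creation.
    loaded.append(artifact["filename"])
    return "rev-%d" % (len(previous) + len(loaded))


snapshot = next_snapshot(previous, artifacts, fake_load)
unchanged = next_snapshot(snapshot, artifacts, fake_load)
```

In this sketch the second call loads nothing and hands back the very same snapshot object, mirroring a visit where nothing changed.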

Terminology

  • 1 project: a PyPI project (used as swh origin). This is a collection of releases.

  • 1 release: a specific version of the (PyPI) project. It is a collection of information and associated source release artifacts (type 'sdist').

  • 1 release artifact: a source release artifact (distributed by a PyPI maintainer). In swh, we are specifically interested in the 'sdist' type (source code).

Edge cases

  • If a release provides no release artifacts, that release is skipped.

  • If a release artifact holds no PKG-INFO file (at the root of the archive), the release artifact is skipped.

  • If a problem occurs during a fetch action (e.g. release artifact download), the load fails and the visit is marked as 'partial'.
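
The PKG-INFO rule can be sketched as follows. This is an illustrative sketch only: read_pkg_info is a hypothetical helper, and it assumes the archive has already been uncompressed into a directory where the PKG-INFO file is expected. PKG-INFO uses RFC 822-style headers, so the standard library's email parser can read it.

```python
import os
import tempfile
from email.parser import Parser


def read_pkg_info(root_dir):
    """Return the PKG-INFO headers as a dict, or None if the file is missing.

    Repeated headers (e.g. Classifier) collapse to their last value here,
    which is enough for a sketch.
    """
    path = os.path.join(root_dir, "PKG-INFO")
    if not os.path.isfile(path):
        return None  # the caller would skip this release artifact
    with open(path, encoding="utf-8") as f:
        message = Parser().parse(f, headersonly=True)
    return dict(message.items())


# Demo on throwaway directories standing in for uncompressed archives.
with tempfile.TemporaryDirectory() as with_meta, \
        tempfile.TemporaryDirectory() as without_meta:
    with open(os.path.join(with_meta, "PKG-INFO"), "w",
              encoding="utf-8") as f:
        f.write("Metadata-Version: 1.1\nName: demo\nVersion: 0.1.0\n")
    info = read_pkg_info(with_meta)
    missing = read_pkg_info(without_meta)
```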

Development

Configuration file

Location

Either:

  • /etc/softwareheritage/
  • ~/.config/swh/
  • ~/.swh/

Note: this location is referred to as $SWH_CONFIG_PATH below.

Configuration sample

$SWH_CONFIG_PATH/loader/pypi.yml:

storage:
  cls: remote
  args:
    url: http://localhost:5002/
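
To check which configuration file would be picked up on a given machine, a lookup over the locations listed above can be sketched as below. The helper find_config is hypothetical, and trying the directories in the listed order and stopping at the first match is an assumption; the loader's actual precedence is not stated here.

```python
import tempfile
from pathlib import Path

# Candidate directories, in the order this document lists them.
CONFIG_DIRS = ("/etc/softwareheritage", "~/.config/swh", "~/.swh")


def find_config(relative_path="loader/pypi.yml", config_dirs=CONFIG_DIRS):
    """Return the first existing config file as a Path, or None."""
    for directory in config_dirs:
        candidate = Path(directory).expanduser() / relative_path
        if candidate.is_file():
            return candidate
    return None


# Demo against a throwaway directory instead of the real locations.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "loader" / "pypi.yml"
    config_file.parent.mkdir(parents=True)
    config_file.write_text("storage:\n  cls: remote\n")
    missing_dir = str(Path(tmp) / "missing")
    found = find_config(config_dirs=[missing_dir, tmp])
    found_matches = found == config_file
    not_found = find_config(config_dirs=[missing_dir])
```

Calling find_config() with no arguments checks the three real locations.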

Local run

The built-in command-line interface runs the loader for a project in the main PyPI archive.

For instance, to load arrow:

python3 -m swh.loader.pypi.loader arrow

If you need more control, you can use the loader directly. It expects three arguments:

  • project: a PyPI project name (e.g. arrow)
  • project_url: URL of the PyPI project (human-readable HTML page)
  • project_metadata_url: URL of the PyPI metadata information (machine-parsable JSON document)

import logging

from swh.loader.pypi.tasks import LoadPyPI

# Show debug output from the loader while it runs
logging.basicConfig(level=logging.DEBUG)

project = 'arrow'

LoadPyPI().run(
    project,
    'https://pypi.org/pypi/%s/' % project,      # project_url
    'https://pypi.org/pypi/%s/json' % project,  # project_metadata_url
)
