Use dask to run the DVC graph

Dask4DVC - Distributed Node Execution

DVC provides tools for building and executing a computational graph locally. The dask4dvc package combines Dask Distributed with DVC so that the graph can be run more easily on HPC workload managers such as Slurm.
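To illustrate why distributing a stage graph can help, here is a minimal sketch using only the Python standard library. It is not how dask4dvc is implemented; the stage names and the `parallel_batches` helper are hypothetical. It groups the stages of a small dependency graph into batches whose members do not depend on each other and could therefore run on separate workers.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the set of stages it depends on.
stages = {
    "preprocess": set(),
    "train_a": {"preprocess"},
    "train_b": {"preprocess"},
    "evaluate": {"train_a", "train_b"},
}


def parallel_batches(graph):
    """Group stages into batches; stages within a batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # All stages whose dependencies are already finished.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(stages)
for index, batch in enumerate(batches):
    print(index, batch)
```

In this toy graph, `train_a` and `train_b` land in the same batch, which is the kind of parallelism a distributed scheduler can exploit.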

Usage

Dask4DVC provides a CLI similar to DVC.

  • dvc repro becomes dask4dvc repro.
  • dvc exp run --run-all becomes dask4dvc run.
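The mapping in the list above can be written down as a small lookup table, for example when migrating existing scripts. This is a sketch only: `to_dask4dvc` is a hypothetical helper, not part of dask4dvc, and it covers just the two commands listed here.

```python
import shlex

# Command mapping taken from the list above; nothing else is assumed.
DVC_TO_DASK4DVC = {
    ("dvc", "repro"): ("dask4dvc", "repro"),
    ("dvc", "exp", "run", "--run-all"): ("dask4dvc", "run"),
}


def to_dask4dvc(command: str) -> str:
    """Translate one of the listed DVC commands to its dask4dvc form."""
    args = tuple(shlex.split(command))
    try:
        return shlex.join(DVC_TO_DASK4DVC[args])
    except KeyError:
        raise ValueError(f"no known dask4dvc equivalent for: {command!r}") from None


print(to_dask4dvc("dvc repro"))
print(to_dask4dvc("dvc exp run --run-all"))
```

Commands outside the table raise a `ValueError` rather than guessing at an equivalent.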

SLURM Cluster

You can use dask4dvc with a Slurm cluster. This requires a running Dask scheduler, for example one started with dask_jobqueue:

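from dask_jobqueue import SLURMCluster

# Each Slurm job provides one single-core worker on the "gpu" partition.
cluster = SLURMCluster(
    cores=1,
    memory="128GB",
    queue="gpu",
    processes=1,
    walltime="8:00:00",
    job_cpu=1,
    job_extra=["-N 1", "--cpus-per-task=1", "--tasks-per-node=64", "--gres=gpu:1"],
    # Fix the scheduler port so that dask4dvc can connect to a known address.
    scheduler_options={"port": 31415},
)
# Let Dask scale the number of Slurm jobs with the workload.
cluster.adapt()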

With this setup, you can run dask4dvc repro --address 127.0.0.1:31415, where 31415 is the example scheduler port chosen above.
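Before launching a run, it can be useful to confirm that something is actually listening on the scheduler address. The following sketch uses only the standard library; `scheduler_reachable` is a hypothetical helper, not part of dask4dvc or Dask, and it only checks that a TCP connection can be opened, not that the listener is a Dask scheduler.

```python
import socket


def scheduler_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to "host:port" can be opened."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example address from the text above.
    print(scheduler_reachable("127.0.0.1:31415"))
```

If the check returns False, the scheduler is probably not running or the port does not match the one passed in scheduler_options.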
