
A simple light-weight extendable command line tool for managing jobs on DIAG's SOL cluster.


Solitude

A simple light-weight command line tool for managing jobs on the SOL cluster.

Features

  • Querying the status of a specified list of Slurm jobs and presenting it in a clear overview
  • Tools for managing the specified jobs (starting/stopping/extending)
  • Cross-platform, because it uses SSH (via paramiko) to query and issue commands
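
To illustrate the SSH-based approach in the last feature, here is a minimal sketch of how job states could be queried on a remote Slurm machine with paramiko. It is not solitude's actual implementation: the helper names (build_squeue_command, parse_squeue_output, query_job_states) and the chosen squeue output format are assumptions made for this example.

```python
from typing import Dict, List


def build_squeue_command(job_ids: List[str]) -> str:
    """Build a squeue call that prints one 'jobid|state' line per job."""
    ids = ",".join(job_ids)
    return f"squeue --noheader --jobs={ids} --format='%i|%T'"


def parse_squeue_output(output: str) -> Dict[str, str]:
    """Turn 'jobid|state' lines into a {job_id: state} mapping."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, state = line.split("|", 1)
        states[job_id] = state
    return states


def query_job_states(
    server: str, username: str, password: str, job_ids: List[str]
) -> Dict[str, str]:
    """Run squeue over SSH and return the parsed job states (hypothetical helper)."""
    import paramiko  # imported lazily so the helpers above work without it

    client = paramiko.SSHClient()
    # Accepting unknown host keys keeps the sketch short; verify them in real use.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(server, username=username, password=password)
    try:
        _stdin, stdout, _stderr = client.exec_command(build_squeue_command(job_ids))
        return parse_squeue_output(stdout.read().decode())
    finally:
        client.close()
```

Because everything goes through an SSH connection, the local machine only needs Python and paramiko; Slurm itself runs on the remote side.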

Setup and configuration

  1. Install through pip using: $ pip install solitude
  2. Configure the tool using: $ solitude config create and fill out the prompts.
  3. The previous step should have generated a configuration file at the proper location (the installation directory or the user's home directory). It should contain the target cluster machine and the login credentials used to query and issue commands. Its contents and location can be queried using solitude config status, and it should contain something like:
{
    "defaults": {
        "user": "username",
        "workers": 8
    }, 
    "ssh":{
        "server" : "dlc-machine.umcn.nl",
        "username" : "user",
        "password" : "*******"
    },
    "plugins":[    
    ]
}
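
As a quick sanity check of a configuration like the one above, the following sketch parses the JSON and reports missing entries. Which keys are actually mandatory is an assumption based only on the example shown here, and check_config is a hypothetical helper, not part of solitude.

```python
import json
from typing import Any, Dict, List

# Assumed layout, taken from the example configuration above.
EXPECTED_KEYS = {
    "defaults": ("user", "workers"),
    "ssh": ("server", "username", "password"),
}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for section, keys in EXPECTED_KEYS.items():
        if section not in config:
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in config[section]:
                problems.append(f"missing key: {section}.{key}")
    if not isinstance(config.get("plugins", []), list):
        problems.append("plugins must be a list")
    return problems


example = json.loads("""
{
    "defaults": {"user": "username", "workers": 8},
    "ssh": {
        "server": "dlc-machine.umcn.nl",
        "username": "user",
        "password": "*******"
    },
    "plugins": []
}
""")
print(check_config(example))
```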

The tool is now ready to use. See the examples below.

Example usage

Create a file for your deep learning project with a list of jobs (here we call this commands.list) using the following format:

# Test jobs 
# (commented lines and empty lines will be ignored)

./c-submit --require-mem=1g --require-cpus=1 --gpu-count=0 {user} test 1 hello-world
./c-submit --require-mem=1g --require-cpus=1 --gpu-count=0 {user} test 1 
./c-submit --require-mem=1g --require-cpus=1 --gpu-count=0 {user} test 1 hello-world

This format supports the special tag {user} which will be substituted with the default user name.
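
The rules of this format (comment lines and empty lines are skipped, {user} is replaced by the default user name) can be sketched in a few lines of Python. This is an illustration of the described behaviour under those stated rules, not solitude's own parser; parse_command_list is a hypothetical name.

```python
from typing import List


def parse_command_list(text: str, user: str) -> List[str]:
    """Return the job commands in `text`, with {user} substituted."""
    commands = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip empty lines and lines that are commented out.
        if not stripped or stripped.startswith("#"):
            continue
        commands.append(stripped.replace("{user}", user))
    return commands


sample = """# Test jobs
# (commented lines and empty lines will be ignored)

./c-submit --require-mem=1g --require-cpus=1 --gpu-count=0 {user} test 1 hello-world
"""
for command in parse_command_list(sample, user="username"):
    print(command)
```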

After creating this file, use the following command to list the commands it contains:

$ solitude list -f /path/to/commands.list

Running specific jobs can be achieved with:

$ solitude run -f /path/to/commands.list -i 1-3 --priority=high
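
The -i 1-3 argument above presumably selects jobs 1 through 3 from the list. A minimal sketch of how such an index specification could be expanded is shown below; treating the range as inclusive and allowing comma-separated parts are assumptions made for this example, and parse_indices is a hypothetical helper.

```python
from typing import List


def parse_indices(spec: str) -> List[int]:
    """Expand a spec such as '1-3' or '1,4-5' into a list of job indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Assumed to be an inclusive range: '1-3' means 1, 2 and 3.
            start, end = part.split("-", 1)
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


print(parse_indices("1-3"))
```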

To stop or extend running jobs, use the solitude stop and solitude extend commands, respectively.

Plugins

The supported commands can be tweaked and extended by writing custom pluggy plugins. Plugins can change how commands are treated, which information is retrieved, and so on. The pluggy documentation explains in detail how to create and package your own plugins: https://pluggy.readthedocs.io/en/latest/

Here is a brief example of how to do this for solitude.

First make a separate project folder and create the following files:

solitude-exampleplugin/solitude_exampleplugin.py

import solitude
from typing import Dict, List


@solitude.hookimpl
def matches_command(cmd: str) -> bool:
    return "custom command" in cmd


@solitude.hookimpl
def filter_command_essential(cmd: str) -> str:
    return cmd


@solitude.hookimpl
def retrieve_state(cmd: str) -> Dict:
    return {}


@solitude.hookimpl
def is_command_job_done(cmd: str, state: Dict) -> bool:
    return False


@solitude.hookimpl
def get_command_status_str(cmd: str, state: Dict) -> str:
    return cmd


@solitude.hookimpl
def get_errors_from_log(log: str) -> List[str]:
    errors = []
    return errors

solitude-exampleplugin/setup.py

from setuptools import setup

setup(
    name="solitude-exampleplugin",
    install_requires="solitude",
    entry_points={"solitude": ["exampleplugin = solitude_exampleplugin"]},
    py_modules=["solitude_exampleplugin"],
)

Now let's install the plugin and test it:

$ pip install --editable solitude-exampleplugin
$ solitude list -f your_test_commands.list 

Contributing

Fork the solitude repository.

Set up your forked repository locally as an editable installation:

$ cd ~
$ git clone https://github.com/yourproject/solitude
$ pip install --editable solitude

Now you can work locally and create your own pull requests.

Maintainer

Sil van de Leemput
