Skip to main content

# FromConfig Yarn

Project description

FromConfig Yarn

pypi ci

A fromconfig Launcher for yarn execution.

Install

pip install fromconfig_yarn

Quickstart

Once installed, the launcher is available with the name yarn.

Given the following module

class Model:
    def __init__(self, learning_rate: float):
        self.learning_rate = learning_rate

    def train(self):
        print(f"Training model with learning_rate {self.learning_rate}")

and config files

# config.yaml
model:
  _attr_: foo.Model
  learning_rate: "${params.learning_rate}"

# params.yaml
params:
  learning_rate: 0.001

# launcher.yaml
yarn:
  name: test-fromconfig

logging:
  level: 20

launcher:
  run: yarn

Run (assuming you are in a Hadoop environment)

fromconfig config.yaml params.yaml launcher.yaml - model - train

Which prints

INFO:fromconfig.launcher.logger:- yarn.name: test-fromconfig
INFO:fromconfig.launcher.logger:- logging.level: 20
INFO:fromconfig.launcher.logger:- params.learning_rate: 0.001
INFO:fromconfig.launcher.logger:- model._attr_: foo.Model
INFO:fromconfig.launcher.logger:- model.learning_rate: 0.001
INFO skein.Driver: Driver started, listening on 12345
INFO:fromconfig_yarn.launcher:Uploading pex to viewfs://root/user/path/to/pex
INFO:cluster_pack.filesystem:Resolved base filesystem: <class 'pyarrow.hdfs.HadoopFileSystem'>
INFO:cluster_pack.uploader:Zipping and uploading your env to viewfs://root/user/path/to/pex
INFO skein.Driver: Uploading application resources to viewfs://root/user/...
INFO skein.Driver: Submitting application...
INFO impl.YarnClientImpl: Submitted application application_12345
INFO:fromconfig_yarn.launcher:TRACKING_URL: http://12.34.56/application_12345

You can also monkeypatch the relevant functions to "fake" the Hadoop environment with

python monkeypatch_fromconfig.py config.yaml params.yaml launcher.yaml - model - train

This example can be found in docs/examples/quickstart.

Usage Reference

Options

To configure Yarn, add a yarn entry to your config.

You can set the following parameters.

  • env_vars: A list of environment variables to forward to the container(s)
  • hadoop_file_systems: The list of available filesystems
  • ignored_packages: The list of packages not to include in the environment
  • jvm_memory_in_gb: The JVM memory (default, 8)
  • memory: The executor's memory (default, 32 GiB)
  • num_cores: The executor's number of cores (default, 8)
  • package_path: The HDFS location where to save the environment
  • zip_file: The path to an existing pex file, either local or on HDFS
  • name: The application name
  • queue: The yarn queue to submit the application to
  • node_label: The label of the hadoop node to be scheduled
  • pre_script_hook: A script to be executed before python is invoked
  • extra_env_vars: A mapping of extra environment variables to forward to the container(s)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fromconfig_yarn-0.1.3.tar.gz (9.1 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page