Skip to main content

# FromConfig

Project description

FromConfig

pypi ci

A library to instantiate any Python object from configuration files.

Thanks to Python Fire, fromconfig acts as a generic command line interface from configuration files with absolutely no change to the code.

FromConfig

Table Of Content

Install

pip install fromconfig

Quickstart

fromconfig can configure any Python object, without any change to the code.

As an example, let's consider a foo.py module

class Model:
    def __init__(self, learning_rate: float):
        self.learning_rate = learning_rate

    def train(self):
        print(f"Training model with learning_rate {self.learning_rate}")

with the following config files

  • config.yaml
model:
  _attr_: foo.Model
  learning_rate: "@params.learning_rate"
  • params.yaml
params:
  learning_rate: 0.1

In a terminal, run

fromconfig config.yaml params.yaml - model - train

which prints

Training model with learning_rate 0.1

Here is a step-by-step breakdown of what is happening

  1. Load the yaml files into dictionaries
  2. Merge the dictionaries
  3. After parsing the resulting dictionary with a default parser (resolving references as @params.learning_rate, etc.), it recursively instantiate sub-dictionaries, using the _attr_ key to resolve the Python class / function as an import string.
  4. Finally, the - model - train part of the command is a Python Fire syntax, which translates into "get the model key from the instantiated dictionary and execute the train method".

This example can be found in docs/examples/quickstart.

To learn more about FromConfig features, see the Usage Reference and Examples sections.

Cheat Sheet

fromconfig.fromconfig special keys

Key Value Example Use
"_attr_" "foo.bar.MyClass" Full import string of a class, function or method
"_args_" [1, 2] Positional arguments

fromconfig.parser.DefaultParser syntax

Key Value Use
"_singleton_" "my_singleton_name" Creates a singleton identified by name
"_eval_" "call", "import", "partial" Evaluation modes
"@params.model" Reference
"${params.url}:${params.port} Interpolation via OmegaConf

Why FromConfig ?

fromconfig enables the instantiation of arbitrary trees of Python objects from config files.

It echoes the FromParams base class of AllenNLP.

It is particularly well suited for Machine Learning (see examples). Launching training jobs on remote clusters requires custom command lines, with arguments that need to be propagated through the call stack (e.g., setting parameters of a particular layer). The usual way is to write a custom command with a reduced set of arguments, combined by an assembler that creates the different objects. With fromconfig, the command line becomes generic, and all the specifics are kept in config files. As a result, this preserves the code from any backwards dependency issues and allows full reproducibility by saving config files as jobs' artifacts. It also makes it easier to merge different sets of arguments in a dynamic way through references and interpolation.

fromconfig is based off the config system developed as part of the deepr library, a collections of utilities to define and train Tensorflow models in a Hadoop environment.

Other relevant libraries are:

  • fire automatically generate command line interface (CLIs) from absolutely any Python object.
  • omegaconf YAML based hierarchical configuration system with support for merging configurations from multiple sources.
  • hydra A higher-level framework based off omegaconf to configure complex applications.
  • gin A lightweight configuration framework based on dependency injection.
  • thinc A lightweight functional deep learning library that comes with an integrated config system

Usage Reference

The fromconfig library relies on two independent components.

  1. A lightweight syntax to instantiate any Python object from dictionaries (using special keys _attr_ and _args_).
  2. A composable, flexible, and customizable framework to parse configs before instantiation. This allows configs to remain short and readable with syntactic sugar to define singletons, references, etc.

Command Line

Usage : call fromconfig on any number of paths to config files.

fromconfig config.yaml params.yaml

Supported formats : YAML, JSON, and JSONNET.

The command line loads the different config files into Python dictionaries and merge them (it fails in case of any key conflict). It parses the resulting dictionary with the DefaultParser before calling fromconfig.fromconfig to instantiate the object.

As the fromconfig command is wrapped in a Python Fire call, you can manipulate the resulting instantiated dictionary via the command line by using the fire syntax.

For example fromconfig config.yaml - name instantiates the dictionary defined in config.yaml and gets the value associated with the key name.

Config syntax

The fromconfig.fromconfig function recursively instantiates objects from dictionaries.

It uses two special keys

  • _attr_: (optional) full import string to any Python object.
  • _args_: (optional) positional arguments.

For example

import fromconfig

config = {"_attr_": "str", "_args_": [1]}

fromconfig.fromconfig(config)  # '1'

FromConfig resolves the builtin type str from the _attr_ key, and creates a new string with the positional arguments defined in _args_, in other words str(1) which return '1'.

If the _attr_ key is not given, then the dictionary is left as a dictionary (the values of the dictionary may be recursively instantiated).

If other keys are available in the dictionary, they are treated as key-value arguments (kwargs).

For example

import fromconfig


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


config = {
    "_attr_": "Point",
    "x": 0,
    "y": 0
}
fromconfig.fromconfig(config)  # Point(0, 0)

Note that during instantiation, the config object is not modified. Also, any mapping-like container is supported (there is no special "config" class in fromconfig).

Parsing

Default

FromConfig comes with a default parser which sequentially applies

  • OmegaConfParser: can be practical for interpolation
  • ReferenceParser: resolves references
  • EvaluateParser: syntactic sugar to configure functool.partial or simple imports
  • SingletonParser: syntactic sugar to define singletons

For example, let's see how to create singletons, use references and interpolation

import fromconfig


class Model:
    def __init__(self, model_dir):
        self.model_dir = model_dir


class Trainer:
    def __init__(self, model):
        self.model = model


config = {
    "model": {
        "_attr_": "Model",
        "_singleton_": "my_model",  # singleton
        "model_dir": "${data.root}/${data.model}"  # interpolation
    },
    "data": {
        "root": "/path/to/root",
        "model": "subdir/for/model"
    },
    "trainer": {
        "_attr_": "Trainer",
        "model": "@model",  # reference
    }
}
parser = fromconfig.parser.DefaultParser()
parsed = parser(config)
instance = fromconfig.fromconfig(parsed)
id(instance["model"]) == id(instance["trainer"].model)  # True
instance["model"].model_dir == "/path/to/root/subdir/for/model"  # True

OmegaConf

OmegaConf is a YAML based hierarchical configuration system with support for merging configurations from multiple sources. The OmegaConfParser wraps some of its functionality (for example, variable interpolation).

For example

import fromconfig
config = {
    "host": "localhost",
    "port": "8008",
    "url": "${host}:${port}"
}
parser = fromconfig.parser.OmegaConfParser()
parsed = parser(config)
parsed["url"]  # 'localhost:8008'

Learn more on the OmegaConf documentation website.

References

To make it easy to compose different configuration files and avoid deeply nested config dictionaries, you can use the ReferenceParser.

For example,

import fromconfig
parser = fromconfig.parser.ReferenceParser()
config = {"params": {"x": 1}, "y": "@params.x"}
parsed = parser(config)
parsed["y"]

The ReferenceParser looks for values starting with a @, then split by ., and navigate from the top-level dictionary.

In practice, it makes configuration files more readable (flat) and avoids duplicates.

It is also a convenient way to dynamically compose different configs.

For example

import fromconfig

param1 = {
    "params": {
        "x": 1
    }
}
param2 = {
    "params": {
        "x": 2
    }
}
config = {
    "model": {
        "x": "@params.x"
    }
}
parser = fromconfig.parser.ReferenceParser()
parsed1 = parser({**config, **param1})
parsed1["model"]["x"]  # 1
parsed2 = parser({**config, **param2})
parsed1["model"]["x"]  # 2

Evaluate

The EvaluateParser makes it possible to simply import a class / function, or configure a constructor via a functools.partial call.

The parser uses a special key _eval_ with possible values

  • call: standard behavior, results in attr(kwargs).
  • partial: delays the call, results in a functools.partial(attr, **kwargs)
  • import: simply import the attribute, results in attr

call

import fromconfig

config = {"_attr_": "str", "_eval_": "call", "_args_": ["hello world"]}
parser = fromconfig.parser.EvaluateParser()
parsed = parser(config)
fromconfig.fromconfig(parsed) == "hello world"  # True

partial

import fromconfig

config = {"_attr_": "str", "_eval_": "partial", "_args_": ["hello world"]}
parser = fromconfig.parser.EvaluateParser()
parsed = parser(config)
fn = fromconfig.fromconfig(parsed)
isinstance(fn, functools.partial)  # True
fn() == "hello world"  # True

import

import fromconfig

config = {"_attr_": "str", "_eval_": "import"}
parser = fromconfig.parser.EvaluateParser()
parsed = parser(config)
fromconfig.fromconfig(parsed) is str  # True

Singleton

To define singletons (typically an object used in multiple places), use the SingletonParser.

For example,

import fromconfig


config = {
    "x": {
        "_attr_": "dict",
        "_singleton_": "my_dict",
        "x": 1
    },
    "y": {
        "_attr_": "dict",
        "_singleton_": "my_dict",
        "x": 1
    }
}
parser = fromconfig.parser.SingletonParser()
parsed = parser(config)
instance = fromconfig.fromconfig(parsed)
id(instance["x"]) == id(instance["y"])

Without the _singleton_ entry, two different dictionaries would have been created.

Note that using references is not a solution to create singletons, as the reference mechanism only copies missing parts of the configs.

The parser uses the special key _singleton_ whose value is the name associated with the instance to resolve singletons at instantiation time.

Examples

Manual

It is possible to manipulate configs directly in the code without using the fromconfig CLI.

For example,

"""Manual Example."""

import fromconfig


class Model:
    def __init__(self, learning_rate: float):
        self.learning_rate = learning_rate

    def train(self):
        print(f"Training model with learning_rate {self.learning_rate}")


if __name__ == "__main__":
    # Create config dictionary
    config = {
        "model": {"_attr_": "Model", "learning_rate": "@params.learning_rate"},
        "params": {
            "learning_rate": 0.1
        }
    }

    # Parse config (replace "@params.learning_rate" by its value)
    parser = fromconfig.parser.DefaultParser()
    parsed = parser(config)

    # Instantiate model and call train()
    model = fromconfig.fromconfig(parsed["model"])
    model.train()

This example can be found in docs/examples/manual

Custom Parser

One of fromconfig's strength is its flexibility when it comes to the config syntax.

To reduce the config boilerplate, it is possible to add a new Parser to support a new syntax.

Let's cover a dummy example : let's say we want to replace all empty strings with "lorem ipsum".

from typing import Dict

import fromconfig


class LoremIpsumParser(fromconfig.parser.Parser):
    """Custom Parser that replaces empty string by a default string."""

    def __init__(self, default: str = "lorem ipsum"):
        self.default = default

    def __call__(self, config: Dict):

        def _map_fn(value):
            if isinstance(value, str) and not value:
                return self.default
            return value

        # Utility to apply a function to all nodes of a nested dict
        # in a depth-first search
        return fromconfig.utils.depth_map(_map_fn, config)


cfg = {
    "x": "Hello World",
    "y": ""
}
parser = LoremIpsumParser()
parsed = parser(cfg)
print(parsed)  # {"x": "Hello World", "y": "lorem ipsum"}

This example can be found in docs/examples/custom_parser

Custom FromConfig

The logic to instantiate objects from config dictionaries is always the same.

It resolves the class, function or method attr from the _attr_ key, recursively call fromconfig on all the other key-values to get a kwargs dictionary of objects, and call attr(**kwargs).

It is possible to customize the behavior of fromconfig by inheriting the FromConfig class.

For example

import fromconfig


class MyClass(fromconfig.FromConfig):

    def __init__(self, x):
        self.x = x

    @classmethod
    def fromconfig(cls, config):
        if "x" not in config:
            return cls(0)
        else:
            return cls(**config)


config = {}
got = MyClass.fromconfig(config)
isinstance(got, MyClass)  # True
got.x  # 0

One custom FromConfig class is provided in fromconfig which makes it possible to stop the instantiation and keep config dictionaries as config dictionaries.

For example

import fromconfig


config = {
    "_attr_": "fromconfig.Config",
    "_config_": {
        "_attr_": "list"
    }
}
fromconfig.fromconfig(config)  # {'_attr_': 'list'}

Machine Learning

fromconfig is particularly well suited for Machine Learning as it is common to have a lot of different parameters, sometimes far down the call stack, and different configurations of these hyper-parameters.

Given a module ml.py defining model, optimizer and trainer classes

from dataclasses import dataclass


@dataclass
class Model:
    """Dummy Model class."""

    dim: int


@dataclass
class Optimizer:
    """Dummy Optimizer class."""

    learning_rate: float


class Trainer:
    """Dummy Trainer class."""

    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer

    def run(self):
        print(f"Training {self.model} with {self.optimizer}")

And the following config files

  • trainer.yaml: configures the training pipeline
trainer:
    _attr_: "training.Trainer"
    model: "@model"
    optimizer: "@optimizer"
  • model.yaml: configures the model
model:
    _attr_: "training.Model"
    dim: "@params.dim"
  • optimizer.yaml: configures the optimizer
optimizer:
    _attr_: "training.Optimizer"
    learning_rate: @params.learning_rate
  • params/small.yaml: hyper-parameters for a small version of the model
params:
    dim: 10
    learning_rate: 0.01
  • params/big.yaml: hyper-parameters for a big version of the model
params:
    dim: 100
    learning_rate: 0.001

It is possible to launch two different trainings with different set of hyper-parameters with

fromconfig trainer.yaml model.yaml optimizer.yaml params/small.yaml - trainer - run
fromconfig trainer.yaml model.yaml optimizer.yaml params/big.yaml - trainer - run

which should print

Training Model(dim=10) with Optimizer(learning_rate=0.01)
Training Model(dim=100) with Optimizer(learning_rate=0.001)

This example can be found in docs/examples/ml.

Note that it is encouraged to save these config files with the experiment's files to get full reproducibility. MlFlow is an open-source platform that tracks your experiments by logging metrics and artifacts.

Hyper-Parameter Search

To launch an hyper-parameter search, generate config files on the fly if using the fromconfig CLI, or config dictionaries.

For example,

import fromconfig


if __name__ == "__main__":
    config = {
        "model": {
            "_attr_": "ml.Model",
            "dim": "@params.dim"
        },
        "optimizer": {
            "_attr_": "ml.Optimizer",
            "learning_rate": "@params.learning_rate"
        },
        "trainer": {
            "_attr_": "ml.Trainer",
            "model": "@model",
            "optimizer": "@optimizer"
        }
    }
    parser = fromconfig.parser.DefaultParser()
    for dim in [10, 100]:
        for learning_rate in [0.01, 0.1]:
            params = {
                "dim": dim,
                "learning_rate": learning_rate
            }
            parsed = parser({**config, "params": params})
            trainer = fromconfig.fromconfig(parsed)["trainer"]
            trainer.run()

which prints

Training Model(dim=10) with Optimizer(learning_rate=0.01)
Training Model(dim=10) with Optimizer(learning_rate=0.1)
Training Model(dim=100) with Optimizer(learning_rate=0.01)
Training Model(dim=100) with Optimizer(learning_rate=0.1)

This example can be found in docs/examples/ml (run python hp.py).

Development

To install the library from source in editable mode

git clone https://github.com/criteo/fromconfig
cd fromconfig
make install

To install development tools

make install-dev

To lint the code (mypy, pylint and black)

make lint

To format the code with black

make black

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fromconfig-0.2.1.tar.gz (30.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page