API for graphs and tasks in Ewoks

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

EwoksCore: API for graphs and tasks in Ewoks

Workflow definition

A workflow is a directed graph of tasks. A directed graph consists of nodes and links.

A node describes an opaque unit of execution with a signature. It can have positional and named arguments which can be required or optional. It can have zero, one or more named outputs.

A task is an opaque unit of execution with input arguments defined by links and static values in the graph representation. (OOP analogy: a task is a node instance).

A link connects a source node to a target node. A link can have the following properties:

conditional: has a set of statements that combined are either True or False
required: either marked as “required” in the graph representation or “unconditional and all ancestors of the source node are required”
arguments: a mapping from input arguments of the target to output arguments of the source.

Task scheduling

A task can only be executed when all required predecessors have been executed successfully.

Task scheduling starts by executing all start tasks. When a graph has nodes without predecessors, those are the start tasks. Otherwise all nodes without required predecessors and with all required arguments statically defined are start nodes.

The input arguments of a task are defined in the following order of priority:

Input from non-required predecessors (we allow maximum one of those)
Input from all unconditional links (argument collisions raise an exception)
Input from the graph representation (static input)

Workflow description

Ewoks describes workflows as a list of nodes and a list of links

{
    "nodes": [{"id": "nodeid1", ...},
              {"id": "nodeid2", ...},
              ...],
    "links": [{"source": "nodeid1", "target": "nodeid2", ...},
              ...],
}

Each node and link can have the following attributes.

Graph attributes

nodes: a list of nodes
links: a list of links
name (optional): the name of the task graph

Node attributes

id: node identifier unique to the graph
Only one of these attributes to specify the unit of execution:
- class: the full qualifier name of a task class
- method: the full qualifier name of a function
- ppfmethod: the full qualifier name of a pypushflow function (special input/output convention)
- script: the full qualifier name of a python or shell script
- graph: the representation of another graph (e.g. json file name)
inputs (optional): static input arguments (for example {"a": 1})
inputs_complete (optional): set to True when the static input covers all required input (used for method and script as the required inputs are unknown)

Link attributes

source: the id of the source node
target: the id of the target node
arguments (optional): a dictionary that maps output names to input names. If the input name is None the output name receives the complete output of the source.
all_arguments (optional): setting this to True is equivalent to arguments being the identity mapping for all input names. Cannot be used in combination with arguments.
conditions (optional): a dictionary that maps output names to expected values
on_error (optional): a special condition: task raises an exception. Cannot be used in combination with conditions.
links: when source and/or target is a graph, this list of dictionaries specifies the links between the super-graph and the sub-graph. The dictionary keys are
- source: the id of the source node in the sub-graph (ignored when the source is not a graph)
- target: the id of the target node in the sub-graph (ignored when the source is not a graph)
- node_attributes (optional): overwrite the node attributes of the target when the target is a graph

Task implementation

All tasks can be described by deriving a class from the Task class.

required input names: an exception is raised when these inputs are not provided in the graph definition (output from previous tasks or static input values)
optional input names: no default values provided (need to be done in the process method)
output names: can be connected to downstream input names
required positional inputs: a positive number

For example

from ewokscore import Task

class SumTask(
    Task, input_names=["a"], optional_input_names=["b"], output_names=["result"]
):
    def run(self):
        result = self.inputs.a
        if self.inputs.b:
            result += self.inputs.b
        self.outputs.result = result

When a task is defined as a method or a script, a class wrapper will be generated automatically:

method: defined by a Task class with one required input argument ("method": full qualifier name of the method) and one output argument ("return_value")
ppfmethod: same as method but it has one optional input "ppfdict" and one output "ppfdict". The output dictonary is the input dictionary updated by the method. The input dictionary is unpacked before passing to the method. The output dictionary is unpacked when checking conditions in links.
ppfport: ppfmethod which is the identity mapping
script: defined by a Task class with one required input argument ("method": full qualifier name of the method) and one output argument ("return_value")

Hash links

The task graph object in ewokscore provides additional functionality in top of what networkx provides:

A Task can have several positional and named input variables and named output variables.
A Task has a universal hash which is the hash of the inputs with a Task nonce.
An output Variable has a universal hash which is the hash of the Task with the variable name as nonce.
An input Variable can be
- static:
  - provided by the persistent Graph representation
  - universal hash of the data
- dynamic:
  - provided by upstream Tasks at runtime
  - output Variable of the upstream task so it has a universal hash

The actual output data of a Task is never hashed. So we assume that if you provide a task with the same input, you will get the same output. Or at the very least it will not be executed again when succeeded once.

Hash linking of tasks serves the following purpose:

Changing static input upstream in the graph will effectively create new tasks.
The hashes provide a unique ID to create a URI for persistent storage.
Variables can be provided with universal hashes to replace the hashing of the actual inputs.
As data can be passed by passing hashes, serialization for distibuted task scheduling can be done efficiently (not much data to serialize) and no special serializer is required to serialize hashes (as they are just strings).

Data management is currently only a proof-of-concept based on JSON files with the universal hashes as file names.

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.7.3

Mar 13, 2024

0.7.2

Mar 13, 2024

0.7.1

Jul 4, 2023

0.7.0

Jul 3, 2023

0.6.1

Jun 20, 2023

0.6.0

Jun 20, 2023

0.5.3

Jun 19, 2023

0.5.2

Jun 12, 2023

0.5.1 yanked

Jun 9, 2023

Reason this release was yanked:

print statement in module

0.5.0 yanked

Jun 9, 2023

Reason this release was yanked:

task discovery regression

0.4.3

Jun 1, 2023

0.4.2

Jun 1, 2023

0.4.1

May 15, 2023

0.4.0

Mar 27, 2023

0.3.3

Mar 24, 2023

0.3.2

Mar 23, 2023

0.3.1

Mar 8, 2023

0.3.0

Nov 4, 2022

0.2.1

Oct 28, 2022

0.2.0

Oct 26, 2022

0.1.1

Sep 2, 2022

0.1.0

Aug 31, 2022

0.1.0rc13 pre-release

Jul 3, 2022

0.1.0rc12 pre-release

Jun 17, 2022

0.1.0rc11 pre-release

Jun 7, 2022

0.1.0rc10 pre-release

May 23, 2022

0.1.0rc9 pre-release

May 20, 2022

0.1.0rc8 pre-release

May 12, 2022

0.1.0rc7 pre-release

May 12, 2022

0.1.0rc6 pre-release

Apr 27, 2022

0.1.0rc5 pre-release

Apr 25, 2022

0.1.0rc4 pre-release

Apr 22, 2022

0.1.0rc3 pre-release

Apr 11, 2022

0.1.0rc2 pre-release

Apr 8, 2022

0.1.0rc1 pre-release

Apr 5, 2022

0.1.0rc0 pre-release

Mar 14, 2022

0.0.12a0 pre-release

Feb 21, 2022

0.0.11a0 pre-release

Feb 6, 2022

0.0.10a0 pre-release

Feb 5, 2022

0.0.9a0 pre-release

Feb 2, 2022

0.0.8a4 pre-release

Jan 10, 2022

0.0.8a3 pre-release

Jan 4, 2022

0.0.8a2 pre-release

Jan 4, 2022

0.0.8a1 pre-release

Dec 20, 2021

0.0.8a0 pre-release

Dec 20, 2021

0.0.7a0 pre-release

Dec 17, 2021

0.0.6a1 pre-release

Dec 15, 2021

0.0.6a0 pre-release

Dec 9, 2021

0.0.5a0 pre-release

Nov 18, 2021

0.0.4a0 pre-release

Nov 9, 2021

0.0.3a0 pre-release

Sep 17, 2021

0.0.2a1 pre-release

Aug 30, 2021

0.0.2a0 pre-release

Aug 23, 2021

0.0.1a3 pre-release

Aug 10, 2021

0.0.1a2 pre-release

Aug 5, 2021

This version

0.0.1a1 pre-release

Jul 29, 2021

0.0.1a0 pre-release

Jul 5, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ewokscore-0.0.1a1.tar.gz (29.6 kB view hashes)

Uploaded Jul 29, 2021 Source

Hashes for ewokscore-0.0.1a1.tar.gz

Hashes for ewokscore-0.0.1a1.tar.gz
Algorithm	Hash digest
SHA256	`9f95c4fb9c51ed60447454ae5521fce3b328e3ed05315c1b825a4e76cb554fb8`
MD5	`6d5199c89f5957adf1f468f1b4c43921`
BLAKE2b-256	`f4ff0dd98b4e93bf2d03131789d25f3f740017b11dc18eee86ab00bbc45ede3d`