Automatic differentiation package AC207

cs107-FinalProject

Group 26

  • David Chataway
  • Javiera Astudillo
  • Wenhan Zhang

Introduction

Our library, AutoDiff, provides an automatic differentiation (AD) tool designed to be very user-friendly. The base autodiff class contains modules for both forward and reverse mode. A user provides an array of input values and a list of multivariate lambda functions, specifies the mode and (optionally) a seed vector, and can then compute the Jacobian, the gradients and the directional derivative along a custom direction, so that the library can be used inside other algorithms such as Newton's method. Developing a user-friendly and efficient AD library is important for two primary reasons:

  1. Automatic differentiation improves on the symbolic differentiation offered by symbolic libraries (like SymPy) or web services (like Wolfram Alpha): AD achieves machine precision yet executes faster, because it does not need to carry the whole expression tree through the computation and does not suffer from exponentially longer derivative expressions (caused, for example, by repeated product rules).

  2. Efficient automatic differentiation is necessary to keep computational cost and runtime manageable, because forward and reverse modes scale differently and intermediate variables and expressions must be stored. This is crucial in many applications such as optimization algorithms (e.g. gradient descent).

Background

In our understanding, at the heart of AD is the idea that functional expressions can be broken down into elementary symbolic expressions, and the numerical values of the function's derivatives can be calculated by storing the value and derivative value at intermediate variables of those expressions. The graph structure of automatic differentiation arises because these intermediate variables can have both parents and children that each point to different steps in the evaluation trace. Furthermore, many of these intermediate variables will have multiple children, i.e. they will be used more than once in the evaluation trace. Although forward and reverse mode differ in the order in which the graph is evaluated, both modes 1) produce the same graph, 2) evaluate the derivatives at each child node by using a stored derivative value with respect to the parent node(s), and 3) take a function with n input variables and m outputs and produce its numerical derivative in the form of an m x n Jacobian (or its transpose).

Forward mode

In the forward mode, we calculate the partial derivatives by evaluating the chain rule for each intermediate variable during the forward pass. This is done by both applying a symbolic derivative of the intermediate variable and directly applying the derivative value of the parent node(s). For instance, for an intermediate variable expressed as a function of a function (its parent node), h(u(t)), the chain rule is evaluated as follows:
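
\frac{dh}{dt} = \frac{dh}{du} \cdot \frac{du}{dt}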

where the first term of the multiplication is evaluated symbolically and the second term is taken from the derivative value of the parent node.

Because of the graph structure, the forward mode is able to compute the partial derivatives of all of the m outputs with respect to a single input variable (if seeded in the unit direction of that specific input variable). That is, each forward pass produces a single column of the Jacobian matrix.

Reverse mode

In the reverse mode, rather than directly computing the chain rule at each intermediate variable, you just store the partial derivatives and the dependencies of the expression tree during the forward pass. Then, you do a reverse pass by starting at a single output variable, 'seeding' it (i.e. setting the derivative of that output variable to 1 and the derivatives of the other output variables to 0) and propagating backwards to all the input variables. We are essentially asking: which input variables can affect this output variable? This is possible by calculating the 'adjoint', which stores the current value of the partial derivative of the output with respect to the variable at node i according to the following formula, where j is a child of i:
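
\bar{v}_i = \frac{\partial f}{\partial v_i} = \sum_{j \in \mathrm{children}(i)} \bar{v}_j \, \frac{\partial v_j}{\partial v_i}

where v_i denotes the intermediate variable at node i and \bar{v}_i its adjoint.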

However, in order to calculate the derivative of a different output variable, we need to re-run the program with a different unit seed. As a result, each run of the reverse mode will generate a single row of the Jacobian (or a column of the transpose of the Jacobian).

Example

For example, consider the function f(x,y), shown below (same as the one in pair programming 6).

Rather than storing the entire symbolic derivative expression, we can instead break down the function into intermediate variables (shown below) and store (or overload) the symbolic derivatives of the elemental operations of +/-, **2, exp(), sin() and cos().

For forward mode, all that needs to be stored is a tuple of the value and the derivative value according to a particular seed. Thus, this evaluation would need to be run twice for the two seed vectors corresponding to each input variable.

If this automatic differentiation were to be evaluated in reverse mode, on the forward pass at each node we would instead store the values of the intermediate variable partial derivatives with respect to each parent argument (rather than applying the chain rule), and then we would work backwards in one pass to calculate df/dx and df/dy using the adjoint method. These graph structures are outlined below:

How to use

Our solution is provided as a Python package called AutoDiff and is available through pip, the package installer for Python. The only prerequisites are Python itself and the numpy and scipy packages. If you install AutoDiff through pip, the dependencies (numpy and scipy) are installed automatically.

PIP

python3 -m pip install autodiff-djw

Create AutoDiff

The main access point to our library is through the AutoDiff class. To instantiate an object of this class the user must provide the following arguments:

AutoDiff(f, vals, mode='forward')

  • f: array_like

      List-like output functions. Order preserving.

  • vals: array_like

      List-like initial values for the inputs. All output functions will be evaluated at these values.

  • mode: string (default=forward)

      Either forward or reverse mode.

Example: new AutoDiff

import numpy as np
from autodiff import AutoDiff

inputs_vals = [np.pi/2, np.pi/4, np.pi/8] 
f1 = ...
f2 = ...
f = [f1, f2]
autodiff = AutoDiff(vals=inputs_vals, f=f, mode='forward')

Function creation

To create functions ([f1,f2] in the previous example) that are compatible with our library methods (i.e. that support automatic differentiation of their outputs), you must provide regular Python functions which (1) meet the signature described below and (2) use functions supported by our library.

Function signature

The elements of f (provided at AutoDiff instantiation) must be regular python functions which meet the following signature:

f1(inputs)

Arguments
---------
inputs: array_like
    A sequence representing the inputs. Each element represents one input variable (scalar).
Returns
-------
output: numeric (scalar)
    The value of the output function evaluated at `inputs`.

Supported operations

To enable the automatic differentiation features, any user-defined function must use the library-provided functions:

  • Basic Operations:

    • Addition (commutative) (+)
    • Subtraction (-)
    • Multiplication (commutative) (*)
    • Division (/)
    • Power (**)
    • Negation (-)
  • Elementary Functions:

    • log (base configurable)
    • exp
    • sin
    • cos
    • tan
    • arcsin
    • arccos
    • arctan
    • sinh
    • cosh
    • tanh
    • sqrt
    • expit

These functions may be accessed through the autodiff.elementary module, and the user should import them before declaring a function. All of the elementary functions have the same signature as their numpy equivalents.

Example: declaring functions

from autodiff.elementary import sin

f1 = lambda inputs: inputs[0]*inputs[2]**2 
f2 = lambda inputs: inputs[2]*sin(inputs[1])
f = [f1, f2]

or equivalently:

def f1(inputs): 
    x1 = inputs[0] # first input
    x3 = inputs[2] # third input
    return x1*(x3**2)

def f2(inputs): 
    x2 = inputs[1] # second input
    x3 = inputs[2] # third input
    return x2*sin(x3)

f = [f1, f2]

AutoDiff attributes and methods

Attributes

  • inputs: get the user-provided input values.
  • outputs: get the outputs of the user-provided functions, evaluated at inputs.
  • jacobian: get the Jacobian matrix.

Methods

  • dir_der
dir_der(seed)

    Calculates the derivative of each output function in the direction of `seed`.

    Arguments
    ---------
    seed: array-like (M,)
        Represents the direction to calculate the derivative of each output function.
        Must have the same dimension as the number of inputs.
        M: number of inputs

    Returns
    -------
    list (N,)
        Derivative of each output function in the direction of the `seed` vector.
        N: number of output functions
  • gradient
gradient(fn_idx)
    Creates the Jacobian (if not already computed) and returns one of its rows.
    Calculates the derivative of the output function at index `fn_idx` with respect to each input variable.

    Arguments
    --------
    fn_idx: int 
        Index of the output function to calculate the gradient.

    Returns
    -------
    list (M,)
        M: number of inputs.

Demo

Let's take the following vector function to illustrate the usage of our library:
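
f(x_1, x_2, x_3) = \begin{pmatrix} x_1 x_3^2 \\ x_3 \sin(x_2) \end{pmatrix}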

Now we'll evaluate this function at the point x = [pi/2, pi/4, pi/8] and evaluate the directional derivative of each output function with respect to v = [1,2,1]:
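
Analytically, the Jacobian J and the directional derivative Jv at this point are:

J(x) = \begin{pmatrix} x_3^2 & 0 & 2 x_1 x_3 \\ 0 & x_3 \cos(x_2) & \sin(x_2) \end{pmatrix},
\qquad
J(x)\,v = \begin{pmatrix} x_3^2 + 2 x_1 x_3 \\ 2 x_3 \cos(x_2) + \sin(x_2) \end{pmatrix}
= \begin{pmatrix} 9\pi^2/64 \\ \pi\sqrt{2}/8 + \sqrt{2}/2 \end{pmatrix}
\approx \begin{pmatrix} 1.38791312 \\ 1.26246715 \end{pmatrix}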

This analytical solution may be reproduced with the AutoDiff library using either the forward or the reverse mode, as follows. It can be seen that the returned derivatives are equal to each other (within machine precision) and to the expected value.

Forward Mode

import numpy as np
from autodiff import AutoDiff
from autodiff.elementary import log, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, sqrt, expit

inputs_vals = [np.pi/2, np.pi/4, np.pi/8] 
seed = [1,2,1]

f1 = lambda inputs: inputs[0]*inputs[2]**2 
f2 = lambda inputs: inputs[2]*sin(inputs[1])
f = np.array([f1, f2])

autodiff = AutoDiff(f=f, vals=inputs_vals, mode='forward') 

print(autodiff.inputs)
>> [1.5707963267948966, 0.7853981633974483, 0.39269908169872414]

print(autodiff.outputs)
>> [0.24223653656484231, 0.2776801836348979]

print(autodiff.jacobian)
>> [[0.15421257 0.         1.23370055]
 [0.         0.27768018 0.70710678]]

print(autodiff.gradient(fn_idx=0))
>> [0.15421257 0.         1.23370055]

print(autodiff.dir_der(seed=seed))
>> [1.38791312 1.26246715]

Reverse Mode

import numpy as np
from autodiff import AutoDiff
from autodiff.elementary import log, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, sqrt, expit

inputs_vals = [np.pi/2, np.pi/4, np.pi/8] 
seed = [1,2,1]

f1 = lambda inputs: inputs[0]*inputs[2]**2 
f2 = lambda inputs: inputs[2]*sin(inputs[1])
f = np.array([f1, f2])

autodiff = AutoDiff(f=f, vals=inputs_vals, mode='reverse') 

print(autodiff.inputs)
>> [1.5707963267948966, 0.7853981633974483, 0.39269908169872414]

print(autodiff.outputs)
>> [0.24223653656484231, 0.2776801836348979]

print(autodiff.jacobian)
>> [[0.15421257 0.         1.23370055]
 [0.         0.27768018 0.70710678]]

print(autodiff.gradient(fn_idx=0))
>> [0.15421257 0.         1.23370055]

print(autodiff.dir_der(seed=seed))
>> [1.38791312 1.26246715]

Software organization

The current structure of our project is as follows:

.
├── src
│   ├── autodiff
│   │   ├── autodiff_.py
│   │   ├── elementary.py
│   │   ├── forward.py
│   │   ├── reverse.py
│   │   └── ...
│   ├── demo
│   └── tests
└── docs
    ├── README.md
    └── imgs

The forward.py and reverse.py modules contain the core functions of our AutoDiff package in forward and reverse mode, respectively. elementary.py provides elementary math functions to use in conjunction with our library, while autodiff_.py contains the AutoDiff class for accessing the auto-differentiation methods of our library. All our tests are located in src/tests and are integrated with Travis CI and Codecov. To run the test suite you'll need to install pytest and codecov:

pip install pytest pytest-cov
pip install codecov

Once both previous packages are installed you may run the test suite locally (starting at the root of our folder structure):

cd .
export PYTHONPATH="$PWD/src"
python -m pytest --cov src/tests

Implementation details

There is a common interface for accessing the automatic differentiation tools provided by this library: the AutoDiff class. Under the hood, AutoDiff implements a factory pattern at instantiation that returns either an AutoDiffForward or an AutoDiffReverse object. This pattern was selected so that usage of the AutoDiff object is transparent to the user, regardless of which auto-differentiation mode is used.
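
As a rough sketch of this dispatch (a minimal illustration only; the actual constructor logic in autodiff_.py may differ in its details):

class AutoDiff:
    """Factory: return a forward- or reverse-mode object depending on `mode`."""
    def __new__(cls, f, vals, mode='forward'):
        if mode == 'forward':
            return AutoDiffForward(f=f, vals=vals)
        elif mode == 'reverse':
            return AutoDiffReverse(f=f, vals=vals)
        raise ValueError("mode must be 'forward' or 'reverse'")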

Forward Mode

  1. A "Variable" class was created with .val and .der attributes and overloaded elementary dunder methods and elementary functions
  2. AutoDiffForward class automatically creates Variable objects according to the input array.
  3. AutoDiffForward class creates the Jacobian (.jacobian) by calling the directional derivative (dir_der) method on unit seed vectors, iterating over the size M (number of input variables); see the sketch after this list
  4. AutoDiffForward class gradient method requires the creation of the Jacobian and returns a row.
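
As a rough illustration of step 3, the Jacobian can be assembled column by column from unit-seed directional derivatives. build_jacobian below is a hypothetical helper written against the public dir_der method, not a function of the library itself:

import numpy as np

def build_jacobian(ad_forward, num_inputs):
    # one forward pass per unit seed; each pass yields one Jacobian column
    columns = []
    for m in range(num_inputs):
        seed = np.zeros(num_inputs)
        seed[m] = 1.0
        columns.append(ad_forward.dir_der(seed=seed))
    return np.array(columns).T  # shape: (number of outputs N, number of inputs M)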

Let's look at the sample elementary function sin, which operates on Variable objects, to better understand how the auto-differentiation is implemented underneath:

import numpy as np

class Variable:
    def __init__(self, **kwargs):
        ...
        self.val = kwargs['val']  # value of the variable
        self.der = kwargs['der']  # derivative value (seeded for input variables)
        ...

def sin(a):
    out = Variable()
    try:
        # `a` is a Variable: apply the chain rule directly
        out.val = np.sin(a.val)
        out.der = np.cos(a.val)*a.der
    except AttributeError:
        try:
            # `a` is a plain real number (a constant), so there is no derivative to propagate
            out.val = np.sin(a)
        except Exception:
            raise TypeError('Sin failed: `input` not a `Variable` object or a real number.')
    return out

For the forward calculation you first need to seed the value of the derivative at the very beginning (self.der = kwargs['der']). Once you've done that, any call to a mathematical function passing a Variable object as argument will immediately return a Variable object with the updated value and derivative, with no need to keep track of the forward pass. In the case of sin, the output value is np.sin(a.val) while its partial derivative is np.cos(a.val)*a.der, following the chain rule.
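
As a quick illustration of this behaviour (purely illustrative; in normal usage AutoDiffForward creates and seeds the Variable objects for you):

x = Variable(val=np.pi/2, der=1.0)  # seed dx/dx = 1
y = sin(x)                          # forward pass through the overloaded sin
print(y.val)  # 1.0       -> sin(pi/2)
print(y.der)  # ~6.1e-17  -> cos(pi/2) * 1.0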

Reverse Mode

  1. A "Node" class was created with .val, .grad_value and .children ([partial der, child object]) attributes and overloaded elementary dunder methods and elementary functions that update .children
    • A grad() method was created that recursively sums (partial der * child adjoint) over all children of a Node (thus acting as the backwards pass)
  2. AutoDiffReverse class automatically creates Node objects according to the input array.
  3. AutoDiffReverse class creates the Jacobian by calling the gradient method once per output function (seeding that output), iterating over the size N (number of output functions); see the sketch after this list.
  4. AutoDiffReverse class directional derivative (dir_der) method requires the creation of the Jacobian and returns the matrix multiplication of the Jacobian with the custom seed vector.
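
As a rough illustration of steps 3 and 4 (build_jacobian_reverse and dir_der_reverse are hypothetical helpers written against the public gradient method, not the library's internals):

import numpy as np

def build_jacobian_reverse(ad_reverse, num_outputs):
    # one reverse pass per output function; each pass yields one Jacobian row
    return np.array([ad_reverse.gradient(fn_idx=n) for n in range(num_outputs)])

def dir_der_reverse(jacobian, seed):
    # directional derivative = Jacobian times the seed vector
    return np.array(jacobian) @ np.array(seed)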

Let's look at the same sample function sin, this time operating on Node objects, together with the grad method and the gradient driver used by AutoDiffReverse:

import numpy as np

class Node:
    def __init__(self, **kwargs):
        self.val = 0
        self.children = []      # list of (partial derivative, child Node) pairs
        self.grad_value = None  # the adjoint
        ...

    def grad(self):
        # recurse only if the adjoint is not yet cached
        if self.grad_value is None:
            # recursively calculate the adjoint using the chain rule
            self.grad_value = sum(weight * node.grad()
                                  for weight, node in self.children)
        return self.grad_value


# Method of AutoDiffReverse, which stores the input values in self.vals and
# the output functions in self.f:
def gradient(self, fn_idx):
    ...
    inputs_vars = [Node(val=val) for val in self.vals]     # create Node inputs
    f_vars = np.array([f_(inputs_vars) for f_ in self.f])  # forward pass builds the graph

    # Seed .grad_value = 1 only for the function at fn_idx and 0 for all others
    for i in range(len(f_vars)):
        if i == fn_idx:
            f_vars[i].grad_value = 1.0
        else:
            f_vars[i].grad_value = 0

    # Get the partial derivative w.r.t. each input
    der = [inputs_.grad() for inputs_ in inputs_vars]
    return der


def sin(a):
    out = Node()
    try:
        out.val = np.sin(a.val)
        a.children.append((np.cos(a.val), out))  # weight = ∂out/∂a = cos(a.val)
    except AttributeError:
        try:
            out.val = np.sin(a)
        except Exception:
            raise TypeError('Sin failed: `input` not a `Node` object or a real number.')
    return out

Note that in order to calculate the gradient in reverse mode we first need to record the partial derivatives and the dependencies of the expression tree during the forward pass. This happens when f_vars = np.array([f_(inputs_vars) for f_ in self.f]) is evaluated within gradient, which passes through our custom functions such as sin.

Then, you do a reverse pass by starting at a single output variable and 'seeding' it (i.e. setting the derivative of that output variable to 1 and the derivatives of the other output variables to 0). This is executed by the f_vars[i].grad_value = ... lines. After seeding, you propagate backwards to all the input variables (this is done by der = [inputs_.grad() for inputs_ in inputs_vars]).

Our extension

Our proposed extension is a working reverse mode implementation. At the user level it exposes the same methods and attributes as the forward mode, and its usage is transparent to the user (nothing changes other than setting the mode to 'reverse'). More specifically, the user may instantiate an AutoDiff object with the following syntax (analogous to the forward syntax):

autodiff = AutoDiff(vals=inputs_vals, f=f, mode='reverse')

The user may access the following methods and attributes from this class:

  • inputs
  • outputs
  • jacobian
  • dir_der(seed)
  • gradient(fn_idx)

These represent exactly the same quantities as in the forward mode described in the previous sections, and their usage is exactly the same as in the forward mode.

All background content necessary to understand the reverse mode may be found in the Background > Reverse mode section, while its usage is described throughout the preceding documentation alongside the forward mode.

Broader Impact and Inclusivity Statement

We created this library to be as user-friendly as possible and to assist with common differentiation tasks such as Newton's method and optimization problems. However, it is possible that, as the library grows over time in usage and functionality, people may misuse it. While we anticipate that the initial users will primarily be academics and hobbyists, what happens if the library is adopted by those with broad and significant societal impact? For instance, if the COMPAS tool, used to assess the likelihood of defendants becoming recidivists, adopted our library to optimize their predictive models, it would become crucial to maintain the integrity of the library. As a result, we would need to devote more resources to testing, error management and debugging. Otherwise, the societal impacts of an erroneous automatic differentiation library could be vast.

In making our AD library as user-friendly as possible, we considered the perspectives of a varied audience and sought not to deliberately exclude individuals or groups based on cultural, economic, social, gender or disability differences. For instance, we tried to refrain from using exclusive terms such as master. However, we are cognisant that our library is exclusive because of three core deficiencies: 1) our documentation and much of our code's functions are written only in English, 2) our code and documentation make reference to parent and child (nodes), which may be triggering to some, and 3) pull requests to the library can only be approved by a select few, which may make others feel like they cannot contribute. Given more time, we would like to 1) make our library as language-friendly as possible by writing documentation in different languages and code that is less English-centric, 2) determine better terminology for the parent-child relationship, and 3) democratize the contribution review and approval process. Finally, it is important to consider that there may be unanticipated exclusive characteristics of our library that will be revealed with broader usage. Thus, it is paramount that a forum be created for users to voice concerns and that there be a standardized process to review those concerns and implement changes if needed.

Future Features

Future work on the library will remain focused on the original objective: to make an AD package that is as user-friendly as possible for our target users, primarily academics and hobbyists. As such, development work on future features is as follows:

  • Add more compatible functions, primarily those used in academic research (e.g. absolute value, Gaussian)
    • Complement the test suite with tests for the additional functions
  • Print run time and/or computational intensity of the class methods
    • Initial ideas include using the time module (e.g. time.perf_counter(), since time.clock() was removed in Python 3.8)
  • Add logic to decide on using Forward or Reverse mode based on m, n dimensionality
    • Print the suggestion to the user to help with usability
  • Create a graphical user interface (GUI) to execute the library
    • This could be done by hosting a public server similar to autoed.herokuapp.com
  • Create a cache to improve runtime (e.g. data structure to track stored intermediate variables on independent passes)
    • Initial ideas for the data structures include a dictionary or a graph

