Building a computational graph: part 3
16 Feb 2020 · 1735 words

This is the third part of a series on creating a computational graph in Python. You can go back to part one and part two.

There’s a bit of code in this post. Here is the final code in np_wrapping.py. The module for numpy_autograd is here.

In the last post we created computational graphs for a function, but it was a bit hard to use. We also had these problems:

• we didn’t want to replace np.add with add_new, np.exp with exp_new etc everywhere. That’s a pain, especially we have a lot of code to do that for.
• we had to implement primitives manually for every numpy function we want. Is there a way to get them all?
• how do we handle non-differentiable functions?

Here’s the gist of it. We create a fake numpy module called numpy_autograd with wrapped versions of all numpy functions. This fake numpy contains all the functions and objects of the original numpy, except some functions (only the differentiable ones) are added to a computational graph as they are called. Then by writing import numpy_autograd as np, any functions using numpy functions like np.add automatically build a computation graph as they are executed.

### Non-differentiable functions

Autodiff packages like autograd have to watch out for non-differentiable functions. Many functions are not differentiable, like np.asarray, np.shape or np.argmin.

Take np.floor(x) as an example. This is a non-differentiable function: its derivative does not exist for integer values of $x$, and the derivative is 0 everywhere else. So this is not something we’d add to the computation graph if we encountered it.

How should we deal with these functions? There are a few approaches. Some packages like autograd don’t add them to the graph completely. The approach I take here is a bit different: I add them to the computation graph, but I’ll add a flag keepgrad that indicates if the gradient of this function should be calculated or not. So let’s go ahead and modify our primitive function from earlier to include this parameter:

def primitive(f, keepgrad=True):
@wraps(f)
def inner(*args, **kwargs):
## Code to add operation/primitive to computation graph
# We need to separate out the integer/non node case. Sometimes you are adding
# constants to nodes.
def getval(o):      return o.value if type(o) == Node else o
if len(args):       argvals = [getval(o) for o in args]
else:               argvals = args
if len(kwargs):     kwargvals = dict([(k,getval(o)) for k,o in kwargs.items()])
else:               kwargvals =  kwargs

# get parents
l = list(args) + list(kwargs.values())
parents = [o for o in l if type(o) == Node ]

value = f(*argvals, **kwargvals)
print("add", "'" + f.__name__ + "'", "to graph with value",value)
return inner

### Triage

A quick note. It can get confusing working with “original” numpy and “new” numpy, so note that throughout this post if you see something prefixed with _np, that means “original” numpy. Later I use anp to refer to “new” numpy.

Anyway, it’s time to create our version of numpy. All the attributes of numpy are available in _np.__dict__. We are going to split the objects in this dict into three categories:

a) differentiable functions (wrap with primitive, keepgrad=True (the default for primitive)
b) non-differentiable functions (wrap with primitive, keepgrad=False)
c) everything else (leave unchanged)

We create a function wrap_namespace that will copy everything from _np.__dict__ into a new dictionary, wrapping functions based on the three categories above.

import numpy as _np
def wrap_namespace(old, new):
"""Performs triage on objects from numpy, copying them from old to new namespace.
old: __dict__ from original numpy
new: dict to copy old into
"""
# Taken from here:
_np.ndim, _np.shape, _np.iscomplexobj, _np.result_type, _np.zeros_like,
_np.ones_like, _np.floor, _np.ceil, _np.round, _np.rint, _np.around,
_np.fix, _np.trunc, _np.all, _np.any, _np.argmax, _np.argmin,
_np.argpartition, _np.argsort, _np.argwhere, _np.nonzero, _np.flatnonzero,
_np.count_nonzero, _np.searchsorted, _np.sign, _np.ndim, _np.shape,
_np.floor_divide, _np.logical_and, _np.logical_or, _np.logical_not,
_np.logical_xor, _np.isfinite, _np.isinf, _np.isnan, _np.isneginf,
_np.isposinf, _np.allclose, _np.isclose, _np.array_equal, _np.array_equiv,
_np.greater, _np.greater_equal, _np.less, _np.less_equal, _np.equal,
_np.not_equal, _np.iscomplexobj, _np.iscomplex, _np.size, _np.isscalar,
_np.isreal, _np.zeros_like, _np.ones_like, _np.result_type
]
function_types = {_np.ufunc, types.FunctionType, types.BuiltinFunctionType}

for name,obj in old.items():
# non-differentiable functions
elif type(obj) in function_types:  # functions with gradients
# differentiable functions
new[name] = primitive(obj)
else:
# just copy over
new[name] = obj


### Creating our new module

We’re ready to bring all our code together into a module. Here is an overview of the basic procedure.

• make new folder ./numpy_autograd
• put stuff in a file np_wrapping.py with definitions of
• primitive
• notrace_primitive
• Node
• wrap_namespace
• add implementations of the dunder methods to the Node class with setattr.
• make __init__.py and at the top put import * from np_wrapping
• now import using import numpy_autograd as np and you are done

To get started, create a new folder called numpy_autograd, and then create a file called np_wrapper.py. Put the below code in this file.

Start off with imports and primitive:

import numpy as _np
import types
from functools import wraps

@wraps(f)
def inner(*args, **kwargs):
## Code to add operation/primitive to computation graph
# We need to separate out the integer/non node case. Sometimes you are adding
# constants to nodes.
def getval(o):      return o.value if type(o) == Node else o
if len(args):       argvals = [getval(o) for o in args]
else:               argvals = args
if len(kwargs):     kwargvals = dict([(k,getval(o)) for k,o in kwargs.items()])
else:               kwargvals =  kwargs

# get parents
l = list(args) + list(kwargs.values())
parents = [o for o in l if type(o) == Node ]

value = f(*argvals, **kwargvals)
print("add", "'" + f.__name__ + "'", "to graph with value",value)
return inner

Now add the latest iteration of the Node class. It’s similar to before, except it’s modified to incorporate the keepgrad parameter. I’ve also moved the start_node function from last time to be a static method of the class, instead of having it float around by itself.

class Node:
"""A node in a computation graph."""
def __init__(self, value, fun, parents, keepgrad):
self.parents = parents
self.value = value
self.fun = fun

def __repr__(self):
"""A (very) basic string representation"""
if self.value is None: str_val = 'None'
else:                  str_val = str(self.value)
return   "\n" + "Fun: " + str(self.fun) +\
" Value: "+ str_val + \
" Parents: " + str(self.parents)

"""A function to create an empty node to start off the graph"""
fun,parents = lambda x: x, []



Then the wrap_namespace function from earlier:

def wrap_namespace(old, new):
"""Performs triage on objects from numpy, copying them from old to new namespace.
old: __dict__ from original numpy
new: dict to copy old into
"""
# Taken from here:
_np.ndim, _np.shape, _np.iscomplexobj, _np.result_type, _np.zeros_like,
_np.ones_like, _np.floor, _np.ceil, _np.round, _np.rint, _np.around,
_np.fix, _np.trunc, _np.all, _np.any, _np.argmax, _np.argmin,
_np.argpartition, _np.argsort, _np.argwhere, _np.nonzero, _np.flatnonzero,
_np.count_nonzero, _np.searchsorted, _np.sign, _np.ndim, _np.shape,
_np.floor_divide, _np.logical_and, _np.logical_or, _np.logical_not,
_np.logical_xor, _np.isfinite, _np.isinf, _np.isnan, _np.isneginf,
_np.isposinf, _np.allclose, _np.isclose, _np.array_equal, _np.array_equiv,
_np.greater, _np.greater_equal, _np.less, _np.less_equal, _np.equal,
_np.not_equal, _np.iscomplexobj, _np.iscomplex, _np.size, _np.isscalar,
_np.isreal, _np.zeros_like, _np.ones_like, _np.result_type
]
function_types = {_np.ufunc, types.FunctionType, types.BuiltinFunctionType}

for name,obj in old.items():
# non-differentiable functions
elif type(obj) in function_types:  # functions with gradients
# differentiable functions
new[name] = primitive(obj)
else:
# just copy over
new[name] = obj


Now call wrap_namespace(). We’ll hold the wrapped functions in a dict called anp, which we init with the current value of globals(). Calling this anp is to make the syntax for the next step (operator overloading) a bit clearer.

# using globals() here means we can access each np function like np.add:
# it means it is available to the global space.
anp = globals()
wrap_namespace(_np.__dict__, anp)

Finally it’s time for operator overloading. Instead of defining these all in Node, we use setattr to add each method one by one to the Node class. There are many more functions than before and many more dunder methods to match. There are also properties to deal with, like np.ndim, and we use the property keyword of Python to handle these.

## Definitions taken from here:
setattr(Node, 'ndim', property(lambda self: self.value.ndim))
setattr(Node, 'size', property(lambda self: self.value.size))
setattr(Node, 'dtype',property(lambda self: self.value.dtype))
setattr(Node, 'T', property(lambda self: anp['transpose'](self)))
setattr(Node, 'shape', property(lambda self: self.value.shape))

setattr(Node,'__len__', lambda self, other: len(self._value))
setattr(Node,'astype', lambda self,*args,**kwargs: anp['_astype'](self, *args, **kwargs))
setattr(Node,'__neg__', lambda self: anp['negative'](self))
setattr(Node,'__sub__', lambda self, other: anp['subtract'](self, other))
setattr(Node,'__mul__', lambda self, other: anp['multiply'](self, other))
setattr(Node,'__pow__', lambda self, other: anp['power'](self, other))
setattr(Node,'__div__', lambda self, other: anp['divide'](  self, other))
setattr(Node,'__mod__', lambda self, other: anp['mod'](     self, other))
setattr(Node,'__truediv__', lambda self, other: anp['true_divide'](self, other))
setattr(Node,'__matmul__', lambda self, other: anp['matmul'](self, other))
setattr(Node,'__rsub__', lambda self, other: anp['subtract'](other, self))
setattr(Node,'__rmul__', lambda self, other: anp['multiply'](other, self))
setattr(Node,'__rpow__', lambda self, other: anp['power'](   other, self))
setattr(Node,'__rdiv__', lambda self, other: anp['divide'](  other, self))
setattr(Node,'__rmod__', lambda self, other: anp['mod'](     other, self))
setattr(Node,'__rtruediv__', lambda self, other: anp['true_divide'](other, self))
setattr(Node,'__rmatmul__', lambda self, other: anp['matmul'](other, self))
setattr(Node,'__eq__', lambda self, other: anp['equal'](self, other))
setattr(Node,'__ne__', lambda self, other: anp['not_equal'](self, other))
setattr(Node,'__gt__', lambda self, other: anp['greater'](self, other))
setattr(Node,'__ge__', lambda self, other: anp['greater_equal'](self, other))
setattr(Node,'__lt__', lambda self, other: anp['less'](self, other))
setattr(Node,'__le__', lambda self, other: anp['less_equal'](self, other))
setattr(Node,'__abs__', lambda self: anp['abs'](self))
setattr(Node,'__hash__', lambda self: id(self))

Finally, make another file __init__.py in the numpy_autograd folder and add this to it:

from .np_wrapping import *

Now you’re done!

### Using the wrapped numpy

Let’s test it out:

import numpy_autograd as np
def logistic(z):  return 1 / (1 + np.exp(-z))
a1 = logistic(1.5)
add 'exp' to graph with value 0.22313016014842982
add 'true_divide' to graph with value 0.8175744761936437

This works fine. What about something using a non-differentiable function?

def f2(x,y): return np.add(np.floor(np.log(x) * np.exp(y) + x*y), np.exp(x))
a2 = f2(5,1)
add 'log' to graph with value 1.6094379124341003
add 'exp' to graph with value 2.718281828459045
add 'multiply' to graph with value 4.374905831402675
add 'add' to graph with value 157.4131591025766