Building a computational graph: part 3
16 Feb 2020 · 1673 words

This is the third part of a series on creating a computational graph in Python. You can go back to part one and part two.

There’s a bit of code in this post. Here is the final code in `np_wrapping.py`. The module for `numpy_autograd` is here.

In the last post we built a computational graph for a function, but it was a bit hard to use. We also had these problems:

• we didn’t want to replace `np.add` with `add_new`, `np.exp` with `exp_new` and so on everywhere. That’s a pain, especially if we have a lot of code to change.
• we had to implement primitives manually for every `numpy` function we want. Is there a way to get them all?
• how do we handle non-differentiable functions?

Here’s the gist of it. We create a fake `numpy` module called `numpy_autograd` with wrapped versions of all `numpy` functions. This fake `numpy` contains all the functions and objects of the original `numpy`, except that the functions are wrapped so that calling them adds a node to a computational graph. Then, by writing `import numpy_autograd as np`, any code using `numpy` functions like `np.add` automatically builds a computation graph as it executes.
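In miniature, the wrapping idea looks like this (a toy sketch to show the mechanism, not the real `primitive` we build below; `traced` is just an illustrative name):

```python
import numpy as _np

# A toy wrapper: record every call, then delegate to the real numpy function.
def traced(f):
    def inner(*args, **kwargs):
        print("calling", f.__name__)
        return f(*args, **kwargs)
    return inner

exp = traced(_np.exp)  # a stand-in for what numpy_autograd.exp will do
result = exp(0.0)      # prints "calling exp"
print(result)          # 1.0
```

The real `primitive` does the same delegation, but records a node in the graph instead of just printing.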

### Non-differentiable functions

Autodiff packages like `autograd` have to watch out for non-differentiable functions. Many functions are not differentiable, like `np.asarray`, `np.shape` or `np.argmin`.

Take `np.floor(x)` as an example. This is a non-differentiable function: its derivative does not exist at integer values of $x$, and the derivative is 0 everywhere else. There is no useful gradient to propagate through it, so it’s not something we’d want to differentiate through if we encountered it.
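A quick finite-difference check (with plain `numpy`, nothing wrapped) makes this concrete:

```python
import numpy as np

# Numerical slope of np.floor: zero between integers, a jump at integers.
eps = 1e-6
slope_between = (np.floor(2.5 + eps) - np.floor(2.5 - eps)) / (2 * eps)
slope_at_int = (np.floor(2.0 + eps) - np.floor(2.0 - eps)) / (2 * eps)
print(slope_between)  # 0.0: flat away from integers
print(slope_at_int)   # huge: the jump at x = 2 swamps any sensible gradient
```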

How should we deal with these functions? There are a few approaches. Some packages like `autograd` don’t add them to the graph at all. The approach I take here is a bit different: I add them to the computation graph, but with a flag `keepgrad` that indicates whether the gradient of the function should be calculated. So let’s go ahead and modify our `primitive` function from earlier to include this parameter:

``````
def primitive(f, keepgrad=True):
    @wraps(f)
    def inner(*args, **kwargs):
        ## Code to add operation/primitive to computation graph
        # We need to separate out the integer/non-node case. Sometimes you are
        # adding constants to nodes.
        def getval(o):      return o.value if type(o) == Node else o
        if len(args):       argvals = [getval(o) for o in args]
        else:               argvals = args
        if len(kwargs):     kwargvals = dict([(k, getval(o)) for k, o in kwargs.items()])
        else:               kwargvals = kwargs

        # get parents
        l = list(args) + list(kwargs.values())
        parents = [o for o in l if type(o) == Node]

        value = f(*argvals, **kwargvals)
        print("add", "'" + f.__name__ + "'", "to graph with value", value)
        return Node(value, f, parents, keepgrad)
    return inner
``````

### Triage

A quick note. It can get confusing working with “original” `numpy` and “new” `numpy`, so note that throughout this post if you see something prefixed with `_np`, that means “original” `numpy`. Later I use `anp` to refer to “new” `numpy`.

Anyway, it’s time to create our version of `numpy`. All the attributes of `numpy` are available in `_np.__dict__`. We are going to split the objects in this dict into three categories:

a) differentiable functions (wrap with `primitive`, `keepgrad=True`, the default)
b) non-differentiable functions (wrap with `primitive`, `keepgrad=False`)
c) everything else (leave unchanged)

We create a function `wrap_namespace` that will copy everything from `_np.__dict__` into a new dictionary, wrapping functions based on the three categories above.

``````
import numpy as _np
import types

def wrap_namespace(old, new):
    """Performs triage on objects from numpy, copying them from old to new namespace.
    old: __dict__ from original numpy
    new: dict to copy old into
    """
    # Non-differentiable functions. Taken from here:
    notrace_functions = [
        _np.ndim, _np.shape, _np.iscomplexobj, _np.result_type, _np.zeros_like,
        _np.ones_like, _np.floor, _np.ceil, _np.round, _np.rint, _np.around,
        _np.fix, _np.trunc, _np.all, _np.any, _np.argmax, _np.argmin,
        _np.argpartition, _np.argsort, _np.argwhere, _np.nonzero, _np.flatnonzero,
        _np.count_nonzero, _np.searchsorted, _np.sign, _np.ndim, _np.shape,
        _np.floor_divide, _np.logical_and, _np.logical_or, _np.logical_not,
        _np.logical_xor, _np.isfinite, _np.isinf, _np.isnan, _np.isneginf,
        _np.isposinf, _np.allclose, _np.isclose, _np.array_equal, _np.array_equiv,
        _np.greater, _np.greater_equal, _np.less, _np.less_equal, _np.equal,
        _np.not_equal, _np.iscomplexobj, _np.iscomplex, _np.size, _np.isscalar,
        _np.isreal, _np.zeros_like, _np.ones_like, _np.result_type
    ]
    function_types = {_np.ufunc, types.FunctionType, types.BuiltinFunctionType}

    for name, obj in old.items():
        if obj in notrace_functions:
            # non-differentiable functions
            new[name] = primitive(obj, keepgrad=False)
        elif type(obj) in function_types:
            # differentiable functions
            new[name] = primitive(obj)
        else:
            # just copy over
            new[name] = obj
``````

### Creating our new module

We’re ready to bring all our code together into a module. Here is an overview of the basic procedure.

• make new folder `./numpy_autograd`
• put stuff in a file `np_wrapping.py` with definitions of
• `primitive`
• `notrace_primitive`
• `Node`
• `wrap_namespace`
• add implementations of the dunder methods to the `Node` class with `setattr`.
• make `__init__.py` and at the top put `from .np_wrapping import *`
• now import using `import numpy_autograd as np` and you are done
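If everything is in place, the folder should look like this:

``````
numpy_autograd/
├── __init__.py     # just re-exports everything from np_wrapping
└── np_wrapping.py  # primitive, notrace_primitive, Node, wrap_namespace
``````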

To get started, create a new folder called `numpy_autograd`, and inside it create a file called `np_wrapping.py`. Put the below code in this file.

Start off with imports and `primitive`:

``````
import numpy as _np
import types
from functools import wraps

def primitive(f, keepgrad=True):
    @wraps(f)
    def inner(*args, **kwargs):
        ## Code to add operation/primitive to computation graph
        # We need to separate out the integer/non-node case. Sometimes you are
        # adding constants to nodes.
        def getval(o):      return o.value if type(o) == Node else o
        if len(args):       argvals = [getval(o) for o in args]
        else:               argvals = args
        if len(kwargs):     kwargvals = dict([(k, getval(o)) for k, o in kwargs.items()])
        else:               kwargvals = kwargs

        # get parents
        l = list(args) + list(kwargs.values())
        parents = [o for o in l if type(o) == Node]

        value = f(*argvals, **kwargvals)
        print("add", "'" + f.__name__ + "'", "to graph with value", value)
        return Node(value, f, parents, keepgrad)
    return inner
``````

Now add the latest iteration of the `Node` class. It’s similar to before, except it’s modified to incorporate the `keepgrad` parameter. I’ve also moved the `start_node` function from last time to be a static method of the class, instead of having it float around by itself.

``````
class Node:
    """A node in a computation graph."""
    def __init__(self, value, fun, parents, keepgrad):
        self.parents = parents
        self.value = value
        self.fun = fun
        self.keepgrad = keepgrad

    def __repr__(self):
        """A (very) basic string representation"""
        if self.value is None: str_val = 'None'
        else:                  str_val = str(self.value)
        return "\n" + "Fun: " + str(self.fun) + \
               " Value: " + str_val + \
               " Parents: " + str(self.parents)

    @staticmethod
    def start_node(value=None):
        """A function to create an empty node to start off the graph"""
        fun, parents = lambda x: x, []
        return Node(value, fun, parents, keepgrad=True)
``````

Then the `wrap_namespace` function from earlier:

``````
def wrap_namespace(old, new):
    """Performs triage on objects from numpy, copying them from old to new namespace.
    old: __dict__ from original numpy
    new: dict to copy old into
    """
    # Non-differentiable functions. Taken from here:
    notrace_functions = [
        _np.ndim, _np.shape, _np.iscomplexobj, _np.result_type, _np.zeros_like,
        _np.ones_like, _np.floor, _np.ceil, _np.round, _np.rint, _np.around,
        _np.fix, _np.trunc, _np.all, _np.any, _np.argmax, _np.argmin,
        _np.argpartition, _np.argsort, _np.argwhere, _np.nonzero, _np.flatnonzero,
        _np.count_nonzero, _np.searchsorted, _np.sign, _np.ndim, _np.shape,
        _np.floor_divide, _np.logical_and, _np.logical_or, _np.logical_not,
        _np.logical_xor, _np.isfinite, _np.isinf, _np.isnan, _np.isneginf,
        _np.isposinf, _np.allclose, _np.isclose, _np.array_equal, _np.array_equiv,
        _np.greater, _np.greater_equal, _np.less, _np.less_equal, _np.equal,
        _np.not_equal, _np.iscomplexobj, _np.iscomplex, _np.size, _np.isscalar,
        _np.isreal, _np.zeros_like, _np.ones_like, _np.result_type
    ]
    function_types = {_np.ufunc, types.FunctionType, types.BuiltinFunctionType}

    for name, obj in old.items():
        if obj in notrace_functions:
            # non-differentiable functions
            new[name] = primitive(obj, keepgrad=False)
        elif type(obj) in function_types:
            # differentiable functions
            new[name] = primitive(obj)
        else:
            # just copy over
            new[name] = obj
``````

Now call `wrap_namespace()`. We’ll hold the wrapped functions in a dict called `anp`, which we initialise with the current value of `globals()`. Calling it `anp` makes the syntax for the next step (operator overloading) a bit clearer.

``````
# Using globals() here means every wrapped function becomes a top-level name
# of this module, so each one is accessible like np.add after import.
anp = globals()
wrap_namespace(_np.__dict__, anp)
``````
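Why does writing into `globals()` work? Because `globals()` returns the module’s own namespace dict, so inserting a key is the same as defining a top-level name. A minimal demonstration:

```python
# Assigning into globals() creates a name as if it were defined normally.
globals()['answer'] = 42
print(answer)  # 42: 'answer' is now an ordinary module-level name
```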

Finally it’s time for operator overloading. Instead of defining these all inside `Node`, we use `setattr` to add each method to the `Node` class one by one. There are many more functions than before and many more dunder methods to match. There are also properties to deal with, like `ndim`, which we handle with Python’s built-in `property`.

``````
## Definitions taken from here:
setattr(Node, 'ndim', property(lambda self: self.value.ndim))
setattr(Node, 'size', property(lambda self: self.value.size))
setattr(Node, 'dtype', property(lambda self: self.value.dtype))
setattr(Node, 'T', property(lambda self: anp['transpose'](self)))
setattr(Node, 'shape', property(lambda self: self.value.shape))

setattr(Node, '__len__', lambda self: len(self.value))
setattr(Node, 'astype', lambda self, *args, **kwargs: anp['_astype'](self, *args, **kwargs))
setattr(Node, '__neg__', lambda self: anp['negative'](self))
setattr(Node, '__add__', lambda self, other: anp['add'](self, other))
setattr(Node, '__sub__', lambda self, other: anp['subtract'](self, other))
setattr(Node, '__mul__', lambda self, other: anp['multiply'](self, other))
setattr(Node, '__pow__', lambda self, other: anp['power'](self, other))
setattr(Node, '__div__', lambda self, other: anp['divide'](self, other))
setattr(Node, '__mod__', lambda self, other: anp['mod'](self, other))
setattr(Node, '__truediv__', lambda self, other: anp['true_divide'](self, other))
setattr(Node, '__matmul__', lambda self, other: anp['matmul'](self, other))
setattr(Node, '__radd__', lambda self, other: anp['add'](other, self))
setattr(Node, '__rsub__', lambda self, other: anp['subtract'](other, self))
setattr(Node, '__rmul__', lambda self, other: anp['multiply'](other, self))
setattr(Node, '__rpow__', lambda self, other: anp['power'](other, self))
setattr(Node, '__rdiv__', lambda self, other: anp['divide'](other, self))
setattr(Node, '__rmod__', lambda self, other: anp['mod'](other, self))
setattr(Node, '__rtruediv__', lambda self, other: anp['true_divide'](other, self))
setattr(Node, '__rmatmul__', lambda self, other: anp['matmul'](other, self))
setattr(Node, '__eq__', lambda self, other: anp['equal'](self, other))
setattr(Node, '__ne__', lambda self, other: anp['not_equal'](self, other))
setattr(Node, '__gt__', lambda self, other: anp['greater'](self, other))
setattr(Node, '__ge__', lambda self, other: anp['greater_equal'](self, other))
setattr(Node, '__lt__', lambda self, other: anp['less'](self, other))
setattr(Node, '__le__', lambda self, other: anp['less_equal'](self, other))
setattr(Node, '__abs__', lambda self: anp['abs'](self))
setattr(Node, '__hash__', lambda self: id(self))
``````

Finally, make another file `__init__.py` in the `numpy_autograd` folder and add this to it:

``````from .np_wrapping import *
``````

Now you’re done!

### Using the wrapped numpy

Let’s test it out:

``````import numpy_autograd as np
def logistic(z):  return 1 / (1 + np.exp(-z))
a1 = logistic(1.5)
``````
``````
add 'exp' to graph with value 0.22313016014842982
add 'add' to graph with value 1.2231301601484298
add 'true_divide' to graph with value 0.8175744761936437
``````

This works fine. What about something using a non-differentiable function?

``````def f2(x,y): return np.add(np.floor(np.log(x) * np.exp(y) + x*y), np.exp(x))
a2 = f2(5,1)
``````
``````add 'log' to graph with value 1.6094379124341003
add 'exp' to graph with value 2.718281828459045
add 'multiply' to graph with value 4.374905831402675