Partial Derivatives & Recursive Descent

Anthony Corletti

Working through partial derivatives is a great way to understand some of the underlying features of machine learning and neural network libraries.

In this post I'll explain how partial derivatives are a necessary building block in understanding machine learning and neural networks, and how to write some Python code to help bring partial derivatives and recursive descent to life!

Simply put, a partial derivative is a derivative taken with respect to one variable while holding the other variables constant.

For example, take the following function:

f(x) = x^2

Its derivative with respect to x is:

\frac{df(x)}{dx} = 2x

But what about two variables, x and y?

f(x, y) = x^2 + y^3

To find the partial derivative of the above function with respect to one variable, say x, we treat y as a constant, so we get the following:

\frac{\partial f(x,y)}{\partial x} = 2x + 0 = 2x

And the partial derivative with respect to y:

\frac{\partial f(x,y)}{\partial y} = 0 + 3y^2 = 3y^2

That's it. That's all there is to it.

Just remember to treat all other variables as if they are constants.
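
If you'd like a numeric sanity check of those two partials, here's a small finite-difference sketch (the sample point and step size h here are arbitrary choices, just for illustration):

h = 1e-6

def f(x, y):
    return x**2 + y**3

x, y = 2.0, 3.0

# Nudge one variable at a time while holding the other constant.
df_dx = (f(x + h, y) - f(x, y)) / h  # close to 2x = 4
df_dy = (f(x, y + h) - f(x, y)) / h  # close to 3y^2 = 27
print(round(df_dx, 4), round(df_dy, 4))
# 4.0 27.0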

Now what's recursive descent? Well, imagine these functions plotted on a graph in some n dimensions, where we feed inputs to our function and map those inputs to whatever the function returns.

Recursive descent, more commonly known as gradient descent, means that we keep feeding in inputs that return smaller and smaller outputs, letting the partial derivatives tell us which way to step, in order to find the lowest valley or valleys in the function's space.

In this way, a neural network can be thought of as one such function, with many parameters that are adjusted over time in order to return desired or correct outcomes.
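
To make the descent idea concrete before we get to any expression code, here's a minimal sketch using our earlier f(x) = x^2, whose derivative 2x tells us which way is downhill (the starting point, step size, and iteration count are all arbitrary choices):

x = 3.0      # arbitrary starting input
step = 0.1   # arbitrary step size

for _ in range(20):
    # Move against the derivative, toward smaller and smaller outputs.
    x -= step * (2 * x)

print(x)  # approaches 0.0, the bottom of the valley of x^2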

So that's enough math; let's actually write some code that defines expressions and evaluates derivatives.

I'm using Python 3.8 for this, if you'd like to follow along.

To start, we'll stub out all the classes we'll be working with.

Namely, we'll need a base class, which I'm calling Expression (Expr); a class for variables (Var); one for constants (Const); and classes for operations. Let's just focus on adding and multiplying for now (Add, Mult).

class Expr:
    pass


class Add(Expr):
    pass


class Mult(Expr):
    pass


class Var(Expr):
    pass


class Const(Expr):
    pass

We'll be building our expressions with a binary tree structure, such that each add and multiply will have to evaluate a left and a right substructure of expressions, variables, and/or constants.

Now let's fill in our __init__ methods.

from numbers import Number


class Expr:
    pass


class Add(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left = left
        self.right = right


class Mult(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left = left
        self.right = right


class Var(Expr):
    def __init__(self, name: str):
        self.name = name


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val
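
To see the binary tree shape concretely, here's how the expression (3 * y) + x looks as a tree of these classes (just a construction sketch; printing and evaluating come a bit later):

# (3 * y) + x: Add at the root, Mult as its left subtree.
#
#         Add
#        /   \
#      Mult   Var("x")
#     /    \
# Const(3)  Var("y")
expr = Add(Mult(Const(3), Var("y")), Var("x"))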

Hm, I'm already noticing some refactoring we can do. Let's combine Add and Mult into one class, a binary operator class (BinOp).

class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator
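
And here's the same (3 * y) + x tree from before, now built with BinOp and an operator string instead of the dedicated Add and Mult classes:

expr = BinOp(BinOp(Const(3), Var("y"), "*"), Var("x"), "+")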

Now we want to be able to view and evaluate our expressions, so let's add __str__ and eval methods and run a few examples.

from numbers import Number
from typing import Dict


class Expr:
    pass


class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator

    def __str__(self) -> str:
        return f"({self.left}{self.operator}{self.right})"

    def eval(self, input: Dict[str, Number]) -> Number:
        # Evaluate both subtrees, then apply the operator by building a
        # small arithmetic string and handing it to Python's built-in eval.
        left_eval = self.left.eval(input)
        right_eval = self.right.eval(input)
        return eval(f"{left_eval}{self.operator}{right_eval}")


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val

    def __str__(self) -> str:
        return str(self.val)

    def eval(self, input: Dict[str, Number]) -> Number:
        # A constant evaluates to itself, no matter the input.
        return self.val


class Var(Expr):
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name

    def eval(self, input: Dict[str, Number]) -> Number:
        # Look up this variable's value in the input mapping.
        return input[self.name]

And by running a few tests we should see this work ...

Note that I'm defining our input as a dict so that we can pass in n unique parameters, as in { "x": 1, "y": 1 ... }.

# 3 * (y + x)
e1 = BinOp(Const(3), BinOp(Var("y"), Var("x"), "+"), "*")
print(f"e1: {e1}")
# (3 * y) + x
e2 = BinOp(BinOp(Const(3), Var("y"), "*"), Var("x"), "+")
print(f"e2: {e2}")
input = {"x": 2, "y": 4}
print(f"e1.eval(input): {e1.eval(input)}")
print(f"e2.eval(input): {e2.eval(input)}")
# e1: (3*(y+x))
# e2: ((3*y)+x)
# e1.eval(input): 18
# e2.eval(input): 14

Nice! With x = 2 and y = 4, e1 evaluates to 3 * (4 + 2) = 18 and e2 to (3 * 4) + 2 = 14, just as printed. So we can print and evaluate expressions. Now let's experiment with taking partial derivatives.

We'll need to define a deriv method for each Expr, like so.
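
The two interesting cases for BinOp are the sum rule and the product rule, which the deriv method below implements directly:

\frac{\partial(u + v)}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x} \qquad \frac{\partial(u \cdot v)}{\partial x} = \frac{\partial u}{\partial x} \cdot v + u \cdot \frac{\partial v}{\partial x}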

The final state of the code should look like the following.

from numbers import Number
from typing import Dict


class Expr:
    pass


class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator

    def __str__(self) -> str:
        return f"({self.left}{self.operator}{self.right})"

    def eval(self, input: Dict[str, Number]) -> Number:
        left_eval = self.left.eval(input)
        right_eval = self.right.eval(input)
        return eval(f"{left_eval}{self.operator}{right_eval}")

    def deriv(self, var: str) -> Expr:
        if self.operator == "+":
            # Sum rule: (u + v)' = u' + v'
            return BinOp(self.left.deriv(var), self.right.deriv(var), "+")
        elif self.operator == "*":
            # Product rule: (u * v)' = u' * v + u * v'
            return BinOp(
                BinOp(self.left.deriv(var), self.right, "*"),
                BinOp(self.left, self.right.deriv(var), "*"),
                "+",
            )
        raise NotImplementedError(f"no derivative rule for {self.operator!r}")


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val

    def __str__(self) -> str:
        return str(self.val)

    def eval(self, input: Dict[str, Number]) -> Number:
        return self.val

    def deriv(self, var: str) -> Expr:
        # The derivative of a constant is always 0.
        return Const(0)


class Var(Expr):
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name

    def eval(self, input: Dict[str, Number]) -> Number:
        return input[self.name]

    def deriv(self, var: str) -> Expr:
        # A variable's derivative is 1 with respect to itself, 0 otherwise.
        if self.name == var:
            return Const(1)
        else:
            return Const(0)

And for our examples ...

# x^2 + x*y + y^3
a = BinOp(Var("x"), Var("x"), "*")
b = BinOp(Var("x"), Var("y"), "*")
c = BinOp(Var("y"), BinOp(Var("y"), Var("y"), "*"), "*")
e5 = BinOp(a, BinOp(b, c, "+"), "+")
e5x = e5.deriv("x")
e5y = e5.deriv("y")
print(f"e5: {e5}")
print(f"e5x: {e5x}")
print(f"e5y: {e5y}")
input = {"x": 1, "y": 1}
print(f"e5.eval(input): {e5.eval(input)}")
print(f"e5x.eval(input): {e5x.eval(input)}")
print(f"e5y.eval(input): {e5y.eval(input)}")
# e5: ((x*x)+((x*y)+(y*(y*y))))
# e5x: (((1*x)+(x*1))+(((1*y)+(x*0))+((0*(y*y))+(y*((0*y)+(y*0))))))
# e5y: (((0*x)+(x*0))+(((0*y)+(x*1))+((1*(y*y))+(y*((1*y)+(y*1))))))
# e5.eval(input): 3
# e5x.eval(input): 3
# e5y.eval(input): 4

Now we can map input parameters to outputs to find local and absolute minima and maxima in functions. But we could probably cut down on the number of parentheses ... we'll get to that next time.
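
To close the loop on recursive descent, here's a minimal sketch that uses our deriv and eval methods to walk f(x) = x * x downhill (the starting point, step size, and iteration count are arbitrary choices):

# Gradient descent on f(x) = x * x, whose derivative evaluates to 2x.
f = BinOp(Var("x"), Var("x"), "*")
df_dx = f.deriv("x")

x = 5.0      # arbitrary starting point
step = 0.1   # arbitrary step size
for _ in range(25):
    # Step against the derivative, toward smaller outputs.
    x -= step * df_dx.eval({"x": x})

print(x)  # close to 0, the minimum of x * x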

These types of software patterns and data aggregations are some of the tools used in machine learning and neural nets to power all kinds of things, like image recognition, structure identification, natural language processing, optical character recognition, and more!