# Partial Derivatives & Recursive Descent

Anthony Corletti (@anthonycorletti)
Writing partial derivatives is a great way to understand some of the underlying features of machine learning and neural network libraries.

In this post I'll explain how partial derivatives are a necessary building block for understanding machine learning and neural networks, and we'll write some Python code to help bring partial derivatives and recursive descent to life!
Simply put, a partial derivative is a derivative where we hold some of the variables constant.
For example, take the following function:

$$f(x) = x^2$$

Its derivative with respect to $x$ is:

$$f'(x) = 2x$$

But what about two variables, $x$ and $y$?

$$f(x, y) = x^2 + xy + y^3$$

To find the partial derivative of the above function with respect to one variable, let's say $x$, we treat $y$ as a constant, so we get the following:

$$\frac{\partial f}{\partial x} = 2x + y$$

And the partial derivative with respect to $y$:

$$\frac{\partial f}{\partial y} = x + 3y^2$$
That's it. That's all there is to it.
Just remember to treat all other variables as if they are constants.
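If you'd like to sanity-check those results numerically, here's a quick sketch (a helper of my own, not part of the code we'll build below) that approximates each partial by nudging one variable while holding the other fixed:

```python
# Numerically approximate the partials of f(x, y) = x^2 + x*y + y^3
# by nudging one variable at a time while holding the other constant.
def f(x: float, y: float) -> float:
    return x**2 + x * y + y**3

h = 1e-6  # a small step size

def df_dx(x: float, y: float) -> float:
    return (f(x + h, y) - f(x, y)) / h  # y held constant

def df_dy(x: float, y: float) -> float:
    return (f(x, y + h) - f(x, y)) / h  # x held constant

print(df_dx(1, 1))  # ~3.0, matching 2x + y at (1, 1)
print(df_dy(1, 1))  # ~4.0, matching x + 3y^2 at (1, 1)
```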
Now what's recursive descent? Well, imagine plotting a function in some number of dimensions, where we feed inputs to our function and map those inputs to whatever the function returns.

Recursive descent means that we keep feeding inputs that return smaller and smaller outputs, using the partial derivatives to tell us which way is downhill, in order to find the lowest valley or valleys in the function's space. (In machine learning, this iterative process is better known as gradient descent.)

In this way, a neural network can be seen as a function with many parameters that are adjusted over time in order to return desired, or correct, outcomes.
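To make the descent idea concrete, here's a minimal sketch. The function `f(x, y) = x^2 + y^2` and the step size are assumptions I've chosen for illustration, since a simple bowl shape has one obvious valley at (0, 0):

```python
# A minimal descent sketch (illustrative example, not this post's Expr code).
# f(x, y) = x^2 + y^2 has its lowest point at (0, 0); the partial derivatives
# df/dx = 2x and df/dy = 2y tell us which way is "downhill".
def grad_f(x: float, y: float) -> tuple:
    return 2 * x, 2 * y

x, y = 3.0, -4.0  # arbitrary starting point
step = 0.1        # how far to move each iteration

for _ in range(100):
    dx, dy = grad_f(x, y)
    x -= step * dx  # move against the gradient, i.e. downhill
    y -= step * dy

print(x, y)  # both values approach 0.0, the bottom of the valley
```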
So that's enough about math; let's actually write some code that defines expressions and evaluates derivatives.

I'm using Python 3.8 for this, if you'd like to follow along.
To start, we'll write out all the classes we'll be working with. Namely, we'll need a base class, which I'm calling Expression (`Expr`), a class for variables (`Var`), one for constants (`Const`), and classes for operations. Let's just focus on adding and multiplying for now (`Add`, `Mult`).
```python
class Expr: pass
class Add(Expr): pass
class Mult(Expr): pass
class Var(Expr): pass
class Const(Expr): pass
```
We'll be building our expressions with a binary tree structure, such that each add and multiply will have to evaluate a left and a right sub-structure of expressions, variables, and/or constants.
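For example, an expression like `3 * (y + x)` (one we'll build later) takes this shape as a tree:

```
      (*)
     /   \
    3     (+)
         /   \
        y     x
```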
Now let's fill in our `__init__` methods.
```python
from numbers import Number


class Expr: pass


class Add(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left = left
        self.right = right


class Mult(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left = left
        self.right = right


class Var(Expr):
    def __init__(self, name: str):
        self.name = name


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val
```
Hm, I'm already noticing some refactoring we can do. Let's combine `Add` and `Mult` into one class, a binary operator class (`BinOp`).
```python
class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator
```
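With this, `Add(left, right)` becomes `BinOp(left, right, "+")` and `Mult(left, right)` becomes `BinOp(left, right, "*")`. For example:

```python
# x + 2 and x * 2, expressed with the unified BinOp class
x_plus_2 = BinOp(Var("x"), Const(2), "+")
x_times_2 = BinOp(Var("x"), Const(2), "*")
```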
Now we want to be able to view and evaluate our expressions, so let's add our `__str__` and `eval` methods and run a few examples.
```python
from numbers import Number
from typing import Dict


class Expr: pass


class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator

    def __str__(self) -> str:
        return f"({self.left}{self.operator}{self.right})"

    def eval(self, input: Dict[str, Number]) -> Number:
        # Evaluate both subtrees, then combine them with the operator.
        left_eval = self.left.eval(input)
        right_eval = self.right.eval(input)
        return eval(f"{left_eval}{self.operator}{right_eval}")


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val

    def __str__(self) -> str:
        return str(self.val)

    def eval(self, input: Dict[str, Number]) -> Number:
        return self.val


class Var(Expr):
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name

    def eval(self, input: Dict[str, Number]) -> Number:
        # Look up this variable's value in the input dict.
        return input[self.name]
```
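One design note on `eval`: building a string like `"2+4"` and handing it to Python's built-in `eval` works here because `operator` is only ever `"+"` or `"*"`, but string `eval` is easy to break and risky on anything you don't fully control. A lookup table over Python's `operator` module is one way to avoid it (a sketch of an alternative, not what we'll use below):

```python
import operator

# Map each supported symbol to a plain function.
OPS = {"+": operator.add, "*": operator.mul}

# BinOp.eval could then read:
#     return OPS[self.operator](self.left.eval(input), self.right.eval(input))
```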
Running a few tests, we should see this work. Note that I'm defining our input as a `dict` so that we could have `n` unique parameters passed in as `{"x": 1, "y": 1, ...}`.
```python
# 3 * (y + x)
e1 = BinOp(Const(3), BinOp(Var("y"), Var("x"), "+"), "*")
print(f"e1: {e1}")

# (3 * y) + x
e2 = BinOp(BinOp(Const(3), Var("y"), "*"), Var("x"), "+")
print(f"e2: {e2}")

input = {"x": 2, "y": 4}

print(f"e1.eval(input): {e1.eval(input)}")
print(f"e2.eval(input): {e2.eval(input)}")

# e1: (3*(y+x))
# e2: ((3*y)+x)
# e1.eval(input): 18
# e2.eval(input): 14
```
Nice! So we can print and evaluate expressions. Now let's experiment with taking partial derivatives.
We'll need to define a `deriv` method for each `Expr`, like so. The final state of the code should look like the following.
```python
from numbers import Number
from typing import Dict


class Expr: pass


class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator

    def __str__(self) -> str:
        return f"({self.left}{self.operator}{self.right})"

    def eval(self, input: Dict[str, Number]) -> Number:
        left_eval = self.left.eval(input)
        right_eval = self.right.eval(input)
        return eval(f"{left_eval}{self.operator}{right_eval}")

    def deriv(self, var: str) -> Expr:
        if self.operator == "+":
            # Sum rule: (f + g)' = f' + g'
            return BinOp(self.left.deriv(var), self.right.deriv(var), "+")
        elif self.operator == "*":
            # Product rule: (f * g)' = f' * g + f * g'
            return BinOp(
                BinOp(self.left.deriv(var), self.right, "*"),
                BinOp(self.left, self.right.deriv(var), "*"),
                "+",
            )


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val

    def __str__(self) -> str:
        return str(self.val)

    def eval(self, input: Dict[str, Number]) -> Number:
        return self.val

    def deriv(self, var: str) -> Expr:
        # The derivative of a constant is always 0.
        return Const(0)


class Var(Expr):
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name

    def eval(self, input: Dict[str, Number]) -> Number:
        return input[self.name]

    def deriv(self, var: str) -> Expr:
        # d(x)/dx = 1; any other variable is treated as a
        # constant, so its derivative is 0.
        if self.name == var:
            return Const(1)
        else:
            return Const(0)
```
And for our examples ...
```python
# x^2 + x*y + y^3
a = BinOp(Var("x"), Var("x"), "*")
b = BinOp(Var("x"), Var("y"), "*")
c = BinOp(Var("y"), BinOp(Var("y"), Var("y"), "*"), "*")

e5 = BinOp(a, BinOp(b, c, "+"), "+")
e5x = e5.deriv("x")
e5y = e5.deriv("y")

print(f"e5: {e5}")
print(f"e5x: {e5x}")
print(f"e5y: {e5y}")

input = {"x": 1, "y": 1}
print(f"e5.eval(input): {e5.eval(input)}")
print(f"e5x.eval(input): {e5x.eval(input)}")
print(f"e5y.eval(input): {e5y.eval(input)}")

# e5: ((x*x)+((x*y)+(y*(y*y))))
# e5x: (((1*x)+(x*1))+(((1*y)+(x*0))+((0*(y*y))+(y*((0*y)+(y*0))))))
# e5y: (((0*x)+(x*0))+(((0*y)+(x*1))+((1*(y*y))+(y*((1*y)+(y*1))))))
# e5.eval(input): 3
# e5x.eval(input): 3
# e5y.eval(input): 4
```
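These match the partials we computed by hand earlier:

$$\left.\frac{\partial f}{\partial x}\right|_{(1,1)} = 2x + y = 3 \qquad \left.\frac{\partial f}{\partial y}\right|_{(1,1)} = x + 3y^2 = 4$$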
Now we can map input parameters to outputs to find local and absolute minima and maxima of functions. We could probably trim down the number of parentheses, though... but we'll get to that next time.
These types of software patterns and data structures are some of the tools used in machine learning and neural networks to power all kinds of tasks like image recognition, structure identification, natural language processing, optical character recognition, and more!