Anthony Corletti
Published on

# Linear Regression with Go

Authors
•  Name
Anthony Corletti
@anthonycorletti

Linear regression is a common statistical data analysis technique.

It is used to determine the extent to which there is a linear relationship between a dependent variable and one or more independent variables, and is widely used in machine learning applications.

There are two types of linear regression, simple linear regression and multiple linear regression.

In simple linear regression a single independent variable is used to predict the value of a dependent variable. In multiple linear regression two or more independent variables are used to predict the value of a dependent variable. The difference between the two is the number of independent variables. In both cases there is only a single dependent variable.

In this post I'm going to introduce basic concepts on linear regression and build a simple linear regression example using go and spago.

### Simple regression

Simple linear regression uses traditional slope-intercept form, where $m$ and $b$ are the variables our algorithm will try to "learn" to produce the most accurate predictions.

$x$ represents our input data and $y$ represents our prediction.

$y = mx + b$

### Multivariable regression

A more complex, multi-variable linear equation might look like this, where $w$ represents the coefficients, or weights, our model will try to learn.

$f(x,y,z) = w_1x + w_2y, + w_3z$

The variables $x$, $y$, and $z$ represent attributes or distinct pieces of information we have about each observation.

$AppDownloads = w_1Facebook + w_2Instagram + w_3Google$

But no need to worry about multivar things now, we'll just be focusing on simple regressions for the rest of the post.

### Prediction, Cost functions, Gradient descent, Training & Evaluation

#### Prediction

Let's say our our prediction function outputs an estimate of sales given a company’s social media advertising spend and our current values for Weight and Bias.

$Sales = Weight \times SocialMedia + Bias$

$Weight$: the coefficient for the $SocialMedia$ independent variable. In machine learning we call coefficients weights.

$SocialMedia$: the independent variable. In machine learning we call these variables features.

$Bias$: the intercept where our line intercepts the y-axis. In machine learning we can call intercepts bias. Bias offsets all predictions that we make.

Our algorithm will try to learn the correct values for $Weight$ and $Bias$. By the end of our training, our equation will approximate the line of best fit.

#### Cost function

While predictions are nice, we don't really need it for this example, we need a cost function to help optimize our weights.

For our example we'll use Mean Squared Error (MSE). MSE measures the average squared difference between an observation’s actual and predicted values. The output is a single number representing the cost, or score, associated with our current set of weights. Our goal is to minimize MSE to improve the accuracy of our model.

To minimize MSE we use Gradient Descent to calculate the gradient of our cost function. I wrote a small post on Recursive Descent that shows an example of how to build a set of classes for building and evaluating partial derivatives which help in building gradient descent applications. spago will conveniently do this for us!

To solve for the gradient, we iterate through our data points using our new weight and bias values and take the average of the partial derivatives along the way.

The resulting gradient tells us the slope of our cost function at our current position (i.e. weight and bias) and the direction we should update to reduce our cost function (we move in the direction opposite the gradient). The size of our update is controlled by the learning rate.

#### Training & Evaluation

Training a model is the process of iteratively improving your prediction equation by looping through the dataset multiple times, each time updating the weight and bias values in the direction indicated by the slope of the cost function (gradient).

Training is complete when we reach an acceptable error threshold, or when subsequent training iterations fail to reduce our cost.

Before training we need to initialize our weights (set default values), set our hyperparameters (learning rate and number of iterations), and prepare to log our progress over each iteration.

If our model is working, we should see our cost decrease after every iteration.

### An example with spago

Let's begin by building a simple regression model with spago. I'm using go1.14.4 at the time of writing this post.

This example will only predict a single weight, not bias.

Create a dir and initialize your go modules.

mkdir spagolr && cd spagolrgo mod init example.com/spagolr

Install spago.

go get -u github.com/nlpodyssey/spago

Scaffold out the application.

// main.go
package main
import (	"fmt"	"github.com/nlpodyssey/spago/pkg/mat"	"github.com/nlpodyssey/spago/pkg/ml/ag"	"github.com/nlpodyssey/spago/pkg/ml/nn")
var (	_ nn.Model     = &LinearRegression{}	_ nn.Processor = &Processor{})
// LinearRegression – we'll use this struct to store the weights of our// function using spago's nn.Paramtype LinearRegression struct {	Weights *nn.Param type:"weights"}
// Processor – this will run our regressiontype Processor struct {	nn.BaseProcessor	w ag.Node}
func main() {	fmt.Println("🚀")}

Now we'll write functions to do the following:

1. Create a LinearRegression
2. Create a new Processor to process the model
3. Run the processor
func NewLinearRegression(in, out int) *LinearRegression {	return &LinearRegression{		Weights: nn.NewParam(mat.NewEmptyDense(out, in)),	}}
func (m *LinearRegression) NewProc(g *ag.Graph) nn.Processor {	return &Processor{		BaseProcessor: nn.BaseProcessor{			Model:             m,			Mode:              nn.Training,			Graph:             g,			FullSeqProcessing: false,		},		w: g.NewWrap(m.Weights),	}}
func (p *Processor) Forward(xs ...ag.Node) []ag.Node {	ys := make([]ag.Node, len(xs))	for i, x := range xs {		ys[i] = p.Graph.Mul(p.w, x)	}	return ys}

Now for the fun part, writing the main function to actually run and train our model. Let's say for something super simple like $y=2x+1$

Your final code should look like the following

package main
import (	"fmt"	"github.com/nlpodyssey/spago/pkg/mat"	"github.com/nlpodyssey/spago/pkg/mat/rand"	"github.com/nlpodyssey/spago/pkg/ml/ag"	"github.com/nlpodyssey/spago/pkg/ml/initializers"	"github.com/nlpodyssey/spago/pkg/ml/losses"	"github.com/nlpodyssey/spago/pkg/ml/nn"	"github.com/nlpodyssey/spago/pkg/ml/optimizers/gd"	"github.com/nlpodyssey/spago/pkg/ml/optimizers/gd/sgd")
var (	_ nn.Model     = &LinearRegression{}	_ nn.Processor = &Processor{})
// LinearRegression – we'll use this struct to store the weights of our// function using spago's nn.Paramtype LinearRegression struct {	Weights *nn.Param type:"weights"}
// Processor – this will run our regressiontype Processor struct {	nn.BaseProcessor	w ag.Node}
// NewLinearRegression ...func NewLinearRegression(in, out int) *LinearRegression {	return &LinearRegression{		Weights: nn.NewParam(mat.NewEmptyDense(out, in)),	}}
// NewProc ...func (m *LinearRegression) NewProc(g *ag.Graph) nn.Processor {	return &Processor{		BaseProcessor: nn.BaseProcessor{			Model:             m,			Mode:              nn.Training,			Graph:             g,			FullSeqProcessing: false,		},		w: g.NewWrap(m.Weights),	}}
// Forward ...func (p *Processor) Forward(xs ...ag.Node) []ag.Node {	ys := make([]ag.Node, len(xs))	for i, x := range xs {		ys[i] = p.Graph.Mul(p.w, x)	}	return ys}
func main() {	inputDim := 1  // takes variable 'x'	outputDim := 1 // takes variable 'y'	learningRate := 0.0001	epochs := 100	seed := 8686 // seed for random params initialization	model := NewLinearRegression(inputDim, outputDim)	criterion := losses.MSESeq                                                // mean squared error	updater := sgd.New(sgd.NewConfig(learningRate, 0.0, false))               // stochastic gradient descent (no momentum etc.)	optimizer := gd.NewOptimizer(updater, nn.NewDefaultParamsIterator(model)) // link the model to the optimizer
// Random params initialization	rndGen := rand.NewLockedRand(uint64(seed))	nn.ForEachParam(model, func(param *nn.Param) {		if param.Type() == nn.Weights {			initializers.XavierUniform(param.Value(), 1.0, rndGen)		}	})
// Create dummy data for training with a very basic linear equation i.e., y=2x+1.	// Here, ‘x’ is the independent variable and y is the dependent variable.	n := 110	xValues := make([]float64, n)	yValues := make([]float64, n)	for i := 0; i < n; i++ {		xValues[i] = float64(i)		yValues[i] = 2*xValues[i] + 1	}
for epoch := 0; epoch < epochs; epoch++ {		// get a new computational graph (cg)		cg := ag.NewGraph()
// Converting x and y values to graph nodes		var inputs []ag.Node		var labels []ag.Node		for i := 0; i < n; i++ {			inputs = append(inputs, cg.NewScalar(xValues[i]))			labels = append(labels, cg.NewScalar(yValues[i]))		}
// get output (i.e. prediction) from the model, given the inputs		outputs := model.NewProc(cg).Forward(inputs...)
// get loss for the predicted output		loss := criterion(cg, outputs, labels, true) // true = reduce mean
// get gradients w.r.t to parameters		cg.Backward(loss)
// update parameters		optimizer.Optimize()
fmt.Printf("epoch %d, loss %.6f\n", epoch, loss.ScalarValue())	}
fmt.Printf("Learned coefficient: %.6f\n", model.Weights.ScalarValue())}

Now let's run our program which will train our model to learn the coefficient of the function we used to generate our test points.

$go run main.goepoch 0, loss 12344.992712epoch 1, loss 4476.185291epoch 2, loss 1623.076722epoch 3, loss 588.583429epoch 4, loss 213.492069epoch 5, loss 77.489710epoch 6, loss 28.177345epoch 7, loss 10.297441epoch 8, loss 3.814462epoch 9, loss 1.463834epoch 10, loss 0.611532epoch 11, loss 0.302500epoch 12, loss 0.190450epoch 13, loss 0.149823epoch 14, loss 0.135092epoch 15, loss 0.129751...epoch 95, loss 0.126712epoch 96, loss 0.126712epoch 97, loss 0.126712epoch 98, loss 0.126712epoch 99, loss 0.126712Learned coefficient: 2.013699 # wahoo! We got pretty close! If we wanted to solve for something like say $y = 2x + 50$. We would have to tune our hyperparameters. For example, a learning rate of 0.00000001, 1000 epochs, and 1100 test points yields: $ go run main.goepoch 0, loss 1305405.200653epoch 1, loss 1294912.981077epoch 2, loss 1284505.113139epoch 3, loss 1274180.918700epoch 4, loss 1263939.725069epoch 5, loss 1253780.864968epoch 6, loss 1243703.676479epoch 7, loss 1233707.503008epoch 8, loss 1223791.693240epoch 9, loss 1213955.601094epoch 10, loss 1204198.585685...epoch 990, loss 754.597889epoch 991, loss 751.047094epoch 992, loss 747.524845epoch 993, loss 744.030914epoch 994, loss 740.565072epoch 995, loss 737.127093epoch 996, loss 733.716753epoch 997, loss 730.333831epoch 998, loss 726.978106epoch 999, loss 723.649359Learned coefficient: 2.023235 # also really close!

For ml applications of linear regression, it's hard to ignore the feature rich ecosystem that python provides, but it's great to know that spago is available for teams, primarily writing backend applications in go, to build ml into their system without having to context switch between languages.

This is a big post, and if you were following along and something didn't go quite right, let me know; a twitter dm works just fine!

Sources: