# Linear Regression in Swift: Gradient Descent

## An iterative approach

In the previous post we introduced the concept of linear regression and some example data of car prices versus car age.

Let’s look at an iterative approach to finding the line of best fit for this data.

We start with some arbitrary values for the intercept and the slope. We work out what small changes to make to these values to move our line closer to the data points. Then we repeat this many times. Eventually our line will approach the optimum position.

First let’s set up our data structures. We will use two Swift arrays for the car age and the car price:
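As a minimal sketch, the two arrays might look like the following. The values are hypothetical placeholders invented for illustration, not the data set from the previous post, and the name `numberOfCars` is likewise an assumption:

```swift
// Hypothetical sample data, invented for illustration only.
// carAge is in years; carPrice is in a single, consistent currency unit.
let carAge: [Double] = [10, 8, 3, 3, 2, 1]
let carPrice: [Double] = [500, 400, 7000, 8500, 11000, 10500]

// The arrays are parallel: index i in each one describes the same car.
precondition(carAge.count == carPrice.count, "Each age needs a matching price")
let numberOfCars = carAge.count
print("Loaded \(numberOfCars) cars")
```

Using `Double` for both arrays keeps the later arithmetic free of type conversions.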

This is how we can represent our straight line:
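One possible representation, sketched here under the assumption that a small value type is acceptable, stores the two parameters together with a method that predicts a price. The names `Line` and `predictedPrice(forAge:)` are hypothetical:

```swift
// A straight line: price = intercept + slope * age.
struct Line {
    var intercept: Double
    var slope: Double

    // Price predicted by this line for a car of the given age.
    func predictedPrice(forAge carAge: Double) -> Double {
        return intercept + slope * carAge
    }
}

// Arbitrary starting values, as described above.
var line = Line(intercept: 0, slope: 0)
print(line.predictedPrice(forAge: 4))
```

Two plain `var` declarations for the intercept and the slope would work just as well; the struct only groups them.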

Now for the code which will perform the iterations:
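The loop could be sketched as follows. The data values, the step size `alpha = 0.001`, the pass count, and the helper `sumOfSquaredErrors` are all assumptions chosen for illustration, not taken from the original listing:

```swift
// Hypothetical sample data, invented for illustration only.
let carAge: [Double] = [10, 8, 3, 3, 2, 1]
let carPrice: [Double] = [500, 400, 7000, 8500, 11000, 10500]

// Arbitrary starting values for the line.
var intercept = 0.0
var slope = 0.0

let alpha = 0.001           // assumed step size
let numberOfPasses = 5_000  // assumed number of passes over the data

// Sum of squared differences between predicted and actual prices.
func sumOfSquaredErrors(ages: [Double], prices: [Double],
                        intercept: Double, slope: Double) -> Double {
    var total = 0.0
    for (age, price) in zip(ages, prices) {
        let error = (intercept + slope * age) - price
        total += error * error
    }
    return total
}

let errorBefore = sumOfSquaredErrors(ages: carAge, prices: carPrice,
                                     intercept: intercept, slope: slope)

for _ in 0..<numberOfPasses {
    // Visit each data point and nudge the line towards it.
    for (age, price) in zip(carAge, carPrice) {
        let error = (intercept + slope * age) - price
        intercept -= alpha * error
        slope -= alpha * error * age
    }
}

let errorAfter = sumOfSquaredErrors(ages: carAge, prices: carPrice,
                                    intercept: intercept, slope: slope)
print("intercept: \(intercept), slope: \(slope)")
print("squared error before: \(errorBefore), after: \(errorAfter)")
```

The error is computed once per data point and reused for both adjustments, so the intercept and the slope are updated from the same prediction.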

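alpha is a factor that determines how large a step we take towards the correct solution with each iteration. If this factor is too large, each adjustment overshoots and our program will not converge on the correct solution. If it is too small, the program needs many more iterations to get there.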
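The effect of the step size can be sketched with a single data point. For one point, each update multiplies the prediction error by 1 - alpha * (1 + carAge^2), so the error shrinks only when that factor lies between -1 and 1. The data point and the function name `remainingError` below are hypothetical:

```swift
// Repeatedly applies the per-point update to one hypothetical car
// and returns the prediction error that remains afterwards.
func remainingError(alpha: Double, updates: Int) -> Double {
    let age = 10.0      // hypothetical car age in years
    let price = 500.0   // hypothetical car price
    var intercept = 0.0
    var slope = 0.0
    for _ in 0..<updates {
        let error = (intercept + slope * age) - price
        intercept -= alpha * error
        slope -= alpha * error * age
    }
    return (intercept + slope * age) - price
}

// Factor per update: 1 - 0.001 * 101 = 0.899, so the error decays.
let smallStep = remainingError(alpha: 0.001, updates: 100)
// Factor per update: 1 - 0.05 * 101 = -4.05, so the error grows and flips sign.
let largeStep = remainingError(alpha: 0.05, updates: 100)

print("alpha 0.001 leaves an error of \(smallStep)")
print("alpha 0.05 leaves an error of \(largeStep)")
```

With several data points the exact threshold differs, but the same pattern holds: beyond some step size the adjustments overshoot further on every update.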

The program loops through each data point (each car age and car price). For each data point it adjusts the intercept and the slope to bring them closer to the correct values. The equations used in the code to adjust the intercept and the slope are based on moving in the direction that gives the maximal reduction of the error between the line and the data. This is gradient descent.

We want to minimise the square of the distance between the line and the points. Let’s define a function J which represents this squared distance – for simplicity we consider only one point here:

$J \propto ((slope \cdot carAge + intercept) - carPrice)^2$

In order to move in the direction of maximal reduction, we take the partial derivative of this function with respect to the slope:

$\frac{\partial J}{\partial (slope)} \propto ((slope \cdot carAge + intercept) - carPrice) \cdot carAge$

And similarly for the intercept:

$\frac{\partial J}{\partial (intercept)} \propto (slope \cdot carAge + intercept) - carPrice$

We multiply these derivatives by our factor alpha and then subtract the results from the current values of slope and intercept on each iteration.
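Written out for a single data point, and assuming the usual gradient descent convention that each adjustment is subtracted so that J decreases, the updates take the following form. The constant factor of 2 that comes from differentiating the square is absorbed into alpha:

$intercept \leftarrow intercept - \alpha \cdot ((slope \cdot carAge + intercept) - carPrice)$

$slope \leftarrow slope - \alpha \cdot ((slope \cdot carAge + intercept) - carPrice) \cdot carAge$

When the line predicts too high a price for a point, the bracketed difference is positive and both parameters are reduced. When it predicts too low a price, both are increased.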

Looking at the code, it intuitively makes sense – the larger the difference between the current predicted car price and the actual car price, and the larger the value of alpha, the greater the adjustments to the intercept and the slope.
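A single update makes this concrete. The sketch below uses a hypothetical 3-year-old car priced at 7000 and a hypothetical helper named `adjustments`; none of these values come from the original data:

```swift
// Size of the adjustments made by one update for a single data point.
func adjustments(intercept: Double, slope: Double,
                 carAge: Double, carPrice: Double,
                 alpha: Double) -> (intercept: Double, slope: Double) {
    let error = (intercept + slope * carAge) - carPrice
    return (intercept: -alpha * error, slope: -alpha * error * carAge)
}

// The line predicts 0, so it is 7000 below the actual price.
let farOff = adjustments(intercept: 0, slope: 0,
                         carAge: 3, carPrice: 7000, alpha: 0.001)
// The line predicts 6000, so it is only 1000 below the actual price.
let closer = adjustments(intercept: 6000, slope: 0,
                         carAge: 3, carPrice: 7000, alpha: 0.001)
// Same prediction as `closer`, but with alpha doubled.
let biggerAlpha = adjustments(intercept: 6000, slope: 0,
                              carAge: 3, carPrice: 7000, alpha: 0.002)

print(farOff, closer, biggerAlpha)
```

The adjustment for `farOff` is about seven times that for `closer`, matching the sevenfold difference in error, and doubling alpha doubles the adjustment.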

It can take a lot of iterations to approach the ideal values. Let’s look at how the intercept and slope change as we increase the number of iterations:

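| Iterations | Intercept | Slope | Predicted value of a 4-year-old car |
|-----------:|----------:|------:|------------------------------------:|
| 0 | 0 | 0 | 0 |
| 2000 | 4112 | -113 | 3659 |
| 6000 | 8564 | -764 | 5507 |
| 10000 | 10517 | -1049 | 6318 |
| 14000 | 11374 | -1175 | 6673 |
| 18000 | 11750 | -1230 | 6829 |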

Here is the same data shown as a graph. Each of the blue lines on the graph represents a row in the table above.

After 18,000 iterations it looks as if the line is getting closer to what we would expect (just by looking) to be the correct line of best fit. Also, each additional 2,000 iterations has less and less effect on the final result – the values of the intercept and the slope are converging on the correct values.
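The slowdown can be checked directly from the table. The sketch below takes the rows from 2,000 iterations onwards, which are spaced 4,000 iterations apart, and computes how much the intercept and the slope change between consecutive rows. The variable names are illustrative:

```swift
// Intercept and slope values copied from the table, rows 2,000 to 18,000.
let intercepts: [Double] = [4112, 8564, 10517, 11374, 11750]
let slopes: [Double] = [-113, -764, -1049, -1175, -1230]

// Change between each row and the one before it (4,000 iterations apart).
let interceptChanges = zip(intercepts.dropFirst(), intercepts).map { $0 - $1 }
let slopeChanges = zip(slopes.dropFirst(), slopes).map { $0 - $1 }

print(interceptChanges)  // [4452.0, 1953.0, 857.0, 376.0]
print(slopeChanges)      // [-651.0, -285.0, -126.0, -55.0]
```

Each change is less than half the size of the previous one, which is the pattern we would expect as the parameters settle towards their final values.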

This works OK – but there’s also another way to work out the parameters for this line, without having to iterate.