## An iterative approach

In the previous post we introduced the concept of linear regression and some example data of car prices versus car age.

Let’s look at an iterative approach to finding the line of best fit for this data.

We start with arbitrary values for the intercept and the slope. We work out what small changes to make to these values to move our line closer to the data points, and then we repeat this many times. Eventually our line approaches the optimum position.

First let’s set up our data structures. We will use two Swift arrays for the car age and the car price:

```swift
let carAge: [Double] = [10, 8, 3, 3, 2, 1]
let carPrice: [Double] = [500, 400, 7000, 8500, 11000, 10500]
```

This is how we can represent our straight line:

```swift
var intercept = 0.0
var slope = 0.0

func predictedCarPrice(carAge: Double) -> Double {
    return intercept + slope * carAge
}
```

Now for the code which will perform the iterations:

```swift
let numberOfCarAdvertsWeSaw = carPrice.count - 1
let iterations = 2000
let alpha = 0.0001

for _ in 1...iterations {
    for i in 0...numberOfCarAdvertsWeSaw {
        let difference = carPrice[i] - predictedCarPrice(carAge: carAge[i])
        intercept += alpha * difference
        slope += alpha * difference * carAge[i]
    }
}
```

`alpha` is a factor that determines how much closer we move to the correct solution with each iteration. If this factor is too large then our program will not converge on the correct solution.
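To see that divergence concretely, here is a standalone sketch using the data from this post with a deliberately oversized `alpha` of 0.1 (everything else mirrors the loop above):

```swift
let carAge: [Double] = [10, 8, 3, 3, 2, 1]
let carPrice: [Double] = [500, 400, 7000, 8500, 11000, 10500]

var intercept = 0.0
var slope = 0.0
let alpha = 0.1   // deliberately far too large for this data

for _ in 1...100 {
    for i in 0..<carAge.count {
        let difference = carPrice[i] - (intercept + slope * carAge[i])
        intercept += alpha * difference
        slope += alpha * difference * carAge[i]
    }
}

// Instead of settling near a line of best fit, the parameters oscillate
// with ever-growing magnitude.
print(abs(slope) > 1e6 || !slope.isFinite)   // prints "true"
```

Each pass over the ten-year-old car multiplies the slope error by roughly `1 - alpha * 10 * 10 = -9`, so the estimates swing further from the answer on every iteration rather than closer.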

The program loops through each data point (each car age and car price). For each data point it adjusts the intercept and the slope to bring them closer to the correct values. The equations used in the code to adjust the intercept and the slope are based on moving in the direction of the maximal reduction of these variables. This is a *gradient descent*.
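To make one adjustment step concrete, here is a hand-checked single update (a standalone sketch starting from the initial values and using the first data point from above):

```swift
var intercept = 0.0
var slope = 0.0
let alpha = 0.0001

// First data point: a 10-year-old car advertised at 500.
let age = 10.0
let price = 500.0

let predicted = intercept + slope * age   // 0 + 0 * 10 = 0
let difference = price - predicted        // 500 - 0 = 500

intercept += alpha * difference           // 0 + 0.0001 * 500      = 0.05
slope += alpha * difference * age         // 0 + 0.0001 * 500 * 10 = 0.5
```

After seeing a single advert the line already tilts slightly towards that data point; the remaining points pull it the rest of the way over the following iterations.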

We want to minimise the square of the distance between the line and the points. Let’s define a function J which represents this distance – for simplicity we consider only one point here:

`J = 0.5 * (intercept + slope * carAge - carPrice)^2`

In order to move in the direction of maximal reduction, we take the partial derivative of this function with respect to the slope:

`∂J/∂slope = (intercept + slope * carAge - carPrice) * carAge`

And similarly for the intercept:

`∂J/∂intercept = intercept + slope * carAge - carPrice`

We multiply these derivatives by our factor alpha and subtract the result from the current slope and intercept on each iteration – that is, we move against the gradient. Because each derivative is the negative of `difference` (times `carAge`, in the slope’s case), subtracting it is exactly the `+= alpha * difference` adjustment in the code above.
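Written as explicit gradient-descent update rules (a reconstruction of what the loop above computes for each data point):

$$\text{slope} \leftarrow \text{slope} - \alpha \frac{\partial J}{\partial \text{slope}} = \text{slope} + \alpha \cdot \text{difference} \cdot \text{carAge}$$

$$\text{intercept} \leftarrow \text{intercept} - \alpha \frac{\partial J}{\partial \text{intercept}} = \text{intercept} + \alpha \cdot \text{difference}$$

where $\text{difference} = \text{carPrice} - (\text{intercept} + \text{slope} \cdot \text{carAge})$.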

Looking at the code, it intuitively makes sense – the larger the difference between the current predicted car price and the actual car price, and the larger the value of `alpha`, the greater the adjustments to the intercept and the slope.

It can take a lot of iterations to approach the ideal values. Let’s look at how the intercept and slope change as we increase the number of iterations:

| Iterations | Intercept | Slope | Predicted value of a 4-year-old car |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 2000 | 4112 | -113 | 3659 |
| 6000 | 8564 | -764 | 5507 |
| 10000 | 10517 | -1049 | 6318 |
| 14000 | 11374 | -1175 | 6673 |
| 18000 | 11750 | -1230 | 6829 |

Here is the same data shown as a graph. Each of the blue lines on the graph represents a row in the table above.

After 18,000 iterations it looks as if the line is getting closer to what we would expect (just by looking) to be the correct line of best fit. Also, each additional 2,000 iterations has less and less effect on the final result – the values of the intercept and the slope are converging on the correct values.

This works OK – but there’s also another way to work out the parameters for this line, without having to iterate.
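As a taste of what a non-iterative route looks like: for a single predictor, the standard ordinary least-squares formulas give the answer directly. A minimal sketch (the function name `fitLine` is my own, not from this post):

```swift
// Ordinary least squares for one predictor:
//   slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,  intercept = ȳ - slope * x̄
func fitLine(x: [Double], y: [Double]) -> (intercept: Double, slope: Double) {
    let n = Double(x.count)
    let meanX = x.reduce(0, +) / n
    let meanY = y.reduce(0, +) / n
    var numerator = 0.0
    var denominator = 0.0
    for i in 0..<x.count {
        numerator += (x[i] - meanX) * (y[i] - meanY)
        denominator += (x[i] - meanX) * (x[i] - meanX)
    }
    let slope = numerator / denominator
    return (intercept: meanY - slope * meanX, slope: slope)
}

let carAge: [Double] = [10, 8, 3, 3, 2, 1]
let carPrice: [Double] = [500, 400, 7000, 8500, 11000, 10500]
let fit = fitLine(x: carAge, y: carPrice)
// For this data: intercept ≈ 12043, slope ≈ -1272.5, the values that the
// iterative results in the table above are slowly converging towards.
```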
