So if you’re not a statistics geek like me, you need not read further.  But if you are, have you ever thought about why it’s OK to use ordinary least squares to fit curves – for example polynomials – to data?

One of the oldest and most revered methods for fitting statistical models to data is ordinary least squares.  With this method, one can fit a model to a set of data and arrive at the least squares estimates of the model’s parameters under a set of assumptions (e.g., normal, constant error variance).  The method estimates the parameters that minimize the sum of squared deviations of the observed values from the values predicted by the model.

One of the major assumptions of linear least squares is that the model is linear in its parameters.  This seems like a reasonable assumption.  It means that every independent variable in the model has a linear relationship to the dependent variable.  In other words, the ordinary least squares method can only fit flat, planar surfaces to data in multidimensional space.  For example, to fit this equation,

Y = a*X1 + b*X2 + c

the method can only fit a plane in three dimensions through the data.  This means that for any given value of X1 the predicted values of Y along the X2 dimension have to form a line, and for any given value of X2 the predicted values of Y along the X1 dimension must form a line.  The plane can be tilted, but if you pick a value on one independent dimension, the predicted values must form a straight line along the other independent dimension.
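To make this concrete, here is a minimal sketch of fitting the plane Y = a*X1 + b*X2 + c by ordinary least squares with NumPy.  The simulated data and coefficient values here are my own illustration, not from the post:

```python
import numpy as np

# Simulate data from a known plane plus noise (made-up coefficients).
rng = np.random.default_rng(0)
X1 = rng.uniform(-5, 5, 100)
X2 = rng.uniform(-5, 5, 100)
Y = 2.0 * X1 - 3.0 * X2 + 7.0 + rng.normal(0, 0.5, 100)

# Design matrix: one column per independent variable, plus a column
# of ones for the intercept c.
A = np.column_stack([X1, X2, np.ones_like(X1)])

# lstsq minimizes the sum of squared deviations ||A @ [a, b, c] - Y||^2.
(a, b, c), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(a, b, c)  # close to 2, -3, 7
```

Holding X1 fixed in the fitted equation leaves Y = (a*X1 + c) + b*X2, a straight line in X2 with slope b, which is exactly the flat-plane constraint described above.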

This all seems fine, until you get to the subject of polynomial regression.  Polynomial regression is a method that allows one to fit curves to data, not just lines.  So, for example, consider these data.  The blue dots in the figure are a bit of simulated data using the equation Y = 1.5*X^2 + 2*X + 12, to which I added some random noise.  The blue curve is the quadratic polynomial fitted to these data using ordinary least squares regression.  The fitted equation is Y = 1.52*X^2 + 2.03*X + 12.28, which minimizes the sum of the squared deviations of the data from the model’s predicted values (for each data point, this deviation is the length of the black line segment, parallel to the Y axis, from the point to the fitted curve).
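The quadratic fit can be reproduced in a few lines.  The noise level, sample size, and random seed below are my assumptions, not the original simulation settings, so the recovered coefficients will be close to, but not exactly, the ones quoted above:

```python
import numpy as np

# Simulate from Y = 1.5*X^2 + 2*X + 12 plus noise (assumed settings).
rng = np.random.default_rng(1)
X = np.linspace(-4, 4, 200)
Y = 1.5 * X**2 + 2.0 * X + 12.0 + rng.normal(0, 1.0, X.size)

# Fit a degree-2 polynomial by ordinary least squares;
# np.polyfit returns coefficients highest power first.
coefs = np.polyfit(X, Y, deg=2)
print(coefs)  # roughly [1.5, 2.0, 12.0]
```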

The large green dot in the figure identifies the grand mean of the data set.  For those of you who know your statistics, you should recognize an apparent problem here.  The solution to a regression problem using ordinary least squares (with an intercept) must pass through the grand mean of the data set!  In other words, the blue curve should pass through the green dot, but it doesn’t!  So polynomial regression would appear to violate two fundamental properties of ordinary least squares regression:

1. The fitted surface must be linear along each independent variable’s dimension.
2. The fitted function must pass through the grand mean of the data.

However, it turns out that these are not problems at all, but merely an optical illusion based on how we typically graph the data and the solution.  Note that the equation has two independent variables: X and X^2.  Ordinary least squares actually solved a three-dimensional problem!  X^2 is simply a second independent dimension.  In other words, this polynomial model describes a plane in three dimensions!  If one considers the full graph of the problem, one realizes that our view of the graph in the figure above is simply an optical illusion because we are looking straight down the X^2 axis.  In the next figure I show two panels.  The left panel is the same figure as above, and the right panel is the graph of the data in three dimensions, rotated and oriented so that you can see the full 3D view.  I have also illustrated the entire extent of the plane that has been fit to the data.  What we graph in the typical illustration of these data (i.e., the left panel) is only the curve on the plane corresponding to the points in the data set.  However, a 3D plane is being fit to the data (i.e., in the right panel).  Moreover, this plane does pass through the grand mean of the data set!  All is right in our ordinary least squares world – we just look at the world funny sometimes.
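The grand-mean claim can be checked numerically: treat X and X^2 as two separate independent variables, fit the plane by least squares, and evaluate the plane at the mean of each predictor.  The simulation settings below are my assumptions:

```python
import numpy as np

# Simulated data, as in the quadratic example (assumed settings).
rng = np.random.default_rng(2)
X = np.linspace(-4, 4, 200)
Y = 1.5 * X**2 + 2.0 * X + 12.0 + rng.normal(0, 1.0, X.size)

# Two "independent" columns -- X^2 and X -- plus an intercept column:
# the quadratic model is a plane in (X^2, X, Y) space.
A = np.column_stack([X**2, X, np.ones_like(X)])
(a, b, c), *_ = np.linalg.lstsq(A, Y, rcond=None)

# Any least squares fit with an intercept has zero mean residual, so the
# fitted plane evaluated at (mean X^2, mean X) equals the mean of Y.
plane_at_mean = a * np.mean(X**2) + b * np.mean(X) + c
print(plane_at_mean, np.mean(Y))  # these two values agree
```

Note that the 2D curve need not pass through the point (mean X, mean Y): the plane passes through (mean X^2, mean X, mean Y), and projecting down the X^2 axis is what creates the apparent violation.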

This is true whenever ordinary least squares is used to fit a polynomial model (or any model that is linear in its parameters, for that matter).  Here’s a similar result for some data where a cubic polynomial is the best-fit model.  I’ll leave it to you to ponder the rest.