Thursday, May 2, 2013

All About Spherically Distributed Regression Errors

This post is based on a handout that I use for one of my courses, and it relates to the usual linear regression model,

                                  y = Xβ + ε

In our list of standard assumptions about the error term in this linear multiple regression model, we include one that incorporates both homoskedasticity and the absence of autocorrelation. That is, the individual values of the errors are assumed to be generated by a random process whose variance (σ²) is constant, and all possible distinct pairs of these values are uncorrelated. This implies that the full error vector, ε, has a scalar covariance matrix, σ²Iₙ.
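To make the "scalar covariance matrix" idea concrete, here is a minimal simulation sketch. It is not part of the original handout. The normal distribution, the seed, and the values of n, σ, and the number of replications are all assumptions chosen purely for illustration. It draws many independent error vectors with a common variance and checks that their sample covariance matrix is close to σ² times the identity matrix.

```python
import random

def sample_covariance(draws):
    """Sample covariance matrix of a list of equal-length vectors."""
    reps, n = len(draws), len(draws[0])
    means = [sum(d[i] for d in draws) / reps for i in range(n)]
    return [
        [
            sum((d[i] - means[i]) * (d[j] - means[j]) for d in draws) / (reps - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]

random.seed(0)                    # arbitrary seed, for reproducibility only
n, reps, sigma = 3, 50_000, 2.0   # illustrative values, not from the post

# Each inner list is one draw of the n-element error vector:
# independent elements, all with the same standard deviation sigma.
draws = [[random.gauss(0.0, sigma) for _ in range(n)] for _ in range(reps)]

cov_hat = sample_covariance(draws)
for row in cov_hat:
    print([round(v, 2) for v in row])
```

With these settings the diagonal entries should sit close to σ² = 4 and the off-diagonal entries close to zero, which is what homoskedasticity plus zero correlation between distinct errors implies.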

We refer to this overall situation as one in which the values of the error term follow a “Spherical Distribution”. Let's take a look at the origin of this terminology.

Good Old R-Squared!

My students are often horrified when I tell them, truthfully, that one of the last pieces of information that I look at when evaluating the results of an OLS regression is the coefficient of determination (R²), or its "adjusted" counterpart. Fortunately, it doesn't take long to change their perspective!

After all, we all know that with time-series data, it's really easy to get a "high" R² value, because of the trend components in the data. With cross-section data, really low R² values are really common. For most of us, the signs, magnitudes, and significance of the estimated parameters are of primary interest. Then we worry about testing the assumptions underlying our analysis. R² is at the bottom of the list of priorities.
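A small simulation can illustrate the contrast between trending time-series data and noisy cross-section data. This is a sketch on made-up data: the trend slopes, noise levels, sample size, and seed are all assumptions, and the helper `r_squared` is a hypothetical name introduced here. The two "time series" are generated independently and share nothing except a linear time trend, while in the "cross-section" y really does depend on x.

```python
import random

def r_squared(x, y):
    """R-squared from an OLS regression of y on x with an intercept.

    With a single regressor this equals the squared sample correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

random.seed(1)   # arbitrary seed, for reproducibility only
T = 200          # illustrative sample size

# "Time series": two unrelated series, each with its own linear trend.
x_ts = [0.5 * t + random.gauss(0, 5) for t in range(T)]
y_ts = [0.3 * t + random.gauss(0, 5) for t in range(T)]

# "Cross-section": y truly depends on x, but idiosyncratic noise dominates.
x_cs = [random.gauss(0, 1) for _ in range(T)]
y_cs = [1.0 * v + random.gauss(0, 3) for v in x_cs]

r2_ts = r_squared(x_ts, y_ts)
r2_cs = r_squared(x_cs, y_cs)
print(f"trending series R^2: {r2_ts:.2f}")
print(f"cross-section R^2:   {r2_cs:.2f}")
```

Under these assumptions the regression with no genuine relationship should report the much higher R², while the regression with a real slope of 1 should report a low one.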