$-2yx + 2x^2 b$. For general $p$ the vector of partial derivatives is analogous: $-2X'y + 2X'Xb$. Setting this equal to 0, we get $X'Xb = X'y$, which coincides with (3.8).
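The normal equations $X'Xb = X'y$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `solve_normal_equations` and the synthetic data are illustrative assumptions and are not the turnip green data.

```python
import numpy as np

def solve_normal_equations(X, y):
    """Return b solving X'X b = X'y.

    X is assumed to already contain a leading column of ones
    for the intercept (an assumption of this sketch).
    """
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)

# Synthetic illustration data (hypothetical, NOT the data of Table 1.1)
rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, 0.5, -1.0, 0.25]) + rng.normal(scale=0.1, size=n)

b = solve_normal_equations(X, y)

# The vector of partial derivatives -2X'y + 2X'Xb should vanish at the solution
gradient = -2 * X.T @ y + 2 * X.T @ X @ b
```

In practice a routine such as `np.linalg.lstsq` is usually preferred to forming $X'X$ explicitly, since it is numerically more stable; the two should agree on well-conditioned data.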
The Turnip Green Data. For the data in Table 1.1 formula (3.8) yields the vector of estimates
$(82.07,\ 0.02276,\ -0.7783,\ 0.1640)'$
So the estimated regression equation is $Y = 82.07 + 0.02276X_1 - 0.7783X_2 + 0.1640X_3$. (If you want to verify this, use a computer. Calculation of formula (3.8) using an ordinary hand calculator is feasible for $p = 1$ but is difficult for $p = 3$.) A more complete analysis of these data is outlined in Example 1 in Section 2.4. At the beginning of the analysis, model (3.6) is used, but it is found that the following model is better:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + e$ (3.9)
X4 =
The estimated regression equation is $Y = 119.6 - 0.03367X_1 + 5.425X_2 - 0.5026X_3 - 0.1209X_4$.
The first test performed in a regression analysis is often a test of whether the explanatory variables actually contain any significant explanatory information. Let us perform such a test for the turnip green data. We want to compare the full model (3.9), containing all four explanatory variables, with the reduced model $Y = \beta_0 + e$, containing no explanatory variables, to see whether there is a significant difference between these two models. In other words, we want to test $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$. In developing a test of $\beta = 0$ in Section 3.4 we started by noting that the value of $\hat{\beta}$ should tell us whether or not $\beta = 0$. Similarly, it makes sense that the values of $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$, and $\hat{\beta}_4$ should tell us whether or not $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$. But rather than develop a test from the viewpoint of testing whether certain parameters are zero, we take the alternative viewpoint of comparing two models.
A Test Statistic. The suitability of a model can be judged by the size of the residuals. The smaller the residuals, the better the model fits the data. In the least-squares method, an overall measure of the size of the residuals is given by the sum of squares of the residuals. Let SSR denote the sum of squares of the residuals of a model. We can compare the full model with the reduced model by comparing $\mathrm{SSR}_{\text{full}}$ with $\mathrm{SSR}_{\text{reduced}}$. Specifically, the test statistic we use for testing $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ is
$$F = \frac{\mathrm{SSR}_{\text{reduced}} - \mathrm{SSR}_{\text{full}}}{4\hat{\sigma}^2} \qquad (3.10)$$
where $\hat{\sigma}^2$ is an estimate of the variance $\sigma^2$ of the distribution of the random errors.
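The comparison of the two models can be sketched in code. This is a minimal illustration, assuming NumPy; the function names `ssr` and `f_statistic` are hypothetical, the data are synthetic (not the turnip green data), and the variance estimate is taken as $\mathrm{SSR}_{\text{full}}$ divided by $n$ minus the number of fitted parameters, which is the estimate the text goes on to define.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from a least-squares fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

def f_statistic(X_full, y):
    """F statistic for testing that every slope coefficient is zero.

    X_full is assumed to be n x (k + 1) with a leading column of ones,
    so the reduced model is the intercept-only model Y = beta_0 + e.
    """
    n, n_params = X_full.shape
    k = n_params - 1                          # number of coefficients set to 0
    ssr_full = ssr(X_full, y)
    ssr_reduced = ssr(X_full[:, :1], y)       # fit with the column of ones only
    sigma2_hat = ssr_full / (n - n_params)    # e.g. divisor n - 5 when k = 4
    return (ssr_reduced - ssr_full) / (k * sigma2_hat)

# Synthetic illustration (hypothetical data): n = 27 cases, k = 4 explanatory variables
rng = np.random.default_rng(1)
n, k = 27, 4
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

y_signal = X_full @ np.array([3.0, 1.0, 1.0, 1.0, 1.0]) + rng.normal(scale=0.5, size=n)
y_noise = 3.0 + rng.normal(scale=0.5, size=n)   # reduced model is true here

F_signal = f_statistic(X_full, y_signal)
F_noise = f_statistic(X_full, y_noise)
```

With data generated from the reduced model, `F_noise` would typically be of moderate size, whereas `F_signal` should be far larger because the explanatory variables carry real information.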
Estimating $\sigma^2$. A natural estimate of the variance of the population of errors is the variance of the sample of estimated errors, that is, the residuals
$$\hat{e}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \hat{\beta}_3 x_{i3} + \hat{\beta}_4 x_{i4})$$
In order to make $\hat{\sigma}^2$ an unbiased estimate of $\sigma^2$ we define
$$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{n - 5}$$
where the divisor $n - 5 = 22$ is used rather than $n - 1 = 26$. The subtraction of 5 from $n$ corresponds to the fact that we must estimate five parameters $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ in order to form the residuals $\hat{e}_i$. Note that $\sum \hat{e}_i^2$ is the same as $\mathrm{SSR}_{\text{full}}$.

Justification of Formula (3.10). The reduced model cannot possibly fit the data as well as the full model because it has fewer parameters and hence is less flexible. So it is always true that $\mathrm{SSR}_{\text{reduced}}$ is larger than $\mathrm{SSR}_{\text{full}}$ and the difference $\mathrm{SSR}_{\text{reduced}} - \mathrm{SSR}_{\text{full}}$ is positive. But when the reduced model is true, we expect this difference to be smaller than when the reduced model is false. It can be shown that when the reduced model is true, then the expected value of $\mathrm{SSR}_{\text{reduced}} - \mathrm{SSR}_{\text{full}}$ is $4\sigma^2$. (The multiplier 4 in $4\sigma^2$ is the same as the number of parameters set equal to 0 in the hypothesis $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$.) So when $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$, we expect $F$ to be close to 1. Having a preference for simpler models, we will decide $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ unless there is strong evidence against it as shown by a value of $F$ that is much larger than 1.

The p-Value. The strength of the evidence against the null hypothesis $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ is measured by the largeness of $F$, which in turn is measured by the smallness of the p-value. The p-value of the test is the