justification. In simulated examples, f was found to improve the accuracy of the p-value. See Draper (1988, Section 4) for a review of results on the estimation of $\tau$.

6.7d. The test statistic $F_{NP}$, regarded as a random variable, has approximately the same distribution as $G/(p - q)$, where $G$ has a $\chi^2$ distribution with $p - q$ degrees of freedom, provided the null hypothesis $\beta_{q+1} = \cdots = \beta_p = 0$ is true and the sample size $n$ is large. (See Hettmansperger, 1984, Theorem 5.3.1.) To be more mathematically precise, for any fixed number $c$, $\mathrm{Prob}[F_{NP} \ge c]$ approaches $\mathrm{Prob}[G/(p - q) \ge c]$ as $n$ becomes very large. There is no theoretically justified connection between $F_{NP}$ and the $F$ distribution. Nevertheless, in a number of examples with smaller sample sizes, statistical researchers have found that a more accurate approximation of the p-value is obtained if the $F$ distribution with $p - q$ and $n - p - 1$ degrees of freedom, rather than the distribution of $G/(p - q)$, is used to approximate the distribution of $F_{NP}$ (see Hettmansperger, 1984, p. 266).

6.7e. The matrix $W$ is defined to be the $p \times p$ matrix obtained from $(X'X)^{-1}$ by omitting the first row and first column. Another way to obtain $W$ is as $W = (X_c'X_c)^{-1}$, where $X_c$ is the matrix of centered explanatory variables. The value $x_{ij}$ of the $j$th explanatory variable for the $i$th unit is centered by subtracting the average $\bar{x}_j$ of the $j$th explanatory variable for all $n$ units. Express $X$ in partitioned form as $X = (\mathbf{1}, Z)$, where $\mathbf{1}$ is a column of 1's and $Z$ is the matrix of explanatory variables. Then
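$$X'X = \begin{bmatrix} \mathbf{1}'\mathbf{1} & \mathbf{1}'Z \\ Z'\mathbf{1} & Z'Z \end{bmatrix} = \begin{bmatrix} n & \mathbf{1}'Z \\ Z'\mathbf{1} & Z'Z \end{bmatrix}.$$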
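The two descriptions of $W$ in this note (deleting the first row and column of $(X'X)^{-1}$, or inverting $X_c'X_c$) can be compared numerically. The following is a minimal sketch in Python, not part of the original text. The data matrix `Z` (5 units, 2 explanatory variables) is made up for illustration, and the helper functions `transpose`, `matmul`, and `inverse` are hypothetical names written only for this example. Exact fractions are used so that rounding cannot affect the comparison.

```python
from fractions import Fraction


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Invert a nonsingular square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(a)
    # Augment the matrix with the identity matrix on the right.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Made-up data (an assumption for illustration): n = 5 units, p = 2 variables.
Z = [[Fraction(v) for v in row]
     for row in [[1, 4], [2, 1], [3, 5], [4, 2], [6, 3]]]
n = len(Z)

# Route 1: form X = (1, Z), invert X'X, and omit the first row and column.
X = [[Fraction(1)] + row for row in Z]
XtX_inv = inverse(matmul(transpose(X), X))
W_from_X = [row[1:] for row in XtX_inv[1:]]

# Route 2: center each column of Z at its average and invert Xc'Xc.
col_means = [sum(col) / n for col in transpose(Z)]
Xc = [[v - m for v, m in zip(row, col_means)] for row in Z]
W_from_Xc = inverse(matmul(transpose(Xc), Xc))

print(W_from_X == W_from_Xc)
```

For this particular `Z` the two routes give the same matrix exactly. The check illustrates the identity on one data set and does not prove it.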
In the case of simple regression, $Z$ is a vector and $X'X$ is a $2 \times 2$ matrix, which is easy to invert. The entry that is left after omitting the first row and column of $(X'X)^{-1}$ is $n/[n(Z'Z) - (Z'\mathbf{1})(\mathbf{1}'Z)] = [Z'Z - (1/n)Z'\mathbf{1}\mathbf{1}'Z]^{-1}$. It can be shown that this formula also holds for multiple regression, that is, $W = [Z'Z - (1/n)Z'\mathbf{1}\mathbf{1}'Z]^{-1}$. The matrix $X_c$ of centered variables is obtained by subtracting from each entry of $Z$ the average of the entries in that column. In matrix terms, $X_c = Z - (1/n)\mathbf{1}\mathbf{1}'Z$. (The row vector $\mathbf{1}'Z$ contains the $p$ sums of the columns of $Z$, the row vector $(1/n)\mathbf{1}'Z$ contains the $p$ averages of the columns of $Z$, and $(1/n)\mathbf{1}\mathbf{1}'Z$ repeats the row of averages $n$ times.) A little matrix algebra shows that $X_c'X_c = Z'Z - (1/n)Z'\mathbf{1}\mathbf{1}'Z$.

6.7f. We have defined two estimates of $\beta_0$. Let $d_i = y_i - (\hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip})$ and define $\hat{\beta}_0^{(d)}$ to be the median of the $d_i$'s and $\hat{\beta}_0^{(dd)}$ to be
the median of the pairwise averages $(d_i + d_j)/2$. The second estimate is appropriate only if the error distribution is approximately symmetric. If symmetry holds, then, at least when the sample size is large, $\hat{\beta}_0^{(dd)}$ has a smaller SD than $\hat{\beta}_0^{(d)}$ if and only if the parameter $\tau$ satisfies $\tau < 1/(2\theta)$, where $\theta$ is the probability density of the error distribution at 0. (See Hettmansperger, 1984, pp. 250-251.) The quantity $1/(2\theta)$ appears in LAD regression in a role similar to that of $\tau$ in nonparametric regression. (In Chapter 4 the symbol $\tau$ is used to denote the quantity $1/(2\theta)$.)

6.7g. Let us show that $\tau/\sigma = 1.023$ when the error population has a normal distribution. If the distribution of a randomly selected error is normal with standard deviation $\sigma$, then the distribution of the difference of two randomly selected errors is normal with standard deviation $\sqrt{2}\,\sigma$. Hence the probability density function for the error differences is
$f(t) =$
