5.3h. As the initial estimates of $\alpha$ and $\beta$ in the algorithm, it is natural to use the least-squares estimates, because at each iteration the algorithm uses the method of least squares to calculate the improved estimates. However, convergence of the algorithm may be facilitated by using initial estimates that are more robust, such as LAD estimates or least-median-of-squares estimates (see Rousseeuw and Leroy, 1987).

5.3i. In the algorithm presented here, the estimate of $\sigma$ is updated at every iteration. This is the procedure followed in the computer packages ROBSYS (Marazzi, 1987) and S-PLUS (Becker, Chambers, and Wilks, 1988). But ROSEPACK (Holland and Welsch, 1977) uses the LAD estimates as initial estimates of the regression coefficients, calculates $\hat{\sigma} = 1.483\,\mathrm{MAD}$, and then keeps this same value of $\hat{\sigma}$ throughout the iterations that are required to converge to the M-estimates of the regression coefficients. By not updating the estimate of $\sigma$, the amount of calculation is reduced. This is a definite advantage in simulation studies in which the estimation procedure must be repeated thousands of times. But if the initial estimates of the regression coefficients are poor, then an estimate of $\sigma$ based on them may well be poor, and this may adversely affect the robustness of the procedure (Shanno and Rocke, 1986, p. 88).
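The trade-off between updating the scale estimate at every iteration and fixing it after the first step can be illustrated with a small Python sketch. It is not the algorithm of this chapter, nor the code of ROBSYS, S-PLUS, or ROSEPACK. It assumes Huber's $\rho$ with tuning constant $1.5\hat{\sigma}$, computes each step by iteratively reweighted least squares, and takes MAD to be the median of the absolute residuals. The function name `huber_m_estimate`, its parameters, and the toy data are hypothetical. For simplicity both runs start from the least-squares fit, whereas ROSEPACK starts from the LAD fit.

```python
import numpy as np

def huber_m_estimate(X, y, beta0, update_scale=True, c=1.5, tol=1e-4, max_iter=200):
    """Sketch of a Huber M-estimate via IRLS (hypothetical helper, not a library API).

    update_scale=True  : recompute sigma = 1.483 * MAD at every iteration.
    update_scale=False : compute sigma once from the initial residuals and keep it.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta0, dtype=float)
    sigma = None
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        resid = y - X @ beta
        if update_scale or sigma is None:
            # Assumption: MAD is the median of the absolute residuals.
            sigma = 1.483 * np.median(np.abs(resid))
            if sigma <= 0:
                raise ValueError("scale estimate is zero; cannot form Huber weights")
        k = c * sigma
        # Huber weights: 1 inside [-k, k], k/|r| outside.
        w = k / np.maximum(np.abs(resid), k)
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        # Relative change in each coefficient (guarded against division by zero).
        rel = np.abs(new_beta - beta) / np.maximum(np.abs(beta), 1e-12)
        beta = new_beta
        if np.all(rel < tol):
            break
    return beta, sigma, n_iter

# Toy data (made up for illustration): a line y = 1 + 2x with one gross outlier.
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x + 0.1 * np.sin(x)
y[18] += 50.0
X = np.column_stack([np.ones_like(x), x])

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares start
beta_updated, sigma_updated, it_updated = huber_m_estimate(X, y, beta_ls, update_scale=True)
beta_fixed, sigma_fixed, it_fixed = huber_m_estimate(X, y, beta_ls, update_scale=False)

print("least squares :", beta_ls)
print("scale updated :", beta_updated, "sigma =", sigma_updated, "iterations =", it_updated)
print("scale fixed   :", beta_fixed, "sigma =", sigma_fixed, "iterations =", it_fixed)
```

In the fixed-scale run, $\hat{\sigma}$ is computed from the residuals of the starting fit, so a poor start carries over into the cutoff $1.5\hat{\sigma}$ for every later iteration. This is the robustness concern raised above.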
5.3j. A convenient criterion to determine when to stop iterating is the following. Stop when two successive pairs of estimates, say, $(a^0, b^0)$ and $(a^1, b^1)$, satisfy the condition that both relative differences $|a^1 - a^0|/|a^0|$ and $|b^1 - b^0|/|b^0|$ are less than $10^{-4}$. This guarantees that the two successive estimates "almost" agree to four significant digits (or to $k$ significant digits if $10^{-4}$ is replaced by $10^{-k}$). To explain what is meant by "almost", round $a^0$ to four significant digits and regard these four digits as an integer $m^0$. The integer $m^0$ is between 1000 and 10,000. Similarly, round $a^1$ to the same number of decimal places as $a^0$ (which implies that $a^1$ is also rounded to four significant digits if $a^1$ is close to $a^0$) and regard its digits as an integer $m^1$. When we say that $a^0$ and $a^1$ agree to four significant digits, we mean that $m^0 = m^1$. When we say that $a^0$ and $a^1$ "almost" agree to four significant digits, we mean that either $m^0$ and $m^1$ are equal or they differ by 1. However, even if two successive estimates agree to $k$ significant digits, this does not guarantee that the M-estimate is accurate to $k$ significant digits. See Note 5.3k.

5.3k. Even though two successive estimates agree to four significant digits, this may not yield an M-estimate that is accurate to four significant digits. This problem can occur when the function (5.2), which we are trying to minimize, is rather "flat". To increase our confidence that our M-estimates are accurate to four significant digits, there are two precautions that can be taken. (1) We can iterate until both relative differences $|a^1 - a^0|/|a^0|$ and
data, we tried $k = 8$ and, after 16 iterations, obtained the same results, when rounded to four significant digits, as with $k = 4$. (2) We can try several different initial estimates and see if the algorithm converges to the same four significant digits.

5.4. The convergence of the M-estimation procedure for the shelf life data using the model $Y = \alpha + e$ is very slow when we use the least-squares estimate $a^0 = \bar{y}$ as the initial estimate. To remedy slow convergence, a different initial estimate can be tried. We tried the LAD estimate. For this model, the LAD estimate of $\alpha$ is simply the sample median, 3.45. The M-estimation procedure converges immediately when 3.45 is the initial estimate, because all values of $a$ between $3.4 + 0.04574 = 3.446$ and $3.5 - 0.04574 = 3.454$ minimize the function $\sum \rho(y_i - a)$, where $\rho$ is defined as in (5.1) with $k = 0.04574$.

5.6a. An M-estimate can be defined in terms of the function $\rho$ as in (5.4) or in terms of the derivative $\psi = \rho'$. The function $\psi$ is often preferred, because it determines the shape of the influence function of the estimate; see Notes 5.6e and 5.6f. One can take $\hat{\beta}$ to be the value of $b$ that minimizes $\sum \rho(y_i - b'x_i)$ or, equivalently, the value of $b$ for which the partial derivatives $(\partial/\partial b_j) \sum \rho(y_i - b'x_i)$ are zero for all $j = 0, 1, \ldots, p$. The vector of partial derivatives is $-\sum \psi(y_i - b'x_i)\,x_i$, and so $\hat{\beta}$ can be defined by the equation
