Figure The two classes are represented by 0 and 1, respectively, and the group of 10 outliers at the point (-4, -4) is shown at the bottom left of the figure. The data were generated using 6 1 . 2 = -0.7. An additional outlier at (-20, -20) is not shown. The positions of the outliers have been randomly perturbed for greater visibility. The decision boundaries for several of the models are shown.
an $N \times P$ matrix of leave-one-out parameters, that is, a matrix with entries $w_p$ calculated with the $n$th observation omitted;
a vector of the $N$ studentized residuals (also called the jackknifed residuals)
$$\frac{t_n - z^*(x_n)}{\sqrt{\operatorname{var}\left(t_n - z^*(x_n)\right)}},$$
where $z^*(x_n)$ is $z$ calculated without the point $x_n$, and then evaluated at $x_n$;
the $N$ diagonal values of the projection matrix for the fitted model.
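For an ordinary least-squares linear model, the three outputs listed above can be obtained from a single fit using standard closed-form deletion identities. The following is a minimal sketch under that assumption; the function name `loo_diagnostics` and the inclusion of Cook's distance as a fourth output are illustrative choices, not taken from the text.

```python
import numpy as np

def loo_diagnostics(X, t):
    """Leave-one-out diagnostics for least squares, computed without refitting.

    X is the N x P design matrix and t the N-vector of targets (hypothetical names).
    """
    N, P = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    w = XtX_inv @ X.T @ t                         # full-data parameter estimate
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # diagonal of the projection matrix
    e = t - X @ w                                 # ordinary residuals
    # Row n holds the parameters estimated with observation n omitted.
    W_loo = w - (X @ XtX_inv) * (e / (1.0 - h))[:, None]
    s2 = e @ e / (N - P)                          # residual variance, all data
    # Residual variance estimated with observation n omitted.
    s2_del = ((N - P) * s2 - e**2 / (1.0 - h)) / (N - P - 1)
    r_student = e / np.sqrt(s2_del * (1.0 - h))   # jackknifed residuals
    cooks = e**2 * h / (P * s2 * (1.0 - h) ** 2)  # Cook's distance
    return W_loo, r_student, h, cooks
```

Each row of `W_loo` should agree with an explicit refit on the remaining N - 1 observations, which is an easy way to check the identities on a small data set.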
The diagonal elements of the projection matrix $X(X^T X)^{-1} X^T$ have a direct interpretation as $\partial z^*/\partial t$, the rate of change of the fitted values for a change in $t$ ($t$ would be a continuous variable in this case). In the case of a linear model the leave-one-out estimates can be calculated without having to refit the model. For generalized linear models a common practice (Hastie and Pregibon, 1992) is to take one step from the fitted model. That is, one observation is omitted and one step of the iteratively re-weighted least squares algorithm (Section A.4, p. 249) is taken from the fitted model. This gives computationally cheap but effective estimates of both the leave-one-out parameters and the jackknifed residuals.

These outputs can be examined directly, or they can be used to calculate the Cook's distance diagnostic (Cook and Weisberg, 1982), which shows the influence of deleting an observation on the parameter vector in an appropriate metric. As these distances are on the same scale they can be plotted, and observations with a large Cook's distance can be investigated further. This procedure can be carried out for logistic regression, the robust logistic model and also for Cox and Pearce's model, which is a generalized linear model.

However, the simulated example shows a problem with these measures. As there are 10 outliers at the one coordinate position, omitting one of these points has a negligible effect on the parameter estimates. Consequently the Cook's distance for each of the outlying points is quite small. In order to detect the 10 points as outliers, subsets of points (with up to 10 points in the subset) would have to be deleted at a time. In the general case an all-subsets deletion algorithm would be required. Deleting groups of outliers is computationally intensive for linear and generalized linear models, and the problem is compounded for iteratively fitted non-linear models such as the MLP.

The leave-one-out parameter estimates and residuals could be calculated, but this does not seem at all practical in the case of the MLP, as the model has to be refitted for each omitted observation. We can linearize the model at the fitted values and calculate the N x matrix where
which is a linearized estimate of

However, consider the example shown in Figure 10.7. At the fitted model, the derivative for the outlying point (50) will be relatively small, yet the point has clearly been quite influential in determining the final fit. Hence tracking the derivatives through time may provide a better diagnostic tool than the matrix. This will give a series of $N$ curves for each parameter, which can be plotted on the same axis. It is possible that highly influential points may be seen to increase in $\partial \rho(x_n)/\partial w$ beyond the value for the major body of the data. Figure 10.8 shows a plot for the model from Figure 10.7. It can be clearly seen that during the fitting process the point labeled 50 has had a great influence on the parameter. This is a point that should be investigated further as a potential outlier.
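The idea of tracking per-observation derivatives through the fitting process can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a single-layer logistic model with a cross-entropy penalty and plain gradient descent as a stand-in for the MLP, and the data, the function name `track_derivatives`, the step size and the iteration count are all hypothetical.

```python
import numpy as np

def track_derivatives(X, t, n_iter=200, lr=0.5):
    """Gradient descent on cross-entropy, recording per-observation derivatives.

    Returns (w, trace), where trace[k, n, p] is the derivative of the penalty
    for observation n with respect to parameter p at iteration k.
    """
    N, P = X.shape
    w = np.zeros(P)
    trace = np.empty((n_iter, N, P))
    for k in range(n_iter):
        z = 1.0 / (1.0 + np.exp(-X @ w))   # fitted probabilities
        grads = (z - t)[:, None] * X       # one row of derivatives per observation
        trace[k] = grads                   # store before updating the weights
        w -= lr * grads.mean(axis=0)       # step along the average gradient
    return w, trace

# Hypothetical data: 49 well-separated points plus one mislabeled point far away.
x = np.concatenate([np.linspace(-2.0, -0.5, 25), np.linspace(0.5, 2.0, 24), [5.0]])
t = np.concatenate([np.zeros(25), np.ones(24), [0.0]])
X = np.column_stack([np.ones_like(x), x])  # columns: bias, slope
w, trace = track_derivatives(X, t)

# Largest absolute slope derivative seen for each observation during fitting.
peak = np.abs(trace[:, :, 1]).max(axis=0)
print(int(peak.argmax()))
```

Plotting `trace[:, n, p]` against the iteration index for every `n` gives the N curves per parameter described above. In this toy setting the 50th observation (index 49) has the largest peak, since its input is far from the rest of the data.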
