where $\phi$ is a bivariate Gaussian density with parameters $\mu$ and $\Sigma$. Unfortunately the integral involves the form $\int \exp(z^3) / (1 + \exp(1 - s^2))\, dz$, which has no closed-form solution. To avoid a numerical integration we report the error rate on an independent sample of 1000 observations from the same distribution. This gave an estimated error rate of 25.7 for the MLP model and 39.4 for Cox and Pearce's model (tolerance = 0.001). This is quite an intriguing result: compared to two robust modifications of logistic regression, the MLP of size 2.1.1 does extremely well!
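The idea of replacing an intractable integral by an error count on an independent sample can be sketched as follows. This is only an illustration under stated assumptions: the data generator `sample_point`, its parameters `mean_shift` and `rho`, and the rule `classify` are hypothetical stand-ins, not the distribution or the fitted models discussed above. The toy rule is chosen so simple that its exact error rate is available in closed form, which lets the sample estimate be checked.

```python
import math
import random


def sample_point(rng, mean_shift=1.0, rho=0.7):
    """Draw (x1, x2, y) from a hypothetical two-class bivariate Gaussian.

    Assumption (not from the text): both classes have unit variances and
    correlation rho; class 1 is centred at (+mean_shift, 0) and class 0
    at (-mean_shift, 0), with equal prior probabilities.
    """
    y = 1 if rng.random() < 0.5 else 0
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    centre = mean_shift if y == 1 else -mean_shift
    x1 = centre + z1
    x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    return x1, x2, y


def classify(x1, x2):
    """Hypothetical decision rule: predict class 1 to the right of x1 = 0."""
    return 1 if x1 > 0 else 0


def holdout_error_rate(rule, n=1000, seed=0, **sampler_args):
    """Estimate the error rate (in percent) on n independent draws."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x1, x2, y = sample_point(rng, **sampler_args)
        errors += rule(x1, x2) != y
    return 100.0 * errors / n


def exact_error_rate(mean_shift=1.0):
    """Exact error (in percent) of the toy rule: Phi(-mean_shift).

    Available only because x1 is marginally N(+/-mean_shift, 1) here.
    """
    return 100.0 * 0.5 * math.erfc(mean_shift / math.sqrt(2.0))
```

With 1000 draws the standard error of such an estimate is roughly one percentage point for error rates in the range considered here, so the sample estimate is adequate for comparing models whose error rates differ by several points.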
$^4$Note that while these two models are the same, it is possible to get somewhat different fits because the fitting algorithm and initial parameter values differ between the standard implementations.
Figure: The two classes are represented by 0 and 1, respectively, and the group of 10 outliers at the point (-4, -4) is shown at the bottom left of the figure. The data were generated using $\sigma_{1,2} = 0.7$. The positions of the outliers have been randomly perturbed for greater visibility. The decision boundaries for several of the models are shown. The number of errors for each model (not counting the ten additional points) is given in the legend.
For the simulation study the trials were repeated 10 times and the average error rates are reported in Tables 10.2 to 10.6. The optimal misclassification rates, using the true decision boundary of $x_1 = 0$, are 35.18, 29.41, 26.98, 24.81, and 22.86 for the five values of $\sigma_{1,2}$ considered. These rates are for the main body of the data, not including the aberrant points. These problems range from the quite difficult, on which both the robust and standard models fail, to the somewhat easier, on which both methods do reasonably well. For example, the data generated with $\sigma_{1,2} = -0.7$ and outlying points at (-4, -4) have an error rate of 41% on the full data set with the true decision boundary, while classifying all the observations to class 1 only increases the error rate to …. A salient feature of Tables 10.3 to 10.6 is, moving from the bottom right of the table, the point at which the performance of the algorithm starts to exceed that of the logistic model (Table 10.2). We note several points arising from this study:
while Cox and Pearce's model has ICs that redescend (in the same manner as the P.1.1 MLP model), it does not appear to be as robust. We have noted the dependence of the result on the tolerance level, and in addition we note that two implementations of this model:
implementation (Francis et al., 1993);
an implementation based on a modified logistic model with a tolerance of 0.001;
both give essentially the same answers.$^5$
These results appear to be less robust than the results given by Cox and Pearce (1997) on the same problem. Currently we are at a loss to explain this anomaly.
the MLP model with a hidden layer performs very well compared to the other models and could be used when a robust logistic regression is required. However, in the harder problems ($\sigma_{1,2} < 0$) it fails to achieve an optimal error rate;
the poor performance of the Huber estimator is due to the large number (10) of atypical points. With, for example, two or three atypical points at (-4, -4) the Huber estimator gives a more satisfactory performance;
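The evaluation protocol of the simulation study above (ten repeated trials, ten aberrant points at (-4, -4), error rates reported for the main body of the data and for the full data set) can be sketched as follows. This is a minimal sketch under loud assumptions: the generator `make_main_body`, the main sample size `n_main=100`, the value of `mean_shift`, and the choice to label the aberrant points as class 1 are all hypothetical, and no model is fitted; a fixed rule at the boundary $x_1 = 0$ stands in for a fitted classifier.

```python
import random

ABERRANT_POINT = (-4.0, -4.0)  # position taken from the text
N_ABERRANT = 10                # number of aberrant points, from the text


def make_main_body(rng, n_main=100, mean_shift=0.5):
    """Hypothetical main body: class 1 centred at x1 = +mean_shift,
    class 0 at x1 = -mean_shift (n_main and mean_shift are assumptions)."""
    data = []
    for _ in range(n_main):
        y = rng.randrange(2)
        centre = mean_shift if y == 1 else -mean_shift
        data.append((rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0), y))
    return data


def error_rate(rule, data):
    """Percentage of observations in data that rule misclassifies."""
    wrong = sum(rule(x1, x2) != y for x1, x2, y in data)
    return 100.0 * wrong / len(data)


def true_boundary(x1, x2):
    """Fixed rule at the boundary x1 = 0 (stand-in for a fitted model)."""
    return 1 if x1 > 0 else 0


def run_study(rule, n_trials=10, seed=0):
    """Average main-body and full-data error rates over n_trials trials."""
    rng = random.Random(seed)
    main_rates, full_rates = [], []
    for _ in range(n_trials):
        main = make_main_body(rng)
        # Assumption: aberrant points carry label 1, so the rule x1 > 0
        # places every one of them on the wrong side of the boundary.
        aberrant = [ABERRANT_POINT + (1,)] * N_ABERRANT
        main_rates.append(error_rate(rule, main))
        full_rates.append(error_rate(rule, main + aberrant))
    return sum(main_rates) / n_trials, sum(full_rates) / n_trials


def full_data_error(main_rate, n_main, n_aberrant=N_ABERRANT):
    """Full-data error (percent) when every aberrant point is misclassified."""
    return (main_rate * n_main + 100.0 * n_aberrant) / (n_main + n_aberrant)
```

As a rough consistency check, `full_data_error(35.18, 100)` evaluates to about 41.07, which is in line with the 41% quoted for the full data set when the main-body rate is 35.18. The main sample size of 100 is an assumption made for this illustration, not a figure stated in this section.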
Table 10.2
The error rates for the logistic model (MLP of size P.1)
aberrant values at            -4,-4    -4,-3.5    -3.5,-4      -3,-3    -3,-1.5     -3,1.5  -1.5,-1.5    1.5,1.5
$\sigma_{1,2} = -0.7$         55.11      54.98      54.94      54.75      53.11      42.36      51.09      36.14      36.37
$\sigma_{1,2} = -0.3$         55.54      55.00      54.61      52.35      45.18      33.24      39.52      30.10      30.16
$\sigma_{1,2} = 0$            51.31      50.24      51.23      46.84      38.26      30.44      34.96      27.46      27.60
$\sigma_{1,2} = 0.3$          48.39      46.84      46.73      41.45      34.71      27.77      31.03      25.33      25.38
$\sigma_{1,2} = 0.7$          44.04      40.29      43.33      37.54      30.78      25.07      28.69      23.92      23.99
