minimizing the maximum of the misclassification probabilities; minimizing the sum of the misclassification probabilities with given prior probabilities. This amounts to evaluating the ROC curve at specific points; see also Anderson (1984, §6.10.2). Cooke and Peake (2002) extend the procedure of Anderson and Bahadur to the case where the two classes have unspecified distributions but known means and covariances, and they also give a simulation method for evaluating the whole ROC curve. Su and Liu (1993) show that the linear discriminant $(\mu_1 - \mu_2)^T(\Sigma_1 + \Sigma_2)^{-1}x$ is optimal under the criterion of maximum area under the ROC curve.
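Under the binormal model the AUC of any linear score $a^Tx$ has a closed form, $\Phi\{a^T(\mu_1-\mu_2)/\sqrt{a^T(\Sigma_1+\Sigma_2)a}\}$, so the Su and Liu result is easy to check numerically. The R sketch below assumes normal classes; the parameter values and the helper names auc_linear and su_liu_direction are hypothetical, chosen only for illustration.

```r
# AUC of the score a'x when class 1 ~ N(mu1, S1) and class 2 ~ N(mu2, S2):
# a'X1 - a'X2 ~ N(a'(mu1 - mu2), a'(S1 + S2)a), so AUC = P(a'X1 > a'X2).
auc_linear <- function(a, mu1, mu2, S1, S2) {
  delta <- mu1 - mu2
  pnorm(sum(a * delta) / sqrt(drop(t(a) %*% (S1 + S2) %*% a)))
}

# Su and Liu (1993): the AUC-maximizing direction is (S1 + S2)^{-1} (mu1 - mu2)
su_liu_direction <- function(mu1, mu2, S1, S2) {
  solve(S1 + S2, mu1 - mu2)
}

# Hypothetical example parameters (not taken from the text)
mu1 <- c(1, 0.5); mu2 <- c(0, 0)
S1 <- matrix(c(1, 0.3, 0.3, 1), 2)
S2 <- matrix(c(2, -0.4, -0.4, 0.5), 2)

a_opt <- su_liu_direction(mu1, mu2, S1, S2)
auc_opt <- auc_linear(a_opt, mu1, mu2, S1, S2)

# At the optimum the AUC reduces to Phi(sqrt(delta' (S1 + S2)^{-1} delta))
auc_closed <- pnorm(sqrt(sum((mu1 - mu2) * a_opt)))

# Random directions for comparison; none should beat the optimum
set.seed(1)
auc_random <- replicate(1000, auc_linear(rnorm(2), mu1, mu2, S1, S2))
max(auc_random) <= auc_opt
```

The optimality follows from the Cauchy-Schwarz inequality in the inner product defined by $(\Sigma_1+\Sigma_2)^{-1}$, which is why no random direction exceeds auc_opt.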
Consider the 2 x 2 matrix
Write down the determinant of this matrix. Draw the parallelogram spanned by its row vectors, and calculate its area. For further information about determinants see Apostol (1967) or other books on linear algebra. The absolute value of the determinant has a geometric interpretation as the n-dimensional volume of the parallelepiped spanned by the rows of the matrix. Surprisingly, however, many texts gloss over this interpretation and concentrate on the calculation of determinants and the rules for manipulating them. If the linear map $f: \mathbb{R}^n \to \mathbb{R}^n$ is represented by the $n \times n$ matrix $A$, and $M$ is a measurable subset of $\mathbb{R}^n$, then the n-dimensional volume of $f(M)$ is given by $|\det A| \times \operatorname{volume}(M)$.
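A quick numerical check of the volume interpretation, as a sketch in R. Because the matrix of the exercise is not reproduced here, the code uses a hypothetical $2 \times 2$ matrix. It computes the parallelogram area with the shoelace formula and estimates by Monte Carlo the area of the image of the unit disk under $x \mapsto xA$ (row vectors, matching the row-vector parallelogram), which should be close to $\pi |\det A|$.

```r
# Hypothetical 2 x 2 matrix (the matrix of the exercise is not reproduced here)
A <- matrix(c(3, 1,
              1, 2), nrow = 2, byrow = TRUE)

# Area of the parallelogram spanned by the rows r1, r2: shoelace formula on
# the vertices 0, r1, r1 + r2, r2 taken in order around the boundary
parallelogram_area <- function(A) {
  v <- rbind(c(0, 0), A[1, ], A[1, ] + A[2, ], A[2, ])
  x <- v[, 1]; y <- v[, 2]
  xn <- c(x[-1], x[1]); yn <- c(y[-1], y[1])
  abs(sum(x * yn - xn * y)) / 2
}

det_A <- det(A)                 # mathematically 3*2 - 1*1 = 5
area <- parallelogram_area(A)

# Monte Carlo check of volume(f(M)) = |det A| * volume(M) with M the unit disk
set.seed(42)
n <- 200000
half <- sqrt(colSums(A^2))      # half-widths of a box containing the image of the disk
Y <- cbind(runif(n, -half[1], half[1]), runif(n, -half[2], half[2]))
X <- Y %*% solve(A)             # pre-images under x -> x %*% A
inside <- rowSums(X^2) <= 1     # is the pre-image in the unit disk?
area_image <- mean(inside) * prod(2 * half)
c(area = area, det = det_A, image = area_image, target = pi * abs(det_A))
```

The bounding box follows from the Cauchy-Schwarz inequality: each coordinate of $xA$ is bounded by the norm of the corresponding column of $A$ when $|x| \le 1$.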
By this interpretation, (3.2) (p. 20) is a ratio of two volumes, and we are seeking $A$ such that this ratio is maximized.
Using the equations ending with (3.5), calculate the corresponding quantities for the iris data (p. 18) and carry out an LDA.
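One way to carry out the LDA in base R, sketched under assumptions: we take (3.2) to be the determinant ratio $|A^T B A| / |A^T W A|$, with $W$ and $B$ the within- and between-group sums of squares and cross-products, which may differ in scaling from the book's equations. The eigenvectors of $W^{-1}B$ are obtained through an equivalent symmetric problem built from the Cholesky factor of $W$; the names det_ratio and random_ratios are illustrative.

```r
# Iris measurements and species labels (built into R)
X <- as.matrix(iris[, 1:4])
g <- iris$Species
p <- ncol(X); k <- nlevels(g)

# Within-group (W) and between-group (B) sums of squares and cross-products
xbar <- colMeans(X)
W <- matrix(0, p, p)
B <- matrix(0, p, p)
for (lev in levels(g)) {
  Xk <- X[g == lev, , drop = FALSE]
  mk <- colMeans(Xk)
  W <- W + crossprod(sweep(Xk, 2, mk))          # centred within-class scatter
  B <- B + nrow(Xk) * tcrossprod(mk - xbar)     # weighted outer product of mean offsets
}

# Solve B a = lambda W a via the symmetric matrix R^{-T} B R^{-1}, where W = R'R
Rinv <- backsolve(chol(W), diag(p))
e <- eigen(t(Rinv) %*% B %*% Rinv, symmetric = TRUE)
A <- Rinv %*% e$vectors[, 1:(k - 1)]            # p x (k - 1) discriminant directions

# Assumed form of criterion (3.2): ratio of determinants, i.e. of "volumes"
det_ratio <- function(A) det(t(A) %*% B %*% A) / det(t(A) %*% W %*% A)

det_ratio(A)                                    # product of the non-zero eigenvalues
set.seed(1)
random_ratios <- replicate(500, det_ratio(matrix(rnorm(p * (k - 1)), p)))
```

With this normalization $A^T W A = I$, so the maximized ratio is the product of the $k-1$ non-zero eigenvalues, and no random choice of $A$ should exceed it.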
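Assume that $x \sim N(\mu_1, \Sigma_1)$ for observations from class 1 and $x \sim N(\mu_2, \Sigma_2)$ for observations from class 2, with densities $f_1$ and $f_2$ and prior probabilities $\pi_1$ and $\pi_2$. If we allocate an observation to the group with maximum posterior probability, we have the discrimination rule: allocate $x$ to class 1 if $\pi_1 f_1(x) > \pi_2 f_2(x)$, and to class 2 otherwise.
Substitute the two densities into this formula and derive the quadratic discrimination function. Make the simplifying assumption that $\Sigma_1 = \Sigma_2 = \Sigma$ and simplify the rule to: allocate $x$ to class 1 when
$$(\mu_1 - \mu_2)^T \Sigma^{-1}\left[x - \tfrac{1}{2}(\mu_1 + \mu_2)\right] > \log\frac{\pi_2}{\pi_1},$$
and to class 2 otherwise.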
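The algebra can be checked numerically. The R sketch below, with hypothetical helper names and parameter values, applies both the simplified linear rule and the direct comparison of $\log \pi_k + \log f_k(x)$ under a common covariance matrix; the two allocations should agree at every point.

```r
# Log density of N(mu, Sigma) at a single point x, using base R only
log_dmvnorm <- function(x, mu, Sigma) {
  p <- length(mu)
  -0.5 * (p * log(2 * pi) + as.numeric(determinant(Sigma)$modulus) +
            mahalanobis(x, mu, Sigma))
}

# Linear rule: class 1 if (mu1 - mu2)' Sigma^{-1} [x - (mu1 + mu2)/2] > log(pi2 / pi1)
lda_allocate <- function(x, mu1, mu2, Sigma, pi1, pi2) {
  score <- drop(t(mu1 - mu2) %*% solve(Sigma, x - (mu1 + mu2) / 2))
  if (score > log(pi2 / pi1)) 1L else 2L
}

# Maximum posterior probability rule: compare log(pi_k) + log f_k(x)
posterior_allocate <- function(x, mu1, mu2, Sigma, pi1, pi2) {
  s1 <- log(pi1) + log_dmvnorm(x, mu1, Sigma)
  s2 <- log(pi2) + log_dmvnorm(x, mu2, Sigma)
  if (s1 > s2) 1L else 2L
}

# Hypothetical parameters and test points
set.seed(7)
mu1 <- c(1, 2); mu2 <- c(-1, 0)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2)
pts <- matrix(rnorm(2000, sd = 2), ncol = 2)
rule_linear <- apply(pts, 1, lda_allocate,
                     mu1 = mu1, mu2 = mu2, Sigma = Sigma, pi1 = 0.3, pi2 = 0.7)
rule_post <- apply(pts, 1, posterior_allocate,
                   mu1 = mu1, mu2 = mu2, Sigma = Sigma, pi1 = 0.3, pi2 = 0.7)
mean(rule_linear == rule_post)   # proportion of agreeing allocations
```

The log-determinant and the quadratic term $x^T\Sigma^{-1}x$ are common to both classes and cancel, which is why the posterior comparison collapses to a linear rule.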
We consider the Leptograpsus example, discussed in Section 3.2, p. 22. The data set is available in the MASS library (Venables and Ripley, 2002).
These vectors, suitably scaled, are the canonical variates. We then form the canonical scores and plot them (see Figure 3.1, p. 24).
We can write a simple function to implement the direct method of solving for the linear discriminants. It differs from what we have done here only in that it allows for a tolerance to determine whether the within-group covariance matrix is singular: if any of its eigenvalues are less than the tolerance, a generalized inverse is used. In any production code it is a good idea to include such tests and, where possible, work-arounds.
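A sketch of such a function, not the book's own code: lda_direct and its default tolerance of 1e-8 are hypothetical choices. It works with the pooled within-group covariance $W$ and the between-group covariance $B$. When an eigenvalue of $W$ falls below the tolerance, the corresponding directions are dropped, which is the same as using a Moore-Penrose generalized inverse square root of $W$.

```r
# Direct LDA with a singularity check on the pooled within-group covariance W.
# Eigenvalues of W below `tol` are dropped, giving a generalized W^{-1/2}.
lda_direct <- function(X, g, tol = 1e-8) {
  X <- as.matrix(X)
  g <- factor(g)
  n <- nrow(X); k <- nlevels(g)
  means <- apply(X, 2, function(col) tapply(col, g, mean))    # k x p class means
  W <- crossprod(X - means[as.integer(g), , drop = FALSE]) / (n - k)
  Mc <- sweep(means, 2, colMeans(X))                          # centred class means
  B <- crossprod(Mc * sqrt(as.vector(table(g)))) / (k - 1)    # between-group covariance

  # Singularity test on the eigenvalues of W
  eW <- eigen(W, symmetric = TRUE)
  keep <- eW$values >= tol
  singular <- !all(keep)
  V <- eW$vectors[, keep, drop = FALSE]
  W_isqrt <- V %*% (t(V) / sqrt(eW$values[keep]))   # W^{-1/2} or its generalized version

  # Symmetric eigenproblem in the whitened space
  eM <- eigen(W_isqrt %*% B %*% W_isqrt, symmetric = TRUE)
  r <- min(k - 1, sum(keep))
  list(scaling = W_isqrt %*% eM$vectors[, 1:r, drop = FALSE],
       values = eM$values[1:r],
       singular = singular)
}

# Usage: the iris data, then the same data with a redundant fifth column
r1 <- lda_direct(iris[, 1:4], iris$Species)
X5 <- cbind(iris[, 1:4], extra = iris[, 1] + iris[, 2])
r2 <- lda_direct(X5, iris$Species)
```

The redundant column makes $W$ singular; the function flags this and should still return the same discriminant eigenvalues as for the original four variables, since the extra column carries no new information.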
Another option is to use the code from the "Modern Applied Statistics with S" (MASS) library (Venables and Ripley, 2002). The function lda from this library has a large number of interesting options, including some for a robust LDA. We compare the output from our function with that of lda and find that they are the same except for sign: the LDA is unique only up to the sign of each discriminant vector.
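The sign ambiguity is easy to see in code. In the R sketch below (helper names hypothetical), flipping the sign of a discriminant vector leaves its between/within variance ratio unchanged, and a simple sign convention makes two solutions directly comparable. The comparison with MASS::lda runs only if the MASS package is installed, and it assumes that lda normalizes its scaling so that the within-group covariance of the scores is the identity, as this sketch does.

```r
X <- as.matrix(iris[, 1:4]); g <- iris$Species
n <- nrow(X); k <- nlevels(g)
means <- apply(X, 2, function(col) tapply(col, g, mean))   # k x p class means
W <- crossprod(X - means[as.integer(g), ]) / (n - k)       # pooled within-group covariance
Mc <- sweep(means, 2, colMeans(X))
B <- crossprod(Mc * sqrt(as.vector(table(g)))) / (k - 1)  # between-group covariance

# Discriminant vectors normalized so that A' W A = I
Rinv <- backsolve(chol(W), diag(ncol(X)))
A <- Rinv %*% eigen(t(Rinv) %*% B %*% Rinv, symmetric = TRUE)$vectors[, 1:(k - 1)]

# Flipping the sign of a column gives an equally valid solution
A_flipped <- A %*% diag(c(-1, 1))
ratio <- function(a) drop(t(a) %*% B %*% a) / drop(t(a) %*% W %*% a)

# Sign convention for comparison: largest |entry| of each column made positive
align_signs <- function(A) {
  idx <- cbind(apply(abs(A), 2, which.max), seq_len(ncol(A)))
  sweep(A, 2, sign(A[idx]), "*")
}

# Optional comparison with MASS::lda (skipped if MASS is not installed)
mass_diff <- NA
if (requireNamespace("MASS", quietly = TRUE)) {
  fit <- MASS::lda(Species ~ ., data = iris)
  mass_diff <- max(abs(align_signs(unname(fit$scaling)) - align_signs(A)))
}
```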
We consider a simulated problem designed to show off the features of quadratic discriminant analysis. The data come from two classes lying within two spheres, one inside the other.
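A sketch of the simulation in R, with hypothetical settings: three dimensions, 200 points per class, an inner class uniform in the ball of radius 1 and an outer class uniform in the shell between radii 1 and 2. Both class means are near the origin, so a linear rule can do little better than chance, whereas QDA exploits the different covariance matrices. The QDA and LDA below are hand-written Gaussian plug-in rules, not MASS::qda or MASS::lda.

```r
set.seed(123)

# Uniform sample of n points in the shell r_in <= |x| <= r_out in d dimensions
runif_shell <- function(n, d, r_in, r_out) {
  Z <- matrix(rnorm(n * d), n, d)
  U <- Z / sqrt(rowSums(Z^2))                             # uniform directions
  r <- (r_in^d + runif(n) * (r_out^d - r_in^d))^(1 / d)   # radii for uniform volume
  U * r
}

d <- 3; n <- 200
X <- rbind(runif_shell(n, d, 0, 1),    # class "inner": ball of radius 1
           runif_shell(n, d, 1, 2))    # class "outer": shell between radii 1 and 2
g <- factor(rep(c("inner", "outer"), each = n))

# Per-class Gaussian fits: mean, covariance, prior, size
fit <- lapply(split(as.data.frame(X), g), function(D) {
  D <- as.matrix(D)
  list(mean = colMeans(D), cov = cov(D), prior = nrow(D) / nrow(X), n = nrow(D))
})

# QDA: each class keeps its own covariance, so log det(S_k) does not cancel
qda_predict <- function(fit, X) {
  scores <- sapply(fit, function(f) {
    -0.5 * as.numeric(determinant(f$cov)$modulus) -
      0.5 * mahalanobis(X, f$mean, f$cov) + log(f$prior)
  })
  factor(names(fit)[max.col(scores)], levels = names(fit))
}

# LDA for comparison: one pooled covariance shared by both classes
lda_predict <- function(fit, X) {
  S <- Reduce(`+`, lapply(fit, function(f) (f$n - 1) * f$cov)) /
    (sum(sapply(fit, function(f) f$n)) - length(fit))
  scores <- sapply(fit, function(f) -0.5 * mahalanobis(X, f$mean, S) + log(f$prior))
  factor(names(fit)[max.col(scores)], levels = names(fit))
}

qda_err <- mean(qda_predict(fit, X) != g)   # resubstitution error rates
lda_err <- mean(lda_predict(fit, X) != g)
c(qda = qda_err, lda = lda_err)
```

These are training-set error rates. The fitted QDA boundary is close to a sphere of radius slightly above 1, so its errors should come mainly from outer-class points lying just outside the inner ball.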
3.9 In the space of the canonical variates, we can determine the Mahalanobis distance of each point from each class mean. Under normality, the squared distance of a point from its own class mean has (approximately, when the parameters are estimated) a $\chi^2_{(r)}$ distribution, where $r$ is the number of canonical variates. The tail area of this distribution beyond the observed squared distance is referred to as the typicality probability (McLachlan, 1992). It may be the case that observations that are put into a class with a high posterior probability are found to have a low typicality. Write a function to calculate the typicality and investigate the iris data set to see whether all of the observations are typical of their class.
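A possible typicality function, sketched in R. It computes canonical variates scaled so that the within-group covariance of the scores is the identity, so that Mahalanobis distance in that space is ordinary Euclidean distance, and then refers each squared distance to $\chi^2_{(r)}$. The function name, the returned list, and the 5% cut-off used to flag atypical points are our assumptions; because the class means and covariance are estimated, the chi-squared tail areas are approximate.

```r
# Typicality probabilities in the space of the canonical variates
typicality <- function(X, g) {
  X <- as.matrix(X); g <- factor(g)
  n <- nrow(X); k <- nlevels(g); p <- ncol(X)
  means <- apply(X, 2, function(col) tapply(col, g, mean))    # k x p class means
  W <- crossprod(X - means[as.integer(g), , drop = FALSE]) / (n - k)
  Mc <- sweep(means, 2, colMeans(X))
  B <- crossprod(Mc * sqrt(as.vector(table(g)))) / (k - 1)

  # Canonical variates scaled so the within-group covariance of the scores is I
  r <- min(k - 1, p)
  Rinv <- backsolve(chol(W), diag(p))
  A <- Rinv %*% eigen(t(Rinv) %*% B %*% Rinv, symmetric = TRUE)$vectors[, 1:r, drop = FALSE]
  Z <- X %*% A                  # canonical scores
  Zbar <- means %*% A           # class means in canonical space

  # Squared Mahalanobis (here Euclidean) distance of each point from each class mean
  D2 <- sapply(seq_len(k), function(j) rowSums(sweep(Z, 2, Zbar[j, ])^2))
  typ <- matrix(pchisq(D2, df = r, lower.tail = FALSE), n, k,
                dimnames = list(NULL, levels(g)))
  own <- typ[cbind(seq_len(n), as.integer(g))]   # typicality for the labelled class
  within_cov <- crossprod(Z - Zbar[as.integer(g), , drop = FALSE]) / (n - k)
  list(typicality = typ, own = own, r = r, within_cov = within_cov)
}

res <- typicality(iris[, 1:4], iris$Species)
# Observations that look atypical of their own species at the (arbitrary) 5% level
which(res$own < 0.05)
```

Cross-tabulating these indices against the posterior probabilities from an LDA fit would show whether any confidently classified flowers are nonetheless atypical of their species.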