Distribution Modeling
In a generic biometric system, $S$ biometric features are measured to create a biometric feature vector $x$ ($S \times 1$) for each person. For person $p$, we have $N_p$ feature samples, while we have $N_q$ samples for the population. For convenience of notation, we sort $p$'s measurements to be the first grouping of the population. Defining $x$ as an instance of random variable $X$, we calculate the population feature mean $\mu_q$ as

\[ \mu_q = E[X] = \frac{1}{N_q} \sum_{i=1}^{N_q} x_i, \]

where the feature mean of person $p$, $\mu_p$, is defined analogously, replacing $q$ by $p$. The population feature covariance $\Sigma_q$ is

\[ \Sigma_q = E\left[(X - \mu_q)(X - \mu_q)^t\right] = \frac{1}{N_q - 1} \sum_{i=1}^{N_q} (x_i - \mu_q)(x_i - \mu_q)^t. \]
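As a minimal sketch of the estimators above, the following computes the sample mean and the $1/(N-1)$-normalised covariance with NumPy. The function name `feature_stats`, the choice to store one sample per row, and the toy data are assumptions made for illustration; they do not come from the text.

```python
import numpy as np


def feature_stats(samples):
    """Return (mean, covariance) for an (N, S) array of feature vectors.

    Each row is one measurement x_i. The covariance uses the 1/(N - 1)
    normalisation, matching the estimator given in the text.
    """
    x = np.asarray(samples, dtype=float)
    n = x.shape[0]
    mu = x.mean(axis=0)                       # (1/N) * sum_i x_i
    centered = x - mu                         # rows are (x_i - mu)^t
    sigma = centered.T @ centered / (n - 1)   # sum of outer products / (N - 1)
    return mu, sigma


# Hypothetical toy data: N_q = 6 population samples with S = 2 features.
# Following the convention in the text, the first N_p = 3 rows belong to person p.
population = np.array([[1.0, 2.0], [1.2, 1.9], [0.8, 2.1],
                       [3.0, 0.5], [2.5, 1.0], [3.5, 0.0]])
mu_q, sigma_q = feature_stats(population)
mu_p, sigma_p = feature_stats(population[:3])
```

With samples stored as rows, `centered.T @ centered` accumulates the $S \times S$ outer products in one matrix product, so the result agrees with `np.cov(population, rowvar=False)`.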
The individual's feature covariance, $\Sigma_p$, is again defined analogously.

One important general difficulty with direct information-theoretic measures is that of data availability. Distributions are difficult to estimate accurately, especially at the tails; and yet $\log_2(p(x)/q(x))$ will give large absolute values for small $p(x)$ or $q(x)$. Instead, it is typical to fit data to a model with a small number of parameters. The Gaussian distribution is the most common model; it is often a good reflection of the real-world distributions and is analytically convenient in entropy integrals. Another important property of the Gaussian is that it gives the maximum entropy for a given standard deviation, allowing such models to be used to give an upper bound to entropy values. Based on the Gaussian model, which seems to be the simplest appropriate model for $p$ and $q$, we write

\begin{align}
p(x) &= \frac{1}{\sqrt{|2\pi\Sigma_p|}} \exp\left(-\frac{1}{2}(x - \mu_p)^t \Sigma_p^{-1} (x - \mu_p)\right), \tag{23.4} \\
q(x) &= \frac{1}{\sqrt{|2\pi\Sigma_q|}} \exp\left(-\frac{1}{2}(x - \mu_q)^t \Sigma_q^{-1} (x - \mu_q)\right), \tag{23.5}
\end{align}

from which we can calculate $D(p \| q)$:

\begin{align}
D(p \| q) &= \int p(x) \left(\log_2 p(x) - \log_2 q(x)\right) dx \tag{23.6} \\
&= k \left( \ln|2\pi\Sigma_q| - \ln|2\pi\Sigma_p| - S + E_p\left[(x - \mu_q)^t \Sigma_q^{-1} (x - \mu_q)\right] \right). \tag{23.7}
\end{align}
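Since $E_p\left[(x - \mu_q)(x - \mu_q)^t\right] = \Sigma_p + T$, the expectation term equals $\mathrm{trace}\left((\Sigma_p + T)\Sigma_q^{-1}\right)$, which gives

\[ D(p \| q) = k \left( \ln\frac{|2\pi\Sigma_q|}{|2\pi\Sigma_p|} + \mathrm{trace}\left((\Sigma_p + T)\Sigma_q^{-1} - I\right) \right), \]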
where $T = (\mu_p - \mu_q)(\mu_p - \mu_q)^t$ and $k = \log_2 \sqrt{e}$. This expression calculates the relative entropy in bits for Gaussian distributions $p(x)$ and $q(x)$. It satisfies most of the requirements for a biometric feature information measure introduced in the previous section:

1. If a person's feature distribution matches the population, $p = q$; this yields $D(p \| q) = 0$, as required.
2. As feature measurements improve, the covariance values, $\Sigma_p$, will decrease, resulting in a reduction in $|\Sigma_p|$ and an increase in $D(p \| q)$.
3. If a person has feature values far from the population mean, $T$ will be larger, resulting in a larger value of $D(p \| q)$.
4. Combinations of uncorrelated feature vectors yield the sum of the individual $D(p \| q)$ measures. Thus, for uncorrelated features $s_1$ and $s_2$, where $\{s_1, s_2\}$ represents concatenation of the feature vectors, $D(p(s_1) \| q(s_1)) + D(p(s_2) \| q(s_2)) = D(p(\{s_1, s_2\}) \| q(\{s_1, s_2\}))$.
5. Addition of features uncorrelated to identity will not change $D(p \| q)$. Such a feature will have an identical distribution in $p$ and $q$. If $U$ is the set of such uncorrelated features, $[\Sigma_p]_{ij} = [\Sigma_q]_{ij} = 0$ for $i$ or $j \in U$ and $i \neq j$, while $[\Sigma_p]_{ii} = [\Sigma_q]_{ii}$ and $[\mu_q]_i = [\mu_p]_i$ for $i \in U$. Under these conditions, $D(p \| q)$ will be identical to its value when excluding the features in $U$. One way to understand this criterion is that if the distributions for $q$ and $p$ differ for features in $U$, then those features can be used as a biometric to help identify a person.
6. Correlated features are less informative than uncorrelated ones. Such features will worsen the conditioning (and reduce the determinant) of both $\Sigma_p$ and $\Sigma_q$. This will decrease the accuracy of the measure $D(p \| q)$. In the extreme case of perfectly correlated features, $\Sigma_p$ becomes singular with a zero determinant and $D(p \| q)$ is undefined. Thus, our measure is inadequate in this case. In the next section, we develop an algorithm to deal with this effect.
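The closed-form expression and some of the listed properties can be checked numerically. The sketch below is one possible NumPy implementation, not the authors' code: the function name `gaussian_relative_entropy_bits` and the example parameters are hypothetical, and it takes the Gaussian KL constant as $k = \tfrac{1}{2}\log_2 e$. It evaluates the determinant ratio with log-determinants, where the $(2\pi)^S$ factors cancel, and rejects singular covariances, the case in which the measure is undefined.

```python
import numpy as np


def gaussian_relative_entropy_bits(mu_p, sigma_p, mu_q, sigma_q):
    """D(p||q) in bits for Gaussians N(mu_p, sigma_p) and N(mu_q, sigma_q)."""
    mu_p = np.asarray(mu_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    sigma_p = np.asarray(sigma_p, dtype=float)
    sigma_q = np.asarray(sigma_q, dtype=float)
    s = mu_p.size                              # number of features S
    k = 0.5 * np.log2(np.e)                    # k = log2(sqrt(e))
    diff = mu_p - mu_q
    t = np.outer(diff, diff)                   # T = (mu_p - mu_q)(mu_p - mu_q)^t
    # ln(|Sigma_q| / |Sigma_p|); the (2*pi)^S factors cancel in the ratio.
    sign_p, logdet_p = np.linalg.slogdet(sigma_p)
    sign_q, logdet_q = np.linalg.slogdet(sigma_q)
    if sign_p <= 0 or sign_q <= 0:
        raise ValueError("covariance matrices must be positive definite")
    # trace((Sigma_p + T) Sigma_q^{-1} - I), using trace(AB) = trace(BA).
    trace_term = np.trace(np.linalg.solve(sigma_q, sigma_p + t)) - s
    return k * (logdet_q - logdet_p + trace_term)


# Hypothetical two-feature example with uncorrelated features.
mu_q, sigma_q = np.zeros(2), np.eye(2)
mu_p, sigma_p = np.array([1.0, 0.0]), 0.5 * np.eye(2)

d_same = gaussian_relative_entropy_bits(mu_q, sigma_q, mu_q, sigma_q)      # property 1
d_person = gaussian_relative_entropy_bits(mu_p, sigma_p, mu_q, sigma_q)
d_far = gaussian_relative_entropy_bits(3 * mu_p, sigma_p, mu_q, sigma_q)   # property 3

# Property 4: each uncorrelated feature evaluated separately, then summed.
d_1 = gaussian_relative_entropy_bits([1.0], [[0.5]], [0.0], [[1.0]])
d_2 = gaussian_relative_entropy_bits([0.0], [[0.5]], [0.0], [[1.0]])
d_sum = d_1 + d_2
```

In this example `d_same` is zero, `d_far` exceeds `d_person`, and `d_sum` matches `d_person`, in line with properties 1, 3 and 4. Passing a perfectly correlated covariance such as `[[1, 1], [1, 1]]` raises `ValueError`, which mirrors the singular case described in property 6.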