A Biologically Inspired Model
[Figure 8.3: block diagram. Box labels: Early processing of facial features; Invariant aspects of faces; Deformation of the face (f); Dynamic cues; Facial expression analysis; Recognition of identity.]
Figure 8.3. Depiction of the different processes of the model presented in this chapter. The modules dedicated to the recognition of identity and expression are dissociated, although both obtain information from the common process DF (deformation of the face) and from the processes that compute static cues. This is key to explaining the psychophysical data described earlier.
Recognition of Identity
Consider two different images, $I_1$ and $I_2$, both of $n$ pixels. We can redefine the images as vectors taking values in an $n$-dimensional space. We shall denote these as $V_1$ and $V_2$, with $V_i \in \mathbb{R}^n$. The advantage of doing this is that it allows comparisons of the images by means of vector operations such as subtraction,
$$\|V_1 - V_2\|, \tag{8.1}$$
where $\|\cdot\|$ denotes the L2 norm (i.e., Euclidean distance). In the definition stated here, we assume that all faces have been aligned (with respect to the main facial features) in such a way that the eyes, mouths, noses, and so on, of each of the images are at roughly the same pixel coordinates (e.g., see references 53 and 54). The approach defined above, in Eq. (8.1), has proven to perform well when frontal face images with similar facial expressions are compared to each other. However, this comparison becomes unstable when matching face images bearing different facial expressions [52], because corresponding pixels can then carry information about different features. The incorporation of the DF process in our model allows us to represent face processing as
$$\|f^{-1}(V_1 - V_2)\|, \tag{8.2}$$
where $f$ is a function proportional to the motion of each pixel, that is, the movement representing the facial expression of the test image. Intuitively, $f$ is a function that keeps correspondences between the pixels of the first and second images. Equation (8.2) can be interpreted as follows: pixels (or local areas) that have been largely deformed by local musculature activity will have a low weight, whereas pixels that are less affected by those changes will gain importance. We can formally define $f^{-1}$
as taking values linearly inverse to those of $f$, that is,
$$\mathrm{MAX}_F - \|F_i\|, \tag{8.3}$$
where $F$ is the motion flow (i.e., motion between two images), $F_i$ is the motion vector at the $i$th pixel, and $\mathrm{MAX}_F = \max_i \|F_i\|$ (the magnitude of the largest motion vector in the image). Thus the value of $f$ corresponds to the outcome of the DF process. Note that $f$ defines the face deformation (motion) between two images and, therefore, can also be used to estimate the facial expression of a new incoming face image. As mentioned earlier, experimental data support this belief.
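As a minimal sketch of Eqs. (8.1)-(8.3), the following NumPy code compares two aligned image vectors with and without the inverse-flow weighting. It assumes that $f^{-1}$ acts as a per-pixel weight on the difference vector, which is one reading of Eq. (8.2). The function names and the toy data are hypothetical and are not taken from the chapter.

```python
import numpy as np

def euclidean_distance(v1, v2):
    """Eq. (8.1): L2 norm of the difference of two image vectors."""
    return float(np.linalg.norm(v1 - v2))

def inverse_flow_weights(flow):
    """Eq. (8.3): MAX_F - ||F_i|| for every pixel i.

    flow has shape (n, 2) and holds the (u, v) motion vector of each pixel.
    """
    magnitudes = np.linalg.norm(flow, axis=1)
    return magnitudes.max() - magnitudes

def weighted_distance(v1, v2, flow):
    """Eq. (8.2), assuming f^{-1} is applied as an element-wise weight."""
    weights = inverse_flow_weights(flow)
    return float(np.linalg.norm(weights * (v1 - v2)))

# Toy data (hypothetical): four pixels, two of which moved between the images.
v1 = np.array([10.0, 20.0, 30.0, 40.0])
v2 = np.array([10.0, 26.0, 30.0, 48.0])
flow = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [0.0, 1.0]])

print(euclidean_distance(v1, v2))       # 10.0
print(inverse_flow_weights(flow))       # weights 5, 0, 5, 4
print(weighted_distance(v1, v2, flow))  # 32.0
```

The pixel with the largest motion receives weight zero, so its intensity difference no longer contributes to the match. The weights are left unnormalized here; any rescaling would be a further design choice.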
Motion Estimation
Visual motion between two images can be expressed mathematically by local deformations that occur in small intervals of time, $\delta t$, as
$$I(x, y, t) = I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t), \tag{8.4}$$
where $I(x, y, t)$ is the image value at point $(x, y)$ at time $t$, $(u, v)$ are the horizontal and the vertical image velocities at $(x, y)$, and $\delta t$ is considered to be small [55]. We note that in our model we have $f = (u, v)$. If we assume that the motion field (i.e., the pixel correspondences between the two images) is small at each pixel location, the motion estimator can be represented by the first-order Taylor series expansion as
$$E_D = \iint \rho(I_x u + I_y v + I_t)\,dx\,dy, \tag{8.5}$$
where $(I_x, I_y)$ and $I_t$ are the spatial and time derivatives of the image, and $\rho$ is an estimator. To resolve the above equation, it is necessary to add an additional constraint. The most common one is the spatial coherence constraint [55], which embodies the assumption that neighboring pixels in an image are likely to belong to the same surface, and therefore a smoothness in the flow is expected. The first-order model of this second constraint is given by
$$E_S = \iint \rho(\nabla(u, v))\,dx\,dy, \tag{8.6}$$
where $\nabla$ represents the gradient. Visual motion is determined by minimizing the regularization problem
$$E = E_D + E_S. \tag{8.7}$$
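To illustrate how an objective of the form of Eq. (8.7) can be minimized iteratively, the sketch below uses the simplest choice of estimator, a quadratic, which yields the classic Horn-Schunck update. This is a swapped-in illustration and not necessarily the estimator used in this chapter. The smoothness weight `alpha`, the finite-difference derivatives, the iteration count, and the synthetic test frames are all assumptions.

```python
import numpy as np

def derivatives(frame1, frame2):
    """Finite-difference estimates of I_x, I_y (spatial) and I_t (temporal)."""
    f1 = np.asarray(frame1, dtype=float)
    f2 = np.asarray(frame2, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(0.5 * (f1 + f2))
    It = f2 - f1
    return Ix, Iy, It

def neighbor_mean(a):
    """Mean of the four nearest neighbors, replicating values at the border."""
    p = np.pad(a, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(frame1, frame2, alpha=1.0, iterations=200):
    """Iteratively minimize E_D + alpha^2 * E_S with a quadratic estimator.

    alpha is a hypothetical smoothness weight; alpha = 1 corresponds to the
    unweighted sum E_D + E_S.
    """
    Ix, Iy, It = derivatives(frame1, frame2)
    u = np.zeros(Ix.shape)
    v = np.zeros(Ix.shape)
    for _ in range(iterations):
        u_bar, v_bar = neighbor_mean(u), neighbor_mean(v)
        # Brightness-constancy residual at the smoothed flow, normalized.
        residual = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * residual
        v = v_bar - Iy * residual
    return u, v

def data_energy(frame1, frame2, u, v):
    """Discrete E_D with a quadratic estimator: sum of squared residuals."""
    Ix, Iy, It = derivatives(frame1, frame2)
    return float(np.sum((Ix * u + Iy * v + It) ** 2))

# Synthetic frames (hypothetical): a Gaussian blob shifted one pixel to the right.
ys, xs = np.mgrid[0:32, 0:32].astype(float)

def blob(center_x):
    return np.exp(-((xs - center_x) ** 2 + (ys - 16.0) ** 2) / (2 * 4.0**2))

frame1, frame2 = blob(15.0), blob(16.0)
u, v = horn_schunck(frame1, frame2, alpha=0.1, iterations=200)
zero = np.zeros(frame1.shape)
print("data energy, zero flow:     ", data_energy(frame1, frame2, zero, zero))
print("data energy, estimated flow:", data_energy(frame1, frame2, u, v))
print("mean horizontal velocity:   ", u.mean())
```

With a quadratic estimator the objective is convex, so this fixed-point scheme does not need the convex approximation discussed next. A robust estimator would change the update rule but not the overall structure of data term plus smoothness term.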
Although the objective function E is nonlinear (and a direct solution does not exist for minimizing it), a convex approximation can be obtained [56]. The global minimum can then be determined iteratively. This procedure is most effective when the object displacements between consecutive images are small. When object displacements are large, a coarse-to-fine strategy
