AN UNSUPERVISED CLASSIFIER MODULATED BY TEMPORAL HISTORY OUTPERFORMS
RECURRENT BACK-PROPAGATION IN FACE RECOGNITION
The importance of contextual information in modulating neuronal response profiles is becoming increasingly apparent. Context may be defined as top-down information from descending pathways, temporal history from recurrent connections, or information from physically separate input channels. The ability of context to modulate activations, and hence learning, has obvious utility in increasing the capacity of the system to transmit information, e.g. by allowing it to take temporal structure into account. Further, improved generalization may be achieved by allowing high-level expectations, e.g. of class labels, to influence the development of lower-level feature detectors.

I describe a model of unsupervised learning in which context, represented in a superficial layer of units, is combined with bottom-up information represented in hidden layer units via a maximum likelihood cost function. The model is an unsupervised version of the mixture of competing experts model [1], in which the context units perform the gating function and the experts fit a gated mixture of Gaussians model to their input. Clusters of one or more experts are modulated by a common gating unit; they thereby organize themselves into mutually supportive predictors of abstract contextual features.

The model was tested in simulations with a set of real image sequences of four centered, gradually rotating faces, divided into a training set and a test set by taking alternating views. Over ten runs it was compared against a supervised, simple recurrent back-propagation network with essentially the same architecture: one layer of RBF units followed by one layer of recurrent softmax units. Although the unsupervised classifier did worse on the training set (it averaged 88% correct while the supervised network always scored 100%), it outperformed the supervised model on the test set by a margin of 21%.
The unsupervised model's markedly better ability to generalize stems from its cost function, which favors hidden layer features that contribute to temporally coherent predictions at the output (gating) layer. Multiple views of a given object are therefore more likely to be detected by a given RBF unit in the unsupervised model, leading to considerably improved interpolation of novel views.
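The core idea of a gated mixture of Gaussians whose gates carry temporal history can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration, not the paper's implementation: each expert is a Gaussian centered on a mean vector, and its gate is a leaky trace of its past responsibilities, so temporally adjacent frames tend to be assigned to the same expert (the temporal-coherence pressure described above). All function names, parameter values, and the simple trace-based gating rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_ll(x, mu, sigma=1.0):
    # Log-likelihood of x under isotropic Gaussians centered at the
    # rows of mu, dropping constants shared across experts.
    d = x - mu
    return -0.5 * np.sum(d * d, axis=-1) / sigma**2

def fit_temporal_moe(X, n_experts=4, sigma=1.0, alpha=0.9, lr=0.1, epochs=20):
    """Fit a gated mixture of Gaussians in which each expert's gate is
    a slowly decaying trace of its past responsibilities, so temporally
    adjacent frames favor the same expert."""
    # Deterministic init: spread the initial centers across the sequence.
    step = max(1, len(X) // n_experts)
    mu = X[::step][:n_experts].copy()
    for _ in range(epochs):
        trace = np.full(n_experts, 1.0 / n_experts)  # temporal context
        for x in X:
            gate = trace / trace.sum()
            # Responsibilities: gate-weighted Gaussian posteriors.
            log_p = gaussian_ll(x, mu, sigma) + np.log(gate + 1e-12)
            r = np.exp(log_p - log_p.max())
            r /= r.sum()
            # Move each expert's center toward x by its responsibility.
            mu += lr * r[:, None] * (x - mu)
            # Context: leaky integration of past responsibilities.
            trace = alpha * trace + (1 - alpha) * r
    return mu
```

Because the gate rewards the expert that explained the recent past, an expert that captures one view of an object is biased toward capturing its neighboring views as well, which is the mechanism the abstract credits for the improved interpolation of novel views.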
[1] Jacobs, R.A., Jordan, M.I., Nowlan, S.J. and Hinton, G.E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1):79-87.