A Pitch in Time: An Artificial Neural Network of Melodic Expectancy

Kate Stevens, University of Western Sydney; Sue Becker and Laurel Trainor, McMaster University

A contemporary theory of melody cognition, the implication-realization (IR) theory (Narmour, 1990, 1991, 1992), draws on and defines Gestalt principles of organisation such as proximity, similarity and closure as the "given code" for the perception of melodic patterns. A growing body of experimental evidence suggests that human listeners are indeed sensitive to the principles of IR theory (Cuddy & Lunney, 1995; Schellenberg, 1996; Thompson, Cuddy & Plaus, 1997). One assumption of Narmour's model is that many expectancies in melody cognition can be explained, to a large degree, by structures that arise from local, tone-to-tone transitions. Artificial neural networks often operate according to comparable bottom-up processes. This study is the first to examine aspects of Narmour's theory, such as the development of expectancies and the representation of pitch, within a neural network framework. If artificial neural networks are to be regarded as valid and informative models of human perception and cognition, then they need to reflect principles espoused in contemporary theory, and their results need to be compared directly with human performance.

We hypothesized that an artificial neural network model of melody cognition would exhibit signs of the IR principles during the course of learning and/or while performing a melody prediction task. Specifically, if the network is a valid model of melody cognition, then after exposure to examples of Western tonal melodies it will construct a set of connection strengths and hidden unit activations that permit: a) prediction of the next note in familiar melodies; and b) prediction, consistent with Narmour's principles, of the note following an implicative interval. Two feed-forward networks were constructed and exposed to identical sets of training and test patterns. The first network was designed to encourage the construction of an interval code as a result of exposure to examples of Western tonal melodies. An interval code represents the pitch distance between successive tones and ignores their absolute pitch values. With such a code and the computational power afforded by two layers of hidden units, we expected this multi-layer back-propagation network to outperform a single-layer network on both melody prediction and tests on musical intervals.
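To make the notion of an interval code concrete, the following Python sketch converts a melody, given as MIDI note numbers (an assumption; the abstract does not state the input format), into signed semitone intervals and builds next-interval prediction patterns of the kind a feed-forward network could be trained on. The helper names, the two-interval context window and the one-hot layout over ±12 semitones are hypothetical choices for illustration, not the encoding used in the study.

```python
from typing import List, Tuple


def to_intervals(pitches: List[int]) -> List[int]:
    """Interval code: signed semitone distance between successive tones.
    Absolute pitch is discarded, so a transposed melody maps to the same code."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def one_hot(index: int, size: int) -> List[int]:
    """Local (one-unit-per-value) code with a single active unit."""
    vec = [0] * size
    vec[index] = 1
    return vec


def interval_patterns(pitches: List[int], window: int = 2, max_interval: int = 12
                      ) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical training patterns: `window` preceding intervals -> next interval.
    Intervals are clipped to +/- max_interval and one-hot coded over
    2 * max_interval + 1 units (illustrative layout, not the study's encoding)."""
    size = 2 * max_interval + 1
    ivs = [max(-max_interval, min(max_interval, iv)) for iv in to_intervals(pitches)]
    patterns = []
    for t in range(window, len(ivs)):
        x: List[int] = []
        for iv in ivs[t - window:t]:
            x.extend(one_hot(iv + max_interval, size))  # shift so -12 maps to unit 0
        y = one_hot(ivs[t] + max_interval, size)
        patterns.append((x, y))
    return patterns


if __name__ == "__main__":
    melody = [60, 60, 67, 67, 69, 69, 67]        # C C G G A A G as MIDI numbers
    transposed = [p + 2 for p in melody]         # same tune a whole tone higher
    print(to_intervals(melody))
    print(to_intervals(melody) == to_intervals(transposed))
    print(len(interval_patterns(melody)))
```

Because only pitch differences are kept, a melody and its transposition yield identical training patterns, which is the property that distinguishes an interval code from a code based on absolute pitch.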

The two models provide an existence proof that principles central to Narmour's model of bottom-up melodic expectancy can be learned by exposure to a set of Western tonal melodies. Although these relatively simple networks performed well, design modifications are needed to incorporate duration and amplitude features. Because music contains exceptions and irregular occurrences, a promising direction is to model expectancy with adaptive mixtures of local experts (Jacobs, Jordan, Nowlan & Hinton, 1991). Proximal pitch relations could also be built into a network by coarse coding the input pitch units, and the resulting improvement in melody and interval prediction measured. Subsequent models that use networks to examine how pitch, interval, scale step and key are acquired from the auditory environment and represented in memory will also be discussed.
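The coarse-coding idea can be sketched as follows, assuming Gaussian tuning curves over MIDI pitches 48 to 84; the function names, pitch range and tuning width are hypothetical and not taken from the study. In a one-hot (local) pitch code any two different pitches share no active units, whereas coarse-coded neighbouring pitches overlap, so the similarity of two input patterns falls off smoothly with pitch distance.

```python
import math
from typing import List


def coarse_code(pitch: int, low: int = 48, high: int = 84, width: float = 1.0) -> List[float]:
    """Hypothetical coarse code: one unit per MIDI pitch from low to high inclusive.
    Each unit responds with a Gaussian tuning curve centred on its own pitch,
    so nearby pitches produce overlapping activation patterns."""
    return [math.exp(-((pitch - centre) ** 2) / (2 * width ** 2))
            for centre in range(low, high + 1)]


def overlap(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    c60, c61, c67 = coarse_code(60), coarse_code(61), coarse_code(67)
    # A semitone neighbour overlaps far more than a fifth away.
    print(overlap(c60, c61) > overlap(c60, c67))  # True
```

The tuning width controls how far proximity spreads: a wider curve makes more distant pitches look similar to the network, while a very narrow one approaches the local code.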
