A Grand Unified Theory of Music

Two fundamental concepts of music are pitch and time. Time defines rhythm, and pitch defines melody and harmony.

If music is a single unified phenomenon, i.e. there is just one thing that is music, then our understanding of what music is must somehow unite these two disparate concepts.

The super-stimulus theory of music can be considered a "unified theory" of music: it identifies a number of analogies between the perception of pitch and the perception of time intervals. These analogies relate to symmetries, calibratable relationships, and the responses of cortical maps to music and speech.

A symmetry is a set of transformations (mathematically, a group) under which certain features of a structure are preserved. The major symmetry that applies to musical pitch is invariance under pitch translation, which in more ordinary musical language corresponds to a "change of key" or "transposition". The musical quality of a piece is not significantly altered by transposition, provided the transposition is not too large.

The major symmetry applying to musical time is invariance under time scaling, which corresponds to playing music slower or faster. This symmetry is not as exact as pitch translation invariance, because most music has an optimal tempo at which it should be played. Even so, we can recognise a rhythm played at one tempo as the same rhythm played at a different tempo, which implies that some aspect of our perception of rhythm is essentially unchanged by a scaling of time.
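The two symmetries can be illustrated with a small sketch. It assumes that pitch is measured on a logarithmic frequency scale in equal-tempered semitones and that a rhythm is a list of note durations in seconds. The function names and the sample melody are hypothetical and chosen only for illustration.

```python
import math

def transpose(freqs_hz, semitones):
    """Pitch translation: multiply every frequency by the same factor."""
    factor = 2 ** (semitones / 12)
    return [f * factor for f in freqs_hz]

def intervals_in_semitones(freqs_hz):
    """Intervals between successive notes on a log-frequency scale."""
    return [12 * math.log2(b / a) for a, b in zip(freqs_hz, freqs_hz[1:])]

def scale_time(durations_s, factor):
    """Time scaling: multiply every duration by the same factor."""
    return [d * factor for d in durations_s]

def duration_ratios(durations_s):
    """Ratios between successive note durations."""
    return [b / a for a, b in zip(durations_s, durations_s[1:])]

# Hypothetical example data: four notes and their durations.
melody_hz = [261.63, 293.66, 329.63, 392.00]
rhythm_s = [0.5, 0.25, 0.25, 1.0]

transposed = transpose(melody_hz, 5)   # up five semitones
slower = scale_time(rhythm_s, 1.5)     # played at two thirds of the tempo

# What is preserved: the intervals under transposition,
# and the duration ratios under time scaling.
print(intervals_in_semitones(melody_hz), intervals_in_semitones(transposed))
print(duration_ratios(rhythm_s), duration_ratios(slower))
```

The sketch only shows which quantities are left unchanged by each transformation. It says nothing about why time scaling invariance is less exact in perception.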

A significant difference between pitch and time is that pitch has an additional symmetry: octave translation invariance. This makes the pitch scale circular rather than purely linear. Time has no such circularity, and this difference may explain why pitch translation invariance is more exact than time scaling invariance.
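Octave translation invariance can be sketched by reducing pitch modulo the octave, so that frequencies an octave apart land on the same point of a circle. The reference frequency of 440 Hz and the division of the circle into 12 semitones are assumptions made for the example, and `pitch_class` is a hypothetical helper name.

```python
import math

def pitch_class(freq_hz, ref_hz=440.0):
    """Position on the pitch circle, in semitones within [0, 12), relative to ref_hz."""
    return (12 * math.log2(freq_hz / ref_hz)) % 12

# Frequencies one or more octaves apart map to the same position on the circle.
for f in (220.0, 440.0, 880.0):
    print(f, pitch_class(f))
```

No corresponding reduction exists for durations, which is the asymmetry between pitch and time described above.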

A calibratable relationship is a relationship which is preserved under the relevant symmetry, and which very likely provides the mechanism by which the brain achieves that symmetry; in other words, the brain calibrates perception against the relationship. In the case of pitch, the relationship is that of an interval between two pitch values being consonant, where a consonant interval corresponds to a simple fractional ratio between two frequencies (for example, 2:3 for a perfect fifth). In the case of time, the calibratable relationship is that the ratio between two durations is either 1:2 or 1:3.
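As a rough sketch of the two calibratable relationships, the code below tests whether a frequency ratio lies close to a simple fraction and whether a duration ratio is 1:2 or 1:3. The denominator limit of 8 and the 1% tolerance are arbitrary assumptions, not values taken from the theory, and the function names are hypothetical.

```python
import math
from fractions import Fraction

def nearest_simple_ratio(f_low, f_high, max_denominator=8):
    """Closest fraction to f_high / f_low whose denominator is at most max_denominator."""
    return Fraction(f_high / f_low).limit_denominator(max_denominator)

def is_consonant(f_low, f_high, max_denominator=8, tolerance=0.01):
    """True if the frequency ratio is within `tolerance` (relative) of a simple fraction."""
    ratio = f_high / f_low
    simple = nearest_simple_ratio(f_low, f_high, max_denominator)
    return abs(ratio - simple) / ratio <= tolerance

def is_calibratable_duration_ratio(d1, d2, tolerance=0.01):
    """True if the longer duration is about 2 or 3 times the shorter one."""
    ratio = max(d1, d2) / min(d1, d2)
    return any(math.isclose(ratio, target, rel_tol=tolerance) for target in (2, 3))

print(nearest_simple_ratio(220.0, 330.0))          # a just perfect fifth
print(is_consonant(440.0, 440.0 * 2 ** (1 / 12)))  # an equal-tempered semitone
print(is_calibratable_duration_ratio(0.5, 0.25))
```

Both checks depend only on ratios, so they give the same answer after transposition or after a change of tempo, which is what makes these relationships usable for calibration.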

The third analogy appears when we develop plausible models of cortical maps that respond to musical scales and time signatures.

A musical scale consists of a set of pitch values (modulo octaves) which occur in an item of music, such that pitch values not in the scale do not occur. Consider a cortical map consisting of neurons which respond to pitch values, with some degree of persistence, so as to remember which pitch values have occurred recently. When responding to music, the activity of neurons in this map will be restricted to regions containing those neurons representing pitch values from the scale. Depending on the particular scale, the activity pattern might consist of five active regions (as for a pentatonic scale) or seven (as for a diatonic scale).
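A toy model of such a map can be sketched as follows. It assumes that notes are given as MIDI note numbers, that the map has 12 regions (one per semitone of the octave), and that persistence is a simple exponential decay. The decay rate and threshold are arbitrary illustrative values, and the model is not a claim about real cortical maps.

```python
def pitch_map_activity(midi_notes, decay=0.9):
    """Activity of 12 pitch regions (modulo octaves) after hearing the notes in order."""
    activity = [0.0] * 12
    for note in midi_notes:
        # Earlier activity fades a little with each new note (persistence).
        activity = [a * decay for a in activity]
        # The region for the current pitch value becomes fully active.
        activity[note % 12] = 1.0
    return activity

def active_regions(activity, threshold=0.1):
    """Indices of the regions whose activity is above the threshold."""
    return [region for region, a in enumerate(activity) if a > threshold]

# C major scale played up and down, and a major pentatonic phrase.
diatonic = [60, 62, 64, 65, 67, 69, 71, 72, 71, 69, 67, 65, 64, 62, 60]
pentatonic = [60, 62, 64, 67, 69, 72]

print(active_regions(pitch_map_activity(diatonic)))    # seven regions
print(active_regions(pitch_map_activity(pentatonic)))  # five regions
```

Regions for pitch values outside the scale never receive input, so they stay inactive for as long as the music stays in the scale.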

Now consider a cortical map containing neurons which respond to regular beats. What will the response of this map be to music? Suppose, for example, that a tune has a time signature of 4/4, with the shortest notes being 16th notes. We can identify five different regular beats underlying this time signature: once per bar, twice per bar, four times per bar, eight times per bar and sixteen times per bar. The pattern of activity of neurons in the map will therefore contain five active regions.
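The count of five regular beats in the 4/4 example can be reproduced with a short sketch. It assumes that every beat level is obtained by halving the period of the level above it, which holds for this example but not for time signatures that involve division by three. The function name is hypothetical.

```python
def beat_levels(shortest_notes_per_bar):
    """Beats per bar at each level, doubling from once per bar up to the shortest note."""
    levels = []
    beats_per_bar = 1
    while beats_per_bar <= shortest_notes_per_bar:
        levels.append(beats_per_bar)
        beats_per_bar *= 2
    return levels

# 4/4 time with 16th notes as the shortest notes: sixteen of them fill one bar.
levels = beat_levels(16)
print(levels)       # [1, 2, 4, 8, 16]
print(len(levels))  # 5
```

Each entry corresponds to one active region in the hypothetical beat map.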

We should also consider the response of these same maps to speech. Speech has melody and rhythm, just as music does, but speech lacks those features of melody and rhythm which cause the patterns of activity in the maps to have constant active and inactive regions.

There is more to music than just scales and regular beat. But the analogy found in the responses of these two (hypothetical) cortical maps to music and speech suggests a general explanation for all aspects of music:

For each aspect of music, there is some cortical map which responds to that aspect with a constant pattern of active and inactive regions, and the same cortical map responds to speech, but the corresponding activity patterns are not constant.