Tempo-based Regulation of Speech Processing
The human brain contains a regulatory system which adjusts the brain's processing of speech sounds to match the speed of the speech.
People can talk at different speeds – they can talk faster and they can talk slower.
But the meaning of what is said is almost entirely independent of the speed at which it is spoken.
So all parts of the brain involved in the processing of speech, starting with the analysis of individual speech sounds, moving on to the processing of the meaning of what is said, and ending with the processing of the emotional significance of that meaning, need to be globally adjusted to match the speed of the speech, ie the speech tempo.
In order to perform this global adjustment, the regulatory system needs to determine an estimate of the speech tempo.
It calculates this estimate by measuring the rate at which perceptual events occur in the auditory cortex.
Except it doesn't actually measure the rate at which all perceptual events occur in the auditory cortex – it measures the rate at which different perceptual events occur.
Normal speech is sufficiently variable that the difference between counting events and counting different events doesn't matter.
But music consists of sounds which are contrived in a manner such that a large portion of perceptual events occurring in the auditory cortex are exact repetitions of previously occurring perceptual events.
This causes the regulatory system to severely underestimate the rate at which perceptual events are occurring, and therefore to underestimate the speech tempo.
(We are not consciously aware of this underestimated value of speech tempo, because the regulatory system is implemented by a subset of glial cells, and the "perception" of speech tempo by glial cells is quite separate from any normal neuronal perception of speech tempo.)
The Consequences of Underestimation of Speech Tempo
As a result of the underestimation of speech tempo, the regulatory system adjusts the processing of information in the auditory cortex, and the processing of information downstream, as if new information were coming in very slowly, even though it's actually coming in at a normal rate.
When processing a stream of information such as speech sounds, the brain has to retain a representation of the state of what it has processed so far, so that new information coming in can be processed in the context of that state, and the state itself can be updated.
On the one hand the state needs to persist long enough to still be there when new information comes in. On the other hand it cannot persist indefinitely. (Your brain is not like a computer that just sits there happily waiting ten hours for you to type the rest of a word that you started typing.)
The degree of persistence of neural activity is what needs to be adjusted to match the rate at which new information is coming in.
So if the rate is underestimated, the regulatory system will respond by raising persistence to a level that is too high.
This raised level of persistence is what causes the intensification of musical emotion.
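As a rough illustration of why an underestimated rate leads to too much persistence, the sketch below assumes that persistence is set inversely proportional to the estimated rate, so that the retained state always spans a fixed number of incoming events. That inverse rule, the function name and the parameter events_to_span are assumptions made for the example; the actual relationship is not specified here.

```python
def persistence_seconds(estimated_rate, events_to_span=3.0):
    """Persistence needed for the state to span a fixed number of events.

    Assumption: persistence is inversely proportional to the
    estimated event rate (in events per second).
    """
    if estimated_rate <= 0:
        raise ValueError("estimated_rate must be positive")
    return events_to_span / estimated_rate


true_rate = 4.0      # events per second actually arriving
underestimate = 0.5  # hypothetical rate estimated while listening to music

print(persistence_seconds(true_rate))      # persistence matched to the input
print(persistence_seconds(underestimate))  # persistence that is too high
```

Under this assumed rule, the lower the estimated rate, the longer the state is held, even though the real input has not slowed down at all.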
Different Types of Repetition That Occur in Music
Repetition of perceptual events does not necessarily mean exact repetition of complete sounds, although some of the repetition that occurs in music is just repetition of this kind, eg repeating a particular percussion sound.
In the context of the tempo-based regulatory system, "perceptual events" are the perceived values of specific features of sounds and also perceived values of relationships between different sounds.
So, for example, the pitch value of a musical note will usually be a repetition of the pitch value of a previous note, because the notes in a musical item all come from a particular scale, and scales only have a finite number of notes in them.
(If we consider scales that repeat each octave, as is the case with all modern western popular music, then we can also consider the feature of "pitch class", which is even more repetitive, because then even pitch values separated by an octave count as being the "same".)
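To make the point about pitch and pitch class concrete, here is a small sketch that counts the distinct values of each feature in a short melody. It assumes notes are written as MIDI note numbers (where 60 is middle C and adding 12 moves up one octave), so that the pitch class is the note number modulo 12. The melody itself is invented for the example.

```python
def pitch_class(midi_note):
    """Pitch class of a MIDI note number (0 = C, 1 = C sharp, ...)."""
    return midi_note % 12


# Hypothetical melody: C D E, the same three notes an octave up, then C E.
melody = [60, 62, 64, 72, 74, 76, 60, 64]

distinct_pitches = set(melody)
distinct_pitch_classes = {pitch_class(note) for note in melody}

# Number of notes, distinct pitch values, distinct pitch classes.
print(len(melody), len(distinct_pitches), len(distinct_pitch_classes))
# prints: 8 6 3
```

Eight notes yield only six different pitch values, and only three different pitch classes, so most events for the "pitch class" feature are repetitions.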
When you look at all the different types of perceptual features that occur in music, you start to see that there are many different kinds of repetition happening, including, but not limited to, repetitions of:
- Pitch values
- Frequencies of individual harmonics (in the case of notes separated by a harmonic interval)
- The intervals between different notes
- Melodic contours (where the up and down shape of a contour may repeat, even though two different occurrences may start at different locations in the scale)
- Progressive steps of any kind, ie if there is a sequence of three items, then the step from item 2 to item 3 will be a repetition of the step from item 1 to item 2.
- Beat periods of notes (ie within the context of regular beats defined by time signatures)
- Lengths of notes
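Some of the relational features in the list above can be sketched in code. The example below assumes notes are written as MIDI note numbers, treats an interval as the difference in semitones between successive notes, and treats a melodic contour as the sequence of up, down or level movements. The helper names and the phrases are hypothetical.

```python
def intervals(notes):
    """Signed steps, in semitones, between successive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


def contour(notes):
    """Up/down shape of a phrase: +1 for up, -1 for down, 0 for level."""
    return [(step > 0) - (step < 0) for step in intervals(notes)]


phrase_a = [60, 62, 64, 62]  # C D E D
phrase_b = [64, 65, 67, 65]  # E F G F, starting elsewhere in the scale

# The exact intervals differ, but the contour repeats.
print(intervals(phrase_a), intervals(phrase_b))
print(contour(phrase_a) == contour(phrase_b))

# Progressive steps: in a sequence of three, the second step repeats the first.
run = [60, 62, 64]
steps = intervals(run)
print(steps[0] == steps[1])
```

Each of these derived features is a separate stream of perceptual events in which repetition can occur, even when the notes themselves are not repeated.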
The Proto-Musical Language of Emotion and the Requirement for Change
If indeed the intensity of musical emotion is caused by the amount of exact repetition of perceptual events occurring in the auditory cortex of the listener, then we might ask: why is music not just composed of endless repetition of exactly the same sound?
The occurrence of exact repetition causes the intensity of musical emotion.
But something else actually has to cause the musical emotion to occur in the first place.
There is also the question of why a regulatory system driven by an estimated value of speech tempo should determine the intensity of the response to music, given that music is not actually the same thing as speech (even though songs are a form of music with speech embedded in them).
A full answer to all these questions probably requires more research to resolve, but the most straightforward explanation I have so far is based on the presumed existence of a prehistoric language of emotion which pre-dated the evolution of modern word-based language.
This prehistoric language of emotion, or "proto-music" (ie the non-musical predecessor of music), was used by our ancestors to express emotions in certain situations.
It is a language that we no longer "speak", because it has been totally replaced by word-based spoken language.
But we retain a residual ability to understand that lost language, and that is what underlies our perception of musical emotion.
In effect music is an illusory form of proto-music, where the illusion is caused by the addition of features involving exact repetition to the original proto-music in a manner that intensifies the perception of the emotional meanings of the proto-music.
Also, although the regulatory system has evolved to aid in the processing of modern word-based language, there is enough overlap in the cortical areas involved that the same regulatory system affects the emotional comprehension of proto-music.
And the reason why music cannot be purely repetitive is that the proto-musical language of emotion used change to directly express emotion – that is, a speaker of proto-music uttered some sounds, and then uttered a variation of those sounds, and the form of the variation was what expressed the emotion.
The final result is that music, in order to get maximum effect, has to satisfy two partly contradictory requirements:
- The music must be constructed from individual elements which are mostly exactly repetitive.
- The music must contain changes strong enough to evoke emotion.
The Need for Precision of Musical Performance
If you've started learning how to perform music, you may have discovered how hard it is to get a good result.
To make music sound good, you have to perform it quite precisely.
It has to be in tune and on time. (And that's not all, but even getting just those two things right can take quite a lot of practice.)
The reason why precision is required is the nature of the subterfuge.
The subterfuge requires exact repetitions of perceptual events as perceived by the listener.
If the repetitions are not exact enough, then the subterfuge fails – the glial-based regulatory system is not fooled into underestimating the tempo, and therefore the music does not sound musical.
It's somewhat similar to learning to be a magician – if the execution is not sufficiently skilled, then your deceptions will fail, your audience will not be mystified, and basically your performance will be a failure.
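The dependence on precision can be sketched by allowing a perceptual event to count as a repetition only when it matches an earlier event within some tolerance. The sketch below does this for pitch, measured in cents (hundredths of a semitone). The 5-cent tolerance, the function names and the frequencies are all assumptions chosen for illustration; nothing here says how exact a repetition actually has to be.

```python
import math


def cents_between(freq_a, freq_b):
    """Signed pitch distance from freq_b to freq_a, in cents."""
    return 1200.0 * math.log2(freq_a / freq_b)


def count_different_pitches(freqs_hz, tolerance_cents=5.0):
    """Count pitches that are not close repeats of an earlier pitch.

    Assumption: a pitch counts as a repetition if it lies within
    tolerance_cents of any earlier pitch that was counted as different.
    """
    references = []
    for freq in freqs_hz:
        is_repeat = any(
            abs(cents_between(freq, ref)) <= tolerance_cents
            for ref in references
        )
        if not is_repeat:
            references.append(freq)
    return len(references)


# Hypothetical attempts to play the same note (A, 440 Hz) four times.
precise = [440.0, 440.5, 439.8, 440.2]    # all within about 2 cents
imprecise = [440.0, 452.0, 431.0, 446.0]  # tens of cents apart

print(count_different_pitches(precise))
print(count_different_pitches(imprecise))
```

In the precise performance all four notes register as one repeated event, whereas in the imprecise performance each note registers as a different event, so under this toy model the rate of different events is no longer underestimated.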