The Perception of Speech Tempo
This article presents a hypothesis about the perception of speech tempo.
But it is not a hypothesis about the conscious perception of speech tempo.
It is a hypothesis about the glial perception of speech tempo.
What is glial perception?
In our brains, neurons are the brain cells that do the thinking, and glial cells are the cells that support the neurons, helping them, in various ways, to do their thinking.
Neurons are the cells that do most of the information processing in the brain.
But glial cells do have to do some information processing of their own. In order to interact with neurons, the glial cells have to observe various aspects of the state and activity of the neurons that they are supporting, so that they can interact correctly and usefully with the neurons.
These observations of neural state and activity are a form of perception, and that is how we arrive at the concept of "glial perception".
So, why do glial cells need to perceive speech tempo, how do they perceive it, why do they need their own separate system of tempo perception (ie separate from the neural perception of tempo that underlies our conscious perception of tempo), and what is it about music that confuses this glial tempo perception system?
The following is a basic outline of the why and the how and the what:
- Human speech can occur at different speeds. Variation can range over as much as a factor of two, and variation over a range of 50% is quite common.
- When speech occurs at different speeds, the operating characteristics of the neural systems processing speech need to be adjusted accordingly.
- When speech occurs at different speeds, the operating characteristics of the neural systems downstream of speech processing need to be adjusted accordingly.
- "Downstream" of speech perception includes any part of the brain that can represent the meanings of speech, which, given the wide range of meanings that speech can have, is a large portion of the brain.
- The required regulation of neural operating characteristics is carried out by glial cells.
- In order to carry out this regulation, the glial cells need to know what the speech tempo is.
- Glial cells do not have any access to information about speech tempo as represented by neural activity. There will be neurons in some cortical regions whose activity is correlated with particular values of speech tempo. But glial cells have no way of knowing which neuron represents which value of speech tempo, and most likely they have no way of even knowing which neurons are involved in representing speech tempo. (The exact location of neurons representing different perceptions and perceptual values can vary over time, so even if they "knew" those locations in the brain, glial cells could not reliably extract perceptual information from nearby neurons.)
- It follows that glial cells have to have their own separate system for estimating speech tempo.
- This system has to be based on direct observation of the patterns of activity of neurons nearby, without knowing what specific perceptions or perceptual values are represented by any specific neurons.
- A reasonable proxy for estimating the tempo of speech is to observe the rate at which neurons transition from inactive to fully active, within all the regions likely to be involved in audio perception in general. Glial cells can make this observation directly without "knowing" what perceptual values any particular neuron represents.
- Music is a contrived stimulus which confounds this system of estimation.
- Music is contrived so that neurons in various cortical maps are divided into subsets of inactive and active neurons. That is, a large portion of the neurons in these regions never or rarely become active at all, and a second portion of neurons are constantly active, that is, they are provoked to a maximum level of activity by individual auditory events, and then they are constantly re-activated by new but very similar auditory events as a result of the patterns of repetition contained in the music.
- The system of tempo perception measures the rate at which neurons transition from inactive to fully active, on the assumption that a substantial portion of the auditory events perceived will provoke such a transition. But the music contrives that most or all of the neurons in the affected regions remain either constantly inactive or constantly active, so such transitions occur much less often. This creates the illusion (ie a glial illusion) that the tempo of those auditory events is much lower than it actually is.
- As a result of this illusion, the glial cells incorrectly regulate the neural activity, both in the brain regions performing speech perception, and in all the "downstream" regions.
- This dysregulation of neural activity is what causes the altered state of mind associated with listening to music.
- The final part of the "downstream" of neural processing derived from speech perception is the processing that determines the emotional consequences of the perceived meanings of speech utterances. The neuronal dysregulation applies to this emotional processing, and that is what causes the emotional effects of listening to music.
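The transition-counting proxy described in the outline above can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration: the function name `count_transitions`, the activity thresholds, and the small hand-made activity traces are all hypothetical, and nothing here claims that glial cells compute anything in this literal form. The sketch only shows that counting inactive-to-fully-active transitions yields a much lower count when the same neurons stay constantly active or constantly inactive, even though the number of auditory events is unchanged.

```python
def count_transitions(traces, low=0.1, high=0.9):
    """Count inactive -> fully active transitions, summed over all neurons.

    traces: one list per neuron, holding an activity level in [0, 1]
    for each successive auditory event. The thresholds `low` and `high`
    are arbitrary illustrative values, not measured quantities.
    """
    total = 0
    for trace in traces:
        was_inactive = False
        for level in trace:
            if level <= low:
                was_inactive = True
            elif level >= high and was_inactive:
                # The neuron was inactive and is now fully active.
                total += 1
                was_inactive = False
    return total


# Hypothetical speech-like input: each of 8 events activates a different
# neuron in turn, so neurons keep switching between inactive and active.
speech_like = [
    [1, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
]

# Hypothetical music-like input: the same 8 events, but two neurons stay
# (almost) constantly active and two stay constantly inactive.
music_like = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print("speech-like transitions:", count_transitions(speech_like))  # 7
print("music-like transitions:", count_transitions(music_like))    # 1
```

Both inputs contain the same number of auditory events, so the actual tempo is the same in each case. Only the transition count, and hence the proxy estimate of tempo, differs.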
A Numerical Example
(Added 11 Apr, 2024)
To clarify the distinction between neuronal perception of speech tempo and glial perception of speech tempo, I will give a simple example with numbers that nominally represent those perceptions.
Firstly, let us suppose that a person is speaking at 120bpm.
Then we would expect that:
- The neuronal perception of tempo is 120bpm.
- The glial perception of tempo is 120bpm.
Now suppose that that person speeds up and speaks at 150bpm.
The perceptions would then be:
- The neuronal perception of tempo is 150bpm.
- The glial perception of tempo is 150bpm.
In these cases the neuronal perception and glial perception are both correct. The neuronal perception corresponds to the listener's conscious perception of tempo. The listener is not consciously aware of the glial perception, but when the glial perception of tempo increases by 25%, this coordinates an adjustment to the operating characteristics of all the neurons processing information downstream of the perceived vocal audio to match the increased speed at which the audio information is being received by the listener's brain.
Next, let us suppose that powerful music is being performed at 120bpm.
In this case we find a difference, but only in the glial perception. That is:
- The neuronal perception of tempo is 120bpm.
- The glial perception of tempo is 24bpm.
The neuronal perception remains correct, but the glial perception now underestimates the tempo by a factor of 5.
This underestimation then causes an intensification of emotional response by a factor that is a function of the tempo underestimation factor.
It might for example be the same factor, ie the function is the identity function and the emotion is intensified by a factor of 5. (It could be some other function, for example a different power law. However there is at the moment no way to directly measure either of these factors, so to keep it simple, for the sake of clarification, I will assume a simple identity function.)
What happens if the music is now performed at 150bpm?
If it's the same music, and the change in tempo does not adversely affect the musical strength, then the underestimation factor will be similar, ie:
- The neuronal perception of tempo is 150bpm.
- The glial perception of tempo is 30bpm.
And the emotional intensification factor will be the same, ie 5.
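The arithmetic of this example can be written out as a short Python sketch. The function names `glial_tempo` and `emotional_intensification` are hypothetical, chosen only for this illustration. The underestimation factor of 5 for music is the nominal value used in the example, not a measurement, and a factor of 1 for speech simply restates that the glial perception of speech tempo is assumed to be correct. The identity function for emotional intensification is the same simplifying assumption made in the text.

```python
def glial_tempo(actual_bpm, underestimation_factor):
    """Glial estimate of tempo: the actual tempo divided by the factor
    by which the glial system underestimates it (1 means no error)."""
    return actual_bpm / underestimation_factor


def emotional_intensification(underestimation_factor):
    """Assumed identity function: emotion is intensified by the same
    factor as the tempo is underestimated. Other functions are possible."""
    return underestimation_factor


# (description, actual tempo in bpm, nominal underestimation factor)
cases = [
    ("speech", 120, 1),
    ("speech", 150, 1),
    ("music", 120, 5),
    ("music", 150, 5),
]

for kind, bpm, factor in cases:
    print(
        f"{kind} at {bpm}bpm: neuronal {bpm}bpm, "
        f"glial {glial_tempo(bpm, factor):g}bpm, "
        f"emotional intensification x{emotional_intensification(factor)}"
    )
```

Changing the music's tempo changes both perceived values in proportion, but leaves the underestimation factor, and therefore the emotional intensification, unchanged.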