Music Was For Sharing Thoughts, Before Language

9 September, 2018

Music originally existed as a form of communication for sharing thoughts, without providing any means to communicate the content of those thoughts – signalling to the listener only a motivation to determine what the speaker/performer was thinking.
Music then later evolved to not be a form of communication, persisting only as a means for temporarily altering the listener's state of mind and motivation.

by Philip Dorrell

Thought Sharing, and "Theory of Mind"

When we talk about "sharing thoughts", we are almost always referring to the act of sharing thoughts by speaking about them.

But what about when there was no spoken language, and no ability to speak?

Prior to the existence of spoken language, our pre-human ancestors may have had a "theory of mind" – the ability and motivation to determine, where possible, what other individuals are thinking.

Usually we think of "theory of mind" as something that benefits the individual who holds the theory, and not particularly the individual whose "mind" is being theorised about.

Yet we live as a species where we are constantly sharing the contents of our minds with other individuals of our own species, by means of talking. Speaking, and listening, are activities that are typically beneficial to both parties involved.

So we can imagine a plausible scenario that occurs after the development of "theory of mind", yet before the development of symbolic communication, a scenario where individuals can decide that the "sharing of thoughts" from one individual to another would be of mutual benefit to both individuals. And we could further imagine that this could happen even though there did not yet exist any easy way to communicate the content of those thoughts.

I therefore present a hypothesis:

There was a stage in human evolution where one individual could signal to a second individual, by some means, that the second individual should be interested to determine the content of some thought in the mind of the first individual, to the benefit of both individuals.

This signal was music, or rather, the predecessor of music as we know it.

We can call it "communicative music", because it was a form of communication, from one individual to another (unlike music as it exists now, which is not a form of communication).

Communicative music would have been a particular type of vocalisation (and as I will explain soon, one that would not sound particularly "musical" to modern ears).

Unlike spoken language, this signal did not contain any specific information about the content of the communicating individual's thoughts, although it probably did have an optional emotional quality, which would have provided information about the type of emotion associated with this thought.

(This optional emotional quality continues to exist as an aspect of modern non-communicative music, ie where music can have an identifiable emotional quality in some cases, yet in other cases music can be "strong" but not correspond reliably to any particular type of emotion.)

The result of this musical communication would have been that the listener would be motivated to:

pay attention to what emotional state the communicating individual was in, and to be interested in what that individual was thinking about that would result in that emotion, and,
to consider the possible non-obvious significance of something perceived in the immediate environment, that both individuals are likely to be aware of – which is the thing that presumably the communicating individual is thinking about.

We can describe the meaning of this musical signal as a communication from one individual to another, with a meaning as if the communicating individual was saying the following to the listener:

I have just had a significant thought about something, and I want to share my thought with you.

Unfortunately, spoken language has not yet been invented, so I cannot directly communicate the specific content of this thought.

So, pay attention to any emotion I appear to be expressing, and think hard about why I might be expressing that emotion.

Also, pay attention to things in our immediate shared environment, to determine if there might be something you are aware of which might have some non-obvious significance – because that could be the thing that I have just had a significant thought about.

Communicative and Non-Communicative

How and why was communicative music a useful thing? How did it relate to spoken language? And why did it become non-communicative?

Communicative music would have been applicable in situations where the communicating individual would want the listening individual to make an effort to determine what the communicating individual was thinking about.

This form of communication would only have been truly useful prior to the development of any form of symbolic communication, such as spoken language. With spoken language, there is no requirement for the listener to make an effort to determine what the speaker is thinking, because the speaker can simply tell the listener what the contents of the speaker's thoughts are, including, if necessary, why those thoughts are significant, and what emotional consequences, if any, they might have.

This musical form of communication may have itself enabled or at least facilitated the development of symbolic communication, given that it involved changing the state of mind of the listener to think about possible meanings of things.

But in any case, once spoken language came into existence, communicative music would have become obsolete as a form of communication, and one might expect that the musical form of communication would simply have faded away.

However, in addition to its main function, communicative music had an unintended side-effect, which was that it put the listener into an altered state of mind, one where they were motivated to make an extra effort to think about the non-obvious significance of things. On some occasions this altered state of mind might have led the listener to have useful thoughts that they would not otherwise have had, thoughts not necessarily related to those that the communicating individual was attempting to communicate in the first place.

Thus, instead of being rendered completely obsolete, music evolved to have an alternative function – as a means of achieving a temporarily altered state of mind that facilitates and motivates certain types of thinking.

Because it involved only the listener's private thought processes, this evolved form of music was non-communicative.

The Evolution of Music from Communicative to Non-Communicative

In the course of evolving from communicative to non-communicative, music probably underwent two major changes:

It changed from being a form of one-to-one communication to not being a form of one-to-one communication
It became less about considering the significance of things perceived in the immediate environment, and more about considering the significance of things being privately thought about by the listener.

There are a few properties of music which stand in contrast to those of language, and which highlight that music is not a form of one-to-one communication. These include:

The perception of musicality is very quality-driven. For the most part, people only listen to the best possible music. Whereas with spoken language, most speech listened to is conversational, and the quality is for the most part "good enough".
It is easier to achieve a higher quality of music if it is performed by a group (because a group of people singing will be more in tune than a typical individual due to an averaging effect, and also because the "chorus effect" makes group singing sound better).
For the most part, individual singers only produce a satisfactory result if they are very practised.
Also, it is easier to achieve a quality result if musical instruments are involved.
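
The averaging effect mentioned in the list above can be illustrated with a simple statistical sketch. It assumes, purely for illustration, that each singer's pitch error is independent with the same standard deviation, and that the perceived pitch of the group is the mean of the individual pitches. Neither assumption comes from the text, and real choral perception is certainly more complicated. The function name and the example numbers are hypothetical.

```python
import math


def group_pitch_error(individual_error_cents, singers):
    """Standard deviation of the mean pitch of `singers` independent voices.

    Assumes (hypothetically) independent, identically distributed pitch
    errors, so the error of the mean shrinks with the square root of the
    group size.
    """
    if singers < 1:
        raise ValueError("need at least one singer")
    return individual_error_cents / math.sqrt(singers)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(f"{n} singer(s): {group_pitch_error(20.0, n):.1f} cents")
```

Under these assumptions, if a typical singer is off by 20 cents, the mean pitch of a group of 16 has a typical error of 5 cents.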

This quality-driven aspect of the listener's perception of music results in the production of music becoming both difficult and competitive.

This competitive aspect of music is taken to an extreme in modern Western economies, where most of the music that most people listen to is composed, performed and produced by a very skilled and practised elite group of composers, performers and producers.

The end result of this competitiveness is that it is almost impossible for most individuals to casually produce music that other listeners would find worth listening to. Whereas for speech, casual production by "average" performers is the norm.

A second result of the quality-driven nature of musical perception is that it limits the amount of time that any individual can spend listening to "high-quality" music. Given that the benefit of music is to produce a temporarily altered state of mind, it is desirable for this same state of mind not to be achieved all the time.

(Of course modern technology somewhat undermines this limitation, because it is not that hard to listen to music you enjoy every single waking hour of the day if you want to. However, even with modern technology, and the whole economic infrastructure of the world music industry, any person listening to music all the time will become bored with their favourite tunes at a rate faster than the music industry can create new music that's good enough to suit their personal taste.)

The Quality of Music: Music as a Super-Stimulus for itself

My first attempt at a theory of music was to suppose that music was a super-stimulus for something.

I later moved away from this, but then, in developing this current theory of the evolution of music, and the evolution of the difficulty of music, I have come back to the idea of music being a super-stimulus.

Also, I have arrived at the idea of music being a super-stimulus for itself, which was not the case for my original theory, where I supposed music to be a super-stimulus for something else (in particular a hypothesized aspect of speech perception).

This makes my current theory more similar to the theory of sexual selection.

Comparison to Sexual Selection

In the evolutionary theory of sexual selection, as developed originally by Charles Darwin, sexual characteristics evolve as super-stimuli for themselves, and at the same time the individuals responding to those characteristics become more and more "fussy" about what is considered good enough.

This can be understood as the evolution of perceived "quality", where the measure of quality is the degree of perfection of the sexual characteristic.

The nature of the sexual characteristics is somewhat arbitrary, and the theory of sexual selection does not make any strong predictions about what the sexual characteristics will be. In principle it might be possible to make such predictions, but this would require a detailed understanding of the full mechanics of how the bodies of the individuals of a species develop, and how that is genetically determined, as well as a detailed understanding of how the perception of sexual characteristics develops, and how that is genetically determined.

Suffice to say that Charles Darwin did not develop the theory of sexual selection and then predict the existence of peacocks – rather he already knew about peacocks, and he developed the theory of sexual selection to explain why peacocks evolved.

According to my current hypothesis, something similar has happened with the evolution of music. Once music perception started to evolve to be quality-driven, the specific rules for "quality" evolved, and these rules are the "rules" that effectively determine all of the observed properties of music. We can predict that there will be such rules, just like there are rules about which colour different parts of a peacock should be. We can consider that the theory "explains" the existence of rules that define musicality. But we can't predict specifically what the rules should be.

(Additionally we observe that music is somewhat more complex and varied than the appearance of peacocks. We don't actually know what all the rules of music actually are, despite our familiarity with thousands upon thousands of specific examples of "high-quality" music. We can observe that the composition of new "quality" hit music remains a very ad-hoc procedure carried out by individuals who are unable to explain to anyone else in formal and verifiable terms how they actually did what they did when they composed the hit music in question. Neither scientists nor musicians have yet discovered a complete "theory of music" that strongly predicts what will or won't be "musical".)

I hypothesize that the "quality" of music is the result of the evolution of music away from being a form of communication. The requirement for quality makes it increasingly difficult for any individual to produce "music" without substantial effort, practice and co-ordination with other music-makers, and as a result it is no longer possible for music to be something that one individual casually "does" to another individual.

It follows that the original communicative form of music did not have any "quality" aspect, and therefore that if we heard this original form of music, we would not think there was anything remarkable about it, even though the functional effect of said music was to somewhat alter the state of mind and motivation of our ancestral listeners that it was directed towards. (In other words, for us, this "music" would be so "weak", and lacking in quality, that it would not make us feel anything.)

Perception versus Thought

A second change that has happened to music, as it evolved from a form of one-to-one communication to a non-communicative system for reaching a temporarily altered state of mind, is a change to which mental processes music has the strongest effect on.

As a form of communication, it made most sense for music to act on the perception of real things in the immediate environment shared by communicator and listener, and for music to motivate the listener to be interested in the thoughts of the communicator, where both of them are engaged in a social interaction.

I propose that, in the transition away from music as a form of communication, the human brain has evolved specific inhibitions which prevent music from acting on those mental processes that it previously acted on.

In particular, there are certain differences between perceived reality and private thought processes which enable the brain to distinguish between the two. The processes of inhibition take account of these differences, and act only to inhibit the effect of music when the listener is perceiving immediate reality, and not to inhibit the effect of music when the listener is thinking private thoughts.

It should be noted that thinking involves neural activity in the same parts of the brain as in the perception of corresponding aspects of reality. For example, thinking about the colour blue activates the same neurons as actually seeing something blue, only more weakly.

So it is not possible to distinguish between the perception of reality and private thought processes just by observing which neurons are active, because the same neurons are involved in each case.

Rather, the distinction must be made based on some properties of the perception of reality which are not properties of internal thought processes.

The best clue as to how this distinction might be made comes from considering music videos.

Music Video Techniques

Music videos can be thought of as an intermediate state between perceived reality and private thoughts.

In some respects video is closer to the actual reality of being there than it is to the experience of our own internal thoughts. For example, the visual experience is certainly stronger than that of the images in our private thoughts.

However there are certain stereotypical filming and editing techniques used in music videos, which cause the video to deviate in various ways from a simulation of realistic visual perception.

I suggest that these techniques represent the suppression of those factors which inhibit musical effect, where the inhibitory factors correspond to properties which are properties of the perception of reality and which are not properties of internal thought processes.

Identification of Three Inhibitory Factors

For some of these techniques we can plausibly identify the corresponding inhibitory factors, and for each inhibitory factor identify which video techniques correspond to the removal or suppression of that factor:

Social Interaction: This is one inhibitory factor where we get elimination "for free". When we watch a video, we can stare at an actor's face, and think about any possible emotions they are feeling, and consider what they might be thinking about, all without there being any possibility of social interaction. (Simply because it's a recording, and the actors and the audience are separated in both space and time.)
Continuity of Reality: A very common technique in music videos is to fragment the shots in the video, and this eliminates the continuity that exists in our normal perception of reality.
Centering and focussing the subject in the field of view: When we look at someone or something of interest in real life, we tend to focus on that person or thing, and we hold them in the center of our view (because, among other things, the central part of our retinas has the greatest density of receptors). A very common technique in music videos is to go in and out of focus, and for the person or thing of interest to drift in and out and around the field of view.

To put it another way, I am asserting that these properties of perceived reality inhibit the effect of music on a listener's thoughts:

Social interaction
Continuity of perceived reality
Centering and focussing the subject in the field of view

And then I am asserting that the application of filming and editing techniques which remove these properties from music videos has the effect of dis-inhibiting the effect of the music on the listener's response to the content of the video.

Other Filming and Editing Techniques

There are also other techniques that are consistently used in music videos, which plausibly involve the suppression of some property particular to the perception of reality as opposed to internal thoughts; however, it is not so obvious what the property in question would be in each case.

Possibly the most consistently used editing technique in music videos is that of slow motion.

Used as a technique, slow motion only does one thing to the presented content: it makes it go slower.

Sometimes, slow motion is combined with intermittent fast motion, suggesting that it is not just a matter of "slower is better".

One can develop various hypotheses as to which specific property of perceived reality is removed by using slow motion (and here I list just two possibilities):

The rate of change of motion is slowed down. The property of normal perceived reality that is removed is the "normal" rate of change of motion that is normally perceived (ie when watching other people).
The perception of gravity at its normal value (of 9.8 m/s²) is replaced by a perception of gravity at some other value.

(Experimentation would be required to determine which, if either, of these two hypotheses is correct. This might involve, for example, filming music videos in low-gravity conditions, to tease apart the effect of changing gravitational acceleration and the effects of changing other aspects of perceived motion.)
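
The second hypothesis can be made concrete with a little kinematics. If footage is played back some factor slower than real time, distances on screen are preserved but every duration is stretched by that factor, so a falling object appears to accelerate at the true value divided by the square of the factor. The sketch below computes that apparent value. The function name and the playback factors are illustrative assumptions, not measurements taken from any actual music video.

```python
STANDARD_GRAVITY = 9.8  # m/s^2, the value quoted above


def apparent_gravity(slowdown, true_gravity=STANDARD_GRAVITY):
    """Apparent gravitational acceleration in footage played `slowdown` times slower.

    Distances on screen are unchanged while durations are multiplied by
    `slowdown`, so acceleration (distance / time^2) is divided by
    slowdown^2. A slowdown below 1 describes fast motion.
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return true_gravity / slowdown ** 2


if __name__ == "__main__":
    for factor in (0.5, 1, 2, 4):
        print(f"slowdown {factor}: {apparent_gravity(factor):.2f} m/s^2")
```

At a slowdown of 2 the apparent gravity is 2.45 m/s², and at a slowdown of 2.5 it is about 1.57 m/s², close to the Moon's surface gravity of about 1.62 m/s². The same calculation shows that intermittent fast motion raises apparent gravity above its normal value.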

There are various other types of visual degradation which are used in music videos, such as transient multiplicity of images, and transient multiplicity of different colourisations, and other forms of transient degradation. It is plausible that these also represent the elimination of aspects of perceived reality that act as inhibitory factors.

It is worth noting that there are some types of perception degradation which are typically not used in music videos, and even if they have occasionally been used, seem "gimmicky", and not very emotionally effective:

Human identity is usually well-preserved in music videos.
Distortions of human shape are typically avoided.
Degradation of visual appearance to the point of being cartoon-like does not work so well.
Complete elimination of motion is also ineffective – just presenting a photo or a slideshow does not make a great music video. (Also, just providing a blank screen does not work, although transient blanking out is sometimes an effective technique.)