Hypothesis: Proto-Music Was a Language That Asserted Shared Emotions

24 March, 2021

Music communicates the emotions of the communicator, and it asserts that the listeners should experience those emotions. But those emotions mostly relate to hypothetical or fictional scenarios.
Proto-music was a language which communicated emotions in the same way, but about real things.

by Philip Dorrell

How Music "Communicates" Emotions

It is often said that music "communicates emotions".

But what does this actually mean?

Consider this scenario where I might use music to "communicate" an emotion of happiness:

I am a person who experiences a particular emotion, ie happiness.
I decide that I want to communicate my happiness to other people.
I perform a happy item of music.
Now those people who were listening to my performance know that I am happy.
Also, as a result of listening to the happy music, those people feel happy.

This scenario raises a question, because the last two items specify two slightly different forms of communication, ie:

I am communicating to my listeners that I am happy.
I am communicating to my listeners that they should be happy.

Does music communicate the first, or the second, or both?

If we conflate the first and second items, we get:

I am communicating to my listeners that we should all be happy.

From this one can formulate a hypothesis:

Music exists to communicate and assert shared emotions.

Caveats

However, this hypothesis comes with caveats.

The first is the observation that we do not routinely inject music into conversation for the purpose of communicating or asserting any type of emotion. In fact, as I observe in Why Don't We Sing In Conversation?, we do not have any tendency to inject music into conversation at all, other than for the purpose of actually talking about music.

Secondly, music preferentially communicates emotions relating to hypothetical or fictional scenarios, ie not relating to what is actually happening, or what has happened, or what is going to happen.

The Dual-Aspect Hypothesis and Proto-Music

In my previous article The Dual Aspect Hypothesis: Emotional Quality and Intensity, I developed the hypothesis that music as we know it has two separate aspects, which evolved in two separate stages:

Emotional Quality
Intensity of Emotional Quality

Prior to the second stage, music existed in a non-intense form (more precisely, the features of modern music which currently intensify the emotional qualities of music did not then exist or have any effect). I also hypothesised that this earliest form of music, or proto-music, actually did support a form of communication.

("Proto-music" is a term that is often used to refer to some supposed earlier form of human music. Of course different authors using this term all have their own different ideas about what proto-music was, and how it differed from modern music. Here I use the term to refer to my own ideas about what was the ancestor of modern music.)

If this proto-music served as a form of communication, then the two caveats I gave above likely did not apply.

In other words:

Proto-music was used during normal conversation, to communicate with other individuals.
Proto-music was used to communicate about the emotional implications of real situations (and not hypothetical or fictional scenarios).

Once we allow that proto-music was used during normal conversation, we can consider the possibility that this proto-music existed as a form of communication prior to the existence of word-based spoken language.

Indeed I have already discussed this possibility in my earlier article Music: A Vestigial Innate Language Extension.

I have previously discussed possible meanings of the "emotions" conveyed by music in Music Is About Possible Goal Achievement.

I have also discussed the general issue of what "emotions" actually are in What Are Emotions?, where I conclude that emotions are abstract perceptions, in effect abstract summaries of the current situation of the individual feeling the emotions, and that musical "emotions" are also abstract perceptions, but they are a distinct set of abstractions, albeit with some overlap (where that overlap is sufficient that we readily identify music as "happy" or "sad" etc).

With the hypothesis that proto-music existed to communicate and/or assert shared emotions, these ideas can be taken further.

Normal Emotions vs "Musical" Emotions

I have observed that "musical" emotions are somewhat different from normal emotions.

We can look at what the purpose of normal emotions is.

To a first approximation, emotions serve a function that is internal to the individual experiencing them.

An emotion is like a top-level summary of some aspect of your current situation, and it informs your decisions about what you should do next (which might be an actual physical action, or it might be just thinking about something).

Normal emotions can be communicated to other people in various ways, for example you can smile when you are happy, or you can talk, and your happiness will alter your intonation, or you can communicate directly in words, ie "I am happy".

However communication of your emotional state to others is secondary to the primary internal purpose of emotions, and communication is not required. For example, you can choose not to smile, and you can consciously control your intonation so that your speech does not sound happy, and you can choose not to tell anyone that you are currently happy.

Also, when emotions are communicated, it is a communication specifically about the emotion the communicating individual is experiencing, and it does not necessarily follow that anyone learning about that individual's emotional state should also experience the same emotion – although a certain amount of emotional "contagion" can happen, especially if the audience is close to the person communicating their emotions.

The Purpose of Proto-Musical Emotions

I am going to make two assumptions about the purpose of proto-music, as a means of communicating emotions:

The primary purpose of musical emotion was to be communicated to others. Indeed musical emotions are only ever felt when one is listening to the music, and are never invoked by any other means.
The communication of musical emotion always carried an implicit assertion that the audience should feel that same emotion.

These assumptions only make sense if we assume that proto-music was used by human ancestors who had a high level of mutual interest and empathy. In particular, the "speakers" of proto-music must have had a higher level of mutual interest and empathy than modern humans.

Modern humans live in societies and groups within society where there is substantial cooperation. However, the tendency to cooperate is always constrained by differing and conflicting interests.

Modern humans can cooperate with some degree of success with other individuals in their group, even though all parties involved have conflicting interests. Much of this cooperation involves the use of word-based spoken language to negotiate said cooperation.

Indeed, if we assume that the speakers of proto-music did not (yet) have word-based spoken language, then we might suppose that the ability of modern humans to simultaneously cooperate and be in conflict with each other is something that has evolved as a result of the development of the more sophisticated word-based language.

Side Note: Music and "Group Cohesion"

One group of hypotheses about the function of music is that music supports "group cohesion" in human societies. My hypothesis is more-or-less the other way round, ie I propose that music served a function in pre-human societies, which only made sense because those societies had a level of "group cohesion" higher than what modern human societies have. So proto-music did not create group cohesion, rather proto-music depended on group cohesion. Indeed we modern humans do "feel" membership of the group in musical situations, but my claim is that this is essentially a fantasy, one that fades as soon as the music stops.

An Example of Proto-Musical Communication

So let us consider a scenario involving ancestral humans communicating via proto-music.

For example, one individual in a group "speaks" in proto-music to communicate a musical emotion, which might, for example, say something like: "Something exciting has happened, not right where we are now, and we should respond to this purposefully and intentionally together".

How then, can other individuals in the group respond to this communication?

They don't have word-based language, so it is not possible to ask for the specific details of the alleged exciting thing or situation.
The listeners' response is therefore all or nothing – either they can accept the emotion communicated by the speaker, or they can just ignore the speaker entirely. There is no opportunity to interrogate the speaker about the details in order to determine whether or not the alleged excitement is justified, or whether the listener would be as excited as the speaker apparently is.

The communication does not contain any details of where the exciting thing is, other than the assertion that it is not "here".

So how do the listeners know where to go, should they be inclined to follow the intention to deal with the exciting thing?

The communicating individual might point in the general direction of the exciting thing.
Or, the communicating individual might just start moving towards the exciting thing, assuming that the listeners have taken on the shared emotion of excitement and active intention and are therefore keen to join in on whatever is required to deal with that situation when they all get there.

Confrontational Scavenging

In the scenario I just described, there was communication about something exciting, and a possible willingness of the listeners to join in on the excitement, even though they didn't yet know what (or where) the exciting thing was.

So, what could the excitement have been about?

Derek Bickerton was a linguist who proposed a hypothesis about confrontational scavenging as a basic trigger for the evolution of modern human language.

Confrontational scavenging refers to situations where some large animal has died, from being hunted by a predator, or maybe some other reason, and there are already other large dangerous carnivores at the carcass, but if we go as a group we can scare away the other scavengers and take a good proportion of the available meat for ourselves. The alternative would be to wait for the other large scavengers to finish eating all the meat, and arrive when only the vultures are left, possibly armed with stone tools to break open the bones and eat the marrow (because that's all that's left after the lions and hyenas have had their fill).

The confrontational style of scavenging is more dangerous, but it gets you more meat. It requires that the individual who first discovers the carcass travel back to where other members of that individual's group are and then communicate to them about something that exists somewhere else, ie the carcass. The communicator must communicate this in a manner that motivates some or all of the listeners to quickly join in an effort to go to the carcass, fend off the other scavengers, and acquire or eat a decent portion of the available meat.

Bickerton proposed that this type of scavenging is a unique niche that would have created a strong evolutionary pressure to evolve something similar to modern human spoken language, a language that included the ability to communicate important information about things outside of the "here and now".

An Alternative Version of the Confrontational Scavenging Hypothesis

However, based on my analysis given here, an alternative hypothesis suggests itself – that the initial form of communication which evolved to support confrontational scavenging was proto-music, as I have described here – and word-based spoken language came later (possibly evolving initially embedded in proto-music, a scenario I explain in some detail in Music Is A Vestigial Trait).

In my version of the confrontational scavenging hypothesis, proto-music came into existence as the simplest possible form of communication that could express the required information. It was simple because it was constructed from a small vocabulary expressing a small finite number of abstract concepts – but it was enough to facilitate real-time cooperation between members of a small group to exploit major but transient opportunities located somewhere other than the location of the communicating individual.

As I have explained in Music Is A Vestigial Trait, the proto-musical language was simple in the sense of having a very small vocabulary, but it was probably also inefficient in how it represented that vocabulary. A modern musical item requires quite a few syllables to represent just one emotional quality, given that the emotional quality of a song is not usually fully evident until it has continued for at least a few bars and at least one chord change. Very likely proto-music required the same number of syllables to represent the same emotional qualities that modern music evokes.

This can be compared to word-based language, where the words or parts of words that represent individual components of meaning usually consist of no more than two syllables.

The proto-musical language represented meaning by different relationships between the different syllables contained within the proto-musical phrases, and this type of language could never directly evolve into the more compact and information-dense word-based language, where meaning is represented by sequences of meaningful components of one or two syllables each. So word-based language had to evolve separately, alongside and possibly embedded within the proto-musical language (in effect the first words were like the lyrics of the "songs" that constituted the content of the proto-musical language).

As a result of the evolution of the fancier and more functional word-based spoken language, the original proto-musical language became both irrelevant and obsolete, and it disappeared from normal conversation, in the end continuing to exist only as a vestigial trait with a different purpose from its original purpose.

And that's how we got to where we are today.

Appendix: Similarities and Differences between Music and Proto-Music

The theory of music and proto-music I have given here contains a number of hypotheses, some of which apply to music as we know it, and some of which apply to a hypothetical ancestor of music, ie proto-music, and some of which apply to both those things.

It is easy to get confused about which hypothesis applies to which of those two things.

So I think it will be useful to give a summary of the hypotheses about proto-music and music, specifying both the similarities and the differences.

I have also developed some hypotheses about how proto-music evolved into music, and how that evolution co-occurred with the evolution of modern word-based spoken language.

I will start with two separate states, that of the initial form of proto-music, as compared to modern human music, and then after that I will list hypotheses about how the initial form of proto-music transitioned into modern music.

In the following, I will use "music" to refer to music as performed and consumed by modern humans, and "proto-music" to refer to the hypothesised ancestor of music, as "spoken" by some set of human ancestors at some time in the prehistoric past.

So:

Proto-music was a form of language that expressed emotional qualities, as communicated by a "speaker" to one or more listeners. The emotional qualities expressed by proto-music included an implicit assertion that the speaker and the listener or listeners should all feel the same expressed emotion.
The idea of expressing shared emotion made sense to the speakers of proto-music, because those human ancestors lived a lifestyle that required a high level of group cohesion and willingness to rapidly cooperate to exploit transient and possibly dangerous opportunities, eg confrontational scavenging.
Music expresses a set of possible emotional qualities similar to those that proto-music expressed. Music similarly includes an expression of an emotional quality shared by both performer and listener or listeners.
Music can create a feeling of intense group cohesion and shared emotion, but this is mostly a fantasy that fades away once the music stops.
Proto-music expressed emotional qualities relating to real situations.
Music preferentially expresses emotional qualities relating to hypothetical or fictional situations. (The speakers of proto-music likely had no concept of "fiction", but may or may not have had the ability to think hypothetically about possible situations.)
Music is a superstimulus for an aspect of speech perception, and the superstimulus aspects of music act to intensify the emotional qualities of music. Indeed, modern listeners are not even aware of any emotional qualities if the music does not include superstimulus aspects. These superstimulus aspects include regular beat and use of pitch scales.
Proto-music did not include these superstimulus features. It could not, because speech perception, in the sense of the perception of word-based speech, did not exist, because word-based spoken language did not yet exist.
The superstimulus aspect of music has the consequence that the effective performance of music requires specially learned and practised skills, and a small number of performers end up being much better at this than most people. (An alternative is that a good musical result can be achieved by a large group of moderately competent performers such as a choir – but the requirement for group performance cannot apply to an actual communicative language because it conflicts with the normal requirement of communication where almost always a single individual is communicating to one or more listeners.)
Proto-music was a language in the same sense as modern spoken language, where everyone could communicate, and everyone could listen. Most individuals could use proto-music with a sufficient level of competence, and the "performance" of proto-music was not competitive in the same sense that the performance of music is competitive.
When listeners of proto-music understood the expressed emotional qualities, there was little or no supplemental information provided by the speaker about the specific details of what the emotional quality referred to. The listeners had to work it out for themselves.
With music, specifics are easily provided as the lyrics of a song, or the music may accompany a detailed portrayal of fictional events that the emotional quality of the music applies to.

Transition, from Proto-Music to Music

The basic concept of proto-music is that it was something which evolved into music as we know it. Additionally, modern grammatical word-based language evolved, starting from a state where it did not exist at all.

These changes can be summarised as:

No word-based language existed => word-based language, sometimes embedded in the music.
Proto-music applied to real situations => music applies mostly to hypothetical or fictional situations.
Proto-music had "melody" and "rhythm", but neither of those was regular or discrete => music has regular beat and pitch scales.
Proto-music was a language requiring normal competence => music tends to become competitive where only the best is good enough.
High level of group cohesion => not-so-high level of group cohesion – the development of more sophisticated word-based language may have been partly responsible for this reduction, as it facilitated more nuanced combinations of cooperation and conflict between individuals of a group.
Word-based language replaced proto-music as a form of communication, because the fully evolved form of word-based language was both more concise and more flexible in what it could express.

Some of these changes may have happened after others, and some might have occurred in tandem with each other. (At this point it would be over-ambitious of me to claim a full coherent model of which change occurred when, so for the moment I will leave the reader with this list.)