Hypothesis: Music Was A Language System

20 July, 2019
Music was a language system. But now it isn't, because one critical component of the system has been disabled. As a language system, music was both limited and inefficient - each tune had one specific meaning, and there was no way to combine simpler meanings into more complex meanings. Eventually word-based language would evolve as a system that is much more efficient and productive. However, the evolution of word-based language was constrained by the prior existence of music, and initially it had to develop inside music, ie effectively as song lyrics. As a communication system, music has now been fully replaced by word-based language. Yet music still exists, because it interacts with our sense of "meaningfulness" – and sometimes this can help us to think about the deeper meanings of things.

Hypothesis 1: Music was a language system.

By a "language system", I mean a system for doing the following:

  1. Defining symbols
  2. Assigning meanings to those symbols
  3. Optionally, combining simpler meanings to create more complex meanings.

In modern humans, spoken language is the primary language system, which supports the definition and cultural transmission of specific spoken languages.

For a spoken language:

  1. The minimum units of meaning are words, or in some cases, parts of words. (The technical term for a minimal unit of meaning in a spoken language is "morpheme".)
  2. Meanings are assigned to words, or parts of words, by the community that speaks the language. These assigned meanings are learned by infants and children, by some process of observation and absorption.
  3. Words can be combined into sentences.

With the hypothesis that music was a language system, I propose a relationship between symbols and meaning that is somewhat different to what one might expect when comparing music and spoken language.

My hypothesis is that, for a musical language:

  1. The minimum units of meaning were melodies, ie the whole musical item represented just one single meaning.
  2. Meanings were assigned to melodies by the community that vocalised those melodies, and these assigned meanings were learned by infants and children, by some process of observation and absorption.
  3. There did not exist any way to combine simpler meanings into more complex meanings.

For a language utterance to be useful, it has to make some assertion about something. By themselves, words do not generally make assertions. Sentences make assertions.

If the only meaningful components of musical languages were whole melodies, then the meanings of melodies must have been assertions, and thus each melody was equivalent in function to what would be a fixed sentence in a spoken language.

The "productivity" of such a musical language system would necessarily be much less than that of modern spoken language. The number of different meanings that could be expressed would be exactly the same as the number of known melodies. Whereas, with the ability to combine words into sentences, a spoken language can be used to express an almost infinite number of different meanings.

A musical language would also be a very slow system for communicating information.

For example, if a society knows 1000 melodies, then the identity of one melody expresses only about 10 bits of information. The absolute minimum length of a melody is perhaps 10 notes, and usually at least 20 notes or even more are required to define a strong melody. On top of this, it is usually expected that a melody will be repeated several times during a performance.

Whereas a word, or a part of a word, is often just one syllable. If we suppose the existence of 1000 monosyllabic words in a language, then a one-syllable word would express the same 10 bits of information in just the time it takes to say that one syllable. So the bandwidth of music, as a communication system, would be maybe 20 to 100 times lower than the bandwidth of spoken word-based language, ie it would take 20 to 100 times longer to transmit the same amount of information.
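The arithmetic above can be sketched as a quick back-of-envelope calculation. The note count, repeat count, and the assumption that a note and a syllable take about the same time to vocalise are all illustrative assumptions, not claims from the text:

```python
import math

# A repertoire of 1000 melodies vs 1000 monosyllabic words:
# each choice of symbol carries the same amount of information.
repertoire = 1000
bits_per_symbol = math.log2(repertoire)   # ~9.97, i.e. "about 10 bits"

# Illustrative assumptions: a strong melody of ~20 notes, repeated
# twice in performance, vs a single one-syllable word; one note and
# one syllable are assumed to take comparable time to vocalise.
notes_per_melody = 20
repeats = 2
syllables_per_word = 1

melody_cost = notes_per_melody * repeats  # 40 vocalised units per melody
word_cost = syllables_per_word            # 1 vocalised unit per word

print(f"bits per symbol: {bits_per_symbol:.1f}")
print(f"music is roughly {melody_cost // word_cost}x slower per symbol")
```

With these particular numbers the slowdown factor comes out at 40, which sits inside the 20-to-100 range suggested above; shorter unrepeated melodies give the low end, longer melodies with more repeats the high end.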

It's not surprising that, if such musical languages did exist, they would eventually be superseded by much more efficient word-based spoken languages.

This leads me to my next major hypothesis.

Hypothesis 2: Music did not, and could not, evolve into a more efficient word-based language system.

Music could not evolve into a more efficient system, because the criterion for what constitutes a "strong" melody was defined in a manner that depended on the perception of complex relationships between all the different parts of that melody. This perception was structured in such a manner that it could never evolve continuously into the much simpler logic required to identify words.

The only way to increase the expressivity of a musical language would be to create an ever increasing number of distinct melodies. But at some point a society would reach a practical limit on how many distinct melodies could be "composed" and assigned meanings, while all the time successfully passing all those melodies and their meanings on to the next generation.

It follows from this hypothesis that spoken language did not evolve from music. Nor did music evolve from spoken language, and nor did music and spoken language evolve from some common ancestor. Spoken language had to evolve as a separate and new language system.

But, as we will see with my next hypothesis, it was not necessarily straightforward for a new more efficient language system to evolve alongside music.

Hypothesis 3: The evolution of a better word-based language system was constrained by the prior existence of music.

Given the prior existence of music as a language system, the possibility of evolution of any alternative language system was constrained in at least two ways:

  1. An alternative language system could not encode information using any characteristic of sound that was relevant to identifying melodies – because any such usage would have interfered with the operation and functionality of the existing musical language system.
  2. Because music strongly controlled the acquisition of musical language, any alternative language had to come into existence inside the music.

Additionally, any alternative language system could not encode information using characteristics of sound relevant to perceiving speaker identity. This constraint would also have applied to music as a language system – because any system of voice-based language has to allow speakers with different qualities of voice to speak the same messages.

Consonants and Vowels

In practice, we observe that most of the information encoded in word-based languages is encoded in choices of consonants and vowels, and at the same time, choice of consonant and vowel sounds is not relevant to the perception of the identity of a melody.

Even without words, singing a melody requires the use of consonantal sounds to define the rhythm, and vowel sounds to define the notes. But the specific identity of those sounds is not very important, and whether I sing a song as "doo doo doo doo" or "la la la la", it's the same melody.

Given that information in word-based languages is encoded primarily as consonant and vowel choices, and that this encoding would not have conflicted with the encoding of information in a musical system that depended on the identification of distinct melodies, this satisfies the first constraint.

Motivation for Acquisition

The second constraint listed above relates to the acquisition of language. On the one hand the musical language system required some mechanism of motivating new language learners to pay attention to identifiable melodies and to be interested in determining what the meaning of those melodies might be. On the other hand, in principle, a new system of language development could evolve its own independent system of motivation to learn the meanings of symbols.

However, it is likely that a new system could evolve much more quickly if it gave itself a "free ride" on the pre-existing motivational system. This could happen if the word-based language system evolved initially as a system of words embedded in the melodies. In other words, the first words were sung, not spoken.

For modern humans, the acquisition of spoken language does not require that any words be set to music (at least not as far as anyone knows). This can be explained by supposing that, over time, word-based language has completely overtaken music-based language, because it is so much more expressive and efficient, and, in this process of taking over, the mechanism of motivating the acquisition of meanings has ceased to be dependent on the presence of music.

And once word-based language had evolved ...

Regardless of the specific details of how music and word-based language initially co-existed, there must have come a time when the benefits of maintaining the existing music-based language no longer justified the cost, and at this point there was a selective evolutionary benefit to simply "turn off" the musical language system. Possibly only a single mutation was required, because to break a complex system it is often sufficient to break it in just one place.

However, music still exists, as a thing, but not as a language system. So my theory still needs to explain how the music-as-language system was "turned off", and which part or parts of it were turned off, and which parts were not turned off, and what purpose, if any, those still active parts currently serve.

Hypothesis 4: Only one specific component of the musical language system has been disabled.

The musical language system had a number of components that supported its overall function.

In the process of the evolution of word-based language as a more effective and efficient replacement, one specific component of the musical language system was disabled. All the other components of the system remain essentially intact, and they continue to exist, because music continues to serve a secondary purpose.

The following list of distinct components gives a plausible account of how the musical language system could have functioned. At the same time, it corresponds to how modern humans currently respond to music, if we assume that the last component in the list was the one that was disabled:

  1. There is a criterion for musicality, which defines what constitutes a "strong" melody.
  2. The same criterion determines whether or not a melody has a strong identity, suitable for assigning a meaning to the identified melody.
  3. The perception of musicality also provides a certain degree of emotional quality, which somewhat constrains which possible meanings might be assigned to a given melody, but which, at the same time, does not completely determine that meaning. (So the sadness of a tune makes it more likely to be assigned a sad meaning, but there is still a choice to be made of which sad meaning to assign to the tune.)
  4. When listening to a new melody, a listener experiences a motivation to determine what the meaning of the melody is. (This motivation may be a more general motivation to assign a meaning to any "thing", not necessarily restricted to melodies – this relates to the hypothesis in the next section about the current function of music.)
  5. The determination of meaning would be guided by the circumstances in which the melody was uttered by other members of society.
  6. On each occasion that a listener hears a particular melody, and is provided some information which suggests or implies the culturally assigned meaning, that meaning would become more specifically defined within the brain of that listener.

As part of the process of abandoning the use of music as a language system, only the last item has been disabled. Modern human listeners fully respond to music, except that they have lost the ability to permanently fix and remember the culturally assigned meaning of any particular tune.

In other words, each time a listener listens to a melody:

  1. The listener perceives the musicality and emotional quality of the melody.
  2. The listener feels a motivation to determine what the melody means.
  3. The listener tentatively entertains possible meanings suggested by the circumstances.

But no permanent decision is ever made inside the listener's mind about what the meaning of that melody actually is. Each time a listener listens to the same musical item, it is as if the process of learning the assigned meaning for that tune starts again from scratch.

Since this "abandonment" applies to everyone involved, over time the culturally assigned meanings of all melodies have been lost. The modern performers of music and their listeners do not have any shared specific idea about what the meaning of the music is (other than what the lyrics might say), even though, at the same time, the music always "feels" as if it is (or should be) very meaningful.

Hypothesis 5: The current purpose of music relates to its feeling of "meaningfulness".

As stated in item 4 of the list in the previous section, I hypothesize that when music was a language system, the acquisition of meaning was driven by the motivation that music created in the listener to assign meanings to things.

In the musical language system, the "things" being assigned meanings were just the musical items themselves.

However, human thought sometimes goes beyond things that seem immediately and obviously relevant to the current circumstances. Sometimes, when there is no immediate requirement for decision or action, it can be useful to ponder the possibility that some things have meanings that are not so obvious.

My final hypothesis is that the current function of music, as a partially disabled language system that no longer functions as a language system, is to cause a listener to temporarily enter an altered state of mind. In this altered state of mind, the listener is motivated to think about things beyond the immediate "here and now", and to consider whether there are things, situations and phenomena whose meanings are not obvious, but which are potentially important in the long run.


This then is my hypothesis about the evolution of music and language, and my answer to all those questions about what the relationship is (and was) between music and language.

In a followup post, I will give an account of the ideas and observations that led me to this particular hypothesis.