Is There Musical Syntax Like Language Syntax?

2 November, 2006

Is there a syntax of music which is analogous to the syntax of language? Some recently published scientific studies explore this analogy, and identify the "rules", "principles" and "regularities" of music with "syntax". However, musical items may be constrained by certain rules because they are local solutions to an optimisation problem. I suggest an experimental protocol which tests this hypothesis by looking for dissociation between musical expectation and musical pleasure.

by Philip Dorrell

The Science of Musical "Syntax"

Does it make sense to say that music has syntax? A number of scientific papers published in recent years suggest that the answer to this question is "yes":

Musical syntax is processed in Broca's area: an MEG study (Burkhard Maess et al 2001)
Language, music, syntax and the brain (Aniruddh Patel 2003)
Neural substrates of processing syntax and semantics in music (Stefan Koelsch 2005)
The Neural Locus of Temporal Structure and Expectancies in Music (Daniel Levitin and Vinod Menon 2005)

The authors of these papers attempt to show that there is an analogy between the syntax of language and the syntax of music. However, the notions of syntax they use are very general and, I might say, rather vague. Maess et al state that "music, like language, has a syntax: both have a structure based on complex rules". Patel gives a tentatively worded definition (accompanied by a reference to Foundations of Language by R Jackendoff):

Syntax may be defined as a set of principles governing the combination of discrete structural elements (such as words or musical tones) into sequences.

Koelsch tells us that:

All types of music are guided by certain regularities.

and later on introduces the word "syntactic" into the phrase "syntactic irregularities" (thus implying that "syntax" = "regularities"). And Levitin and Menon tell us, after a discussion of structure, expectancies and hierarchy, that:

Syntax in both language and music refers not just to these hierarchies, but to rules and conventions that contribute to structure in both domains.

These definitions are annoyingly vague and somewhat evasive, in that the authors discuss something that sounds a bit like a definition of what syntax is, and then they start discussing "syntax" in the context of music, as though we (i.e. the authors and the readers) now know exactly what that word means.

Differences Between Language Syntax and Musical "Syntax"

Even without a specific description of what musical "syntax" is, it is apparent that language syntax has certain properties which do not apply to music, so whatever musical syntax might be, it is qualitatively different from language syntax.

The most important feature of the syntaxes of natural languages is that they are approximately equivalent to context-free grammars. Context-free grammars are very commonly used to define programming languages and other formal languages, and they have the property of dividing a sequence of lexical elements into a hierarchy of components where the relationship between parent and child is independent of the context of the parent (thus the "context-free" aspect).

I use the word "approximately", because a lot has been made of the observation that human languages are not defined by context-free grammars. But actually they come fairly close, close enough to suggest possible parallels between how the brain processes language syntax and how programming languages are parsed by compiler software.

Important features of context-free grammars are:

Parsing language content results in the definition of a tree structure, which can be used to regenerate the linear sequence of tokens (i.e. words) in the parsed content.
Parsing assigns both individual words and sub-trees of the parse tree to grammatical categories.
There are no dependencies between "sibling" sub-trees, other than the required relationship between the grammatical categories that each sub-tree belongs to.
Any sub-tree can be replaced by another sub-tree belonging to the same grammatical category, and the replacement sub-tree may be much larger or smaller than the sub-tree it replaces.

These properties hold substantially for the grammars of human languages, and one can read books on language syntax which contain diagrams showing sentences parsed into parse-trees, where each word and each sub-tree is assigned to a grammatical category.
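
The properties listed above can be illustrated with a minimal Python sketch. The toy sentence, the category names ("S", "NP", "VP" and so on) and the helper names tokens and substitute are all my own illustrative assumptions, not taken from any of the papers cited. The sketch only shows that a parse tree regenerates its word sequence, and that a sub-tree can be swapped for a much larger one of the same grammatical category.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    category: str                        # grammatical category, e.g. "NP"
    children: List[Union["Node", str]]   # sub-trees, or words at the leaves


def tokens(tree):
    """Regenerate the linear sequence of words from a parse tree."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree.children:
        words.extend(tokens(child))
    return words


def substitute(tree, old, new):
    """Return a copy of tree with the sub-tree old replaced by new.

    The only requirement is that both belong to the same category;
    their sizes and their siblings are irrelevant.
    """
    if isinstance(tree, str):
        return tree
    if tree is old:
        if new.category != old.category:
            raise ValueError("replacement must have the same category")
        return new
    return Node(tree.category, [substitute(c, old, new) for c in tree.children])


# Toy example (hypothetical sentence and categories).
small_np = Node("NP", [Node("Det", ["the"]), Node("N", ["dog"])])
sentence = Node("S", [small_np, Node("VP", [Node("V", ["barked"])])])

big_np = Node("NP", [
    Node("Det", ["the"]), Node("Adj", ["old"]), Node("N", ["dog"]),
    Node("PP", [Node("P", ["from"]),
                Node("NP", [Node("Det", ["the"]), Node("N", ["farm"])])]),
])

longer = substitute(sentence, small_np, big_np)
print(" ".join(tokens(sentence)))  # the dog barked
print(" ".join(tokens(longer)))    # the old dog from the farm barked
```

Note that the verb phrase is untouched by the substitution: the sibling sub-tree does not need to "know" anything about the noun phrase except its category.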

It must be pointed out that there are various deviations of natural language from context-free-ness, and as I understand it, it is non-trivial to write a parser for a natural language, for instance English, which successfully duplicates the grammatical intuitions of a native speaker for all possible sentences. (There may also be languages which have less dependence on word order, in which case the resemblance to context-free grammars may be weaker.)

However, with respect to musical syntax, we can see that these properties do not hold even approximately in the case of music:

Although music can be parsed into tree-like structures, this parsing does not depend on assignment of tokens (i.e. notes) to grammatical categories; rather, it depends on recognition of parallel structure in musical phrases deemed to belong to sibling sub-trees.
There is no notion of grammatical categories which are equally applicable to sub-trees of different sizes and to individual notes.
There can be strong dependencies between sibling sub-trees, often to the extent of an extended musical phrase being an exact repetition of a previous musical phrase. There can also be partial repetitions which repeat some aspects such as rhythm and melodic contour, but not other aspects such as absolute pitch and harmonic relationships.
There is little "free" substitution possible in music. Indeed it would appear that many musical items are constrained by a set of rules so tight that the only music which satisfies those rules is the same musical item.

Syntax and Expectation

The papers that I referenced above do not discuss these differences between language and music. And they avoid dealing with the specifics of syntax by reducing the study of syntax processing to the study of expectations (except for the case of Levitin and Menon, who study the perception of structure, without depending on any specifics of how such structure is perceived). In other words, syntax is something that tells you what to expect or what not to expect, so syntax processing in the brain can be studied for language and music by studying the reactions of subjects to the expected and the unexpected. In language the unexpected is the appearance of a word in a sentence that causes the sentence to be unparseable; in music the unexpected is a "wrong" note, or a "wrong" chord.

Thus, Maess et al perform an experiment "designed to localize the neural substrates that process music-syntactic incongruities", Patel refers to his own earlier work studying P600 event-related potentials when "musicians listened to sentences and musical chord sequences with varying levels of syntactic incongruity" and Koelsch discusses studies with "chord sequence paradigms in which chords presented at particular positions within harmonic sequences are structurally more or less (ir)regular". (Levitin and Menon's fMRI study investigates the perception of musical structure by studying subjects' perception of music against perception of the same music which has been chopped into segments with the segments then randomly permuted and joined together using a cross-fade.)

An Alternative Theory of Musical "Syntax"

According to my theory of music, musicality is defined by the occurrence of constant activity patterns in cortical maps that process the melody and rhythm of speech. Musicality is a function of how constant the activity patterns are, how large a brain area they occur over and how many borders there are between active and inactive regions. "Strong" music is music which maximises this perceived musicality.

A maximisation problem naturally leads to constraints, because a locally optimal solution is situated on the peak of a "hill" in the problem "space". It is constrained to remain on the peak: any movement away from the peak necessarily goes "downhill".

It follows from this theory that the observed "rules" of music are not themselves the criterion of musicality; rather, they are a consequence of the constraint that music should occupy a local maximum in music "space".
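
The "hilltop" argument can be made concrete with a minimal Python sketch. Everything here is an assumption for illustration: the two-dimensional grid stands in for music "space", and the score function is an arbitrary landscape with two hills, not a model of musicality. The point is only that a simple hill-climbing search ends at a position where every small change lowers the score, so the "rule" (stay exactly here) falls out of the maximisation rather than being its definition.

```python
def score(x, y):
    """Hypothetical landscape with two hills (illustrative only)."""
    hill_a = 10 - ((x - 2) ** 2 + (y - 3) ** 2)   # peak of height 10 at (2, 3)
    hill_b = 6 - ((x + 4) ** 2 + (y + 1) ** 2)    # peak of height 6 at (-4, -1)
    return max(hill_a, hill_b)


STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def neighbours(x, y):
    """The four grid positions one step away from (x, y)."""
    return [(x + dx, y + dy) for dx, dy in STEPS]


def climb(x, y):
    """Move to the best neighbour until no neighbour scores higher."""
    while True:
        best = max(neighbours(x, y), key=lambda p: score(*p))
        if score(*best) <= score(x, y):
            return (x, y)          # local maximum: every step goes downhill
        x, y = best


peak_a = climb(0, 0)      # ends on the higher hill
peak_b = climb(-5, -3)    # ends on the lower hill, a different local maximum
print(peak_a, peak_b)
```

Different starting points reach different hilltops, which loosely corresponds to different musical items (or genres) each being tightly constrained in its own way.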

So why would listeners develop expectations about the patterns of music? Musical culture can be regarded as the discovery of a set of "hilltops" within the musical "landscape", and the maxima within a particular region of this landscape may share certain constraints on how the corresponding musical items are defined. On exposure to music of a culture which inhabits one particular region of the musical landscape, the listener will gradually learn the relationship between perceived musicality and adherence to the rules defined by that musical culture. As a result, each listener develops their own internal musical theory. This theory predicts the musicality of music according to its adherence to certain rules, and it also predicts what the next part of a musical item is going to be, on the assumption that musicality is maximised.

What will be the consequences of a listener perceiving events in music which are unexpected according to their internalised music theory? An unexpected event implies that the "rules" of music have been broken, and this leads to a prediction (within the listener's brain) that musicality has been reduced (and presumably this prediction will be borne out in most cases). Similarly, if events in music follow the "rules", this will lead to a prediction of musicality, and anticipation of the accompanying pleasure (followed by some kind of disappointment if the expected pleasure doesn't result).

A Possible Experimental Test

One thing about theories is that they are always provisional, and sometimes they fail to be consistent with new facts. A listener exposed to only one type of music will develop an internalised music theory tailored to that genre of music. If the listener is exposed to a new genre of music, their internalised theory will incorrectly predict that the new music is not musical, because it breaks the "rules" defined by their existing internal music theory. But the listener will soon learn that this prediction is wrong, and they will develop a revised theory that includes the new genre as well as the existing genres they are familiar with.

There is a complication with this simple picture, which relates to a requirement for early exposure (i.e. before a "critical period"). If a listener is exposed to a new genre of music at too late an age, they may fail to appreciate it at all. This might be not so much because the "rules" are unfamiliar, but because the brain cells which detect musicality within particular brain regions have never developed (on the "use it or lose it" principle), and the musicality of the unfamiliar genre depends on the occurrence of constant activity patterns within those regions.

However, assuming that a listener can be exposed to music which is outside their previous experience of musical "rules", but not so unfamiliar that they cannot learn to appreciate it, we have a method of applying the "failed expectation" paradigm to music which is musical.

This paradigm can be applied most successfully if we can find music which suddenly breaks the expectations defined by a subject's existing internal musical theory. A simple example might be a listener who has never heard music with accidentals, who then hears for the first time a strong item of music which has an accidental in it. Or a listener could listen to music with syncopation for the first time (where the music starts off unsyncopated), or to music with sudden strong note-bending where they had previously only listened to very "clean" music (see my earlier article about Dina Paucar for some examples of music with unusual note-bending which would normally be "expected" to reduce musicality).

My prediction is that in such a situation one would be able to observe both the signs of failed expectation (for example in EEG output), and simultaneously the signs of musical pleasure. This would constitute a dissociation between perception of musical "syntax" AKA musical "expectations" and perception of actual musicality.

However, in a world with a globalised commercial music industry, it may be difficult to find music listeners and "strong" music that those listeners have never heard. The experimental protocol requires that the subject must not hear the musical item before doing the experiment, and no one will know if indeed the subject can learn to enjoy that item until they do hear it. It will probably be necessary to use multiple subjects and multiple candidate musical items until a response is found that includes both violated expectations and musical enjoyment.

(One way to improve the odds is to exploit language barriers, where certain genres of music are restricted to listeners who speak particular languages. Thus one could find a song which is a hit in country X but which is relatively unknown in country Y (where people speak a different language), and which breaks one or more musical "rules" familiar to the inhabitants of country Y. It is then just a matter of measuring the reaction of inhabitants of country Y to the song until a subject is encountered who easily learns to enjoy that song even though it lies outside the categories of music that they are familiar with.)
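
The screening step of this protocol could be sketched roughly as follows in Python. This is only a sketch under stated assumptions: the Trial record, the two 0-to-1 indices, the thresholds of 0.5 and the function name dissociations are all hypothetical, and the article does not specify how failed expectation or pleasure would actually be quantified. The numbers in the example are placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    subject: str
    item: str
    violation: float   # hypothetical 0..1 index of failed expectation (e.g. derived from EEG)
    pleasure: float    # hypothetical 0..1 index of musical pleasure


def dissociations(trials: List[Trial],
                  violation_min: float = 0.5,
                  pleasure_min: float = 0.5) -> List[Trial]:
    """Keep the trials showing BOTH violated expectation and musical pleasure."""
    return [t for t in trials
            if t.violation >= violation_min and t.pleasure >= pleasure_min]


# Placeholder values for illustration only.
trials = [
    Trial("subject 1", "song A", violation=0.8, pleasure=0.2),  # surprised, not pleased
    Trial("subject 2", "song A", violation=0.7, pleasure=0.9),  # surprised and pleased
    Trial("subject 2", "song B", violation=0.1, pleasure=0.8),  # pleased, not surprised
]

hits = dissociations(trials)
print([(t.subject, t.item) for t in hits])
```

Only the second trial passes the filter, which is the kind of response the protocol is looking for: many subject and item combinations are tried, and the interesting cases are those where both signals appear together.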