Composing Music is Like Climbing Mountains in the Fog

28 May, 2011
Composing music is an optimisation problem, like searching for the peak of a very high mountain.

Climbing Mountains in a World Where it is Always Foggy

Normally, when people climb a mountain, they already know where the mountain's peak is.

But imagine if you lived in a world where it was always foggy, and you could hardly see beyond the next step on the path. Climbing would still be possible, if you happened to find places where the path went upwards. But because it's always foggy, no one knows where the highest mountains are. Even if you reached what appeared to be a peak, very high up, you would never know that there wasn't some much higher peak quite close by.

Now let's also imagine that, in this world, climbers have GPS units (to know where they are), and altimeters (to know how high they are).

Having GPS and altimeters doesn't solve the problem of finding a mountain peak, but it does solve the problem of finding it the second time.

In this world, no one knows how high the highest mountain is, at least not until mountaineers have explored every square metre (so let's pretend that the world is too large and mountaineers too few for this to happen very quickly). But, to be a great mountaineer, you don't have to climb the highest mountain, or even a mountain that comes close to being the highest mountain. All you have to do is climb a mountain higher than any mountain that anyone else has climbed.

And if you want to substantiate your claim to be the first to climb such a high peak, just record the position from your GPS and the height from your altimeter. Anyone who wants to verify that you climbed a high mountain can use the GPS to get to the same peak, and then check the altimeter reading.

What Does This Have to do with Music?

We can think of mountain climbing in an unexplored foggy mountain range as an optimisation problem. We have an independent variable, which is position on a 2-dimensional surface (as measured by the GPS), and a dependent variable, height, which we want to optimise, i.e. find the highest possible value.

We can also regard music composition as an optimisation problem. The independent variable is position within the set of all possible sounds as recorded over some finite time interval. The dependent variable is that of "musicality", i.e. how "good" or "strong" the music is.

The idea of a foggy world corresponds to knowing the local behaviour of a function in a small neighbourhood of a particular point in its domain (the set of values of the independent variable), but not knowing anything about its global behaviour. In such a world, the only way to find a mountain peak is to incrementally climb to a higher position. Occasionally you will reach a local maximum, and to find a higher maximum you will have to go down from where you are, looking for an alternate path upwards.
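
The foggy climb described above can be sketched as a simple hill-climbing loop. This is only an illustration under stated assumptions: the terrain function `height` (two bumps of different heights), the step size and the function names are all made up for the example, and nothing here is a model of real composition.

```python
import math

def height(x, y):
    """Hypothetical terrain: a low peak near (0, 0) and a higher one near (4, 0)."""
    low = 1.0 * math.exp(-(x ** 2 + y ** 2))
    high = 2.0 * math.exp(-((x - 4.0) ** 2 + y ** 2))
    return low + high

def climb(x, y, step=0.05, max_steps=10_000):
    """Step to the highest of four neighbouring points until none is higher."""
    moves = [(step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)]
    for _ in range(max_steps):
        best = max(((x + dx, y + dy) for dx, dy in moves),
                   key=lambda p: height(*p))
        if height(*best) <= height(x, y):
            break  # local maximum: every visible step is level or downhill
        x, y = best
    return x, y, height(x, y)

# Starting near the low peak, the climber stops there and never sees the higher one.
print(climb(-0.5, 0.3))
# Starting further east, the same rule reaches the higher peak.
print(climb(3.0, 0.5))
```

The climber only ever compares the current point with its immediate neighbours, which is the "fog". Which peak is found depends entirely on the starting point.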

This model of an optimisation problem could explain some of what we know (or think we know) about music composition, even though it's probably a vast simplification (and I will discuss some of the complications later).

The Helicopter Analogy

Performing a known musical item is like flying a helicopter to the top of Mount Everest. You get to see what the top of Mount Everest looks like, but it teaches you very little about how to climb Mount Everest, and it teaches you even less about how to find some mountain that hasn't been discovered yet.

You can learn to perform any number of known musical items, making sure to play the notes exactly correctly, and yet learn nothing at all about how to compose new music.

However, if you live dangerously, and start to improvise, you might at least learn something. That's like being flown in a helicopter not quite to the top of Mount Everest, and getting out, and then seeing if you can climb it yourself.

But you might just as easily learn to discover new mountains by starting way at the bottom, wandering at random, and climbing whatever hills or mountains you come across.

Global versus Local Comparisons

What's the highest mountain in the world? Is it Everest, the highest above sea-level? Is it Chimborazo, the furthest from the center of the Earth?

Although different global criteria of height can be applied, there is less room for disagreement about local criteria. In particular, whether we use height above sea-level or distance from the center of the Earth, there is no disagreement about which part of Everest is the highest, or which part of Chimborazo.

Something similar happens with music, where people may disagree about which music is the "best", but, if we consider minor variations of a single item, i.e. different performances, then there will be much more agreement about which variation is the best variation. A corollary of this observation is that there is more agreement about who is a "good" musician than there is about what is "good" music.


One place where the analogy between music and mountains breaks down is dimensionality.

As I observed above, mountains are defined by height as a function of position in a two-dimensional space (to be precise, a 2-dimensional manifold which is equivalent to the surface of a sphere).

Music exists in a much higher dimensional space. If we consider the set of all possible sounds recorded over a given interval of N seconds, with sounds ranging up to a frequency f, then the dimensionality of the space is approximately Nf, which is a very large number.
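
To get a feel for the size of Nf, here is a small worked example. The figures (a 3-minute piece, frequencies up to 20 kHz) and the function name are assumptions chosen for illustration.

```python
def sound_space_dimensions(seconds, max_frequency_hz):
    """Rough count of independent numbers needed to describe a recording.

    Uses the estimate N * f from the text. Sampling theory would put the
    figure nearer 2 * N * f, which does not change the order of magnitude.
    """
    return seconds * max_frequency_hz

# Assumed example: a 3-minute piece with frequencies up to 20 kHz.
print(sound_space_dimensions(180, 20_000))  # 3600000
```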

One thing about very high-dimensional spaces is that even surveying a local neighbourhood can be a lot of work. For example, even if a function F on an n-dimensional space is relatively "smooth", determining the direction of maximum increase requires something of the order of n observations.

So in a relatively smooth neighbourhood in a 2-dimensional surface, you only have to look in two different directions to figure out which way is the fastest way "up", whereas in a million-dimensional space you have to look in a million different directions, which could take a long, long time. To make progress in such a space, you will probably only "look" in a smaller number of directions, until you find one that goes "up" somewhat faster than the others that you've looked at so far. But having made such a choice, you may have missed another direction which would have taken you "up" really, really fast, because it was among the hundreds of thousands of directions that you never bothered to look in.
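
The cost of "looking in every direction" can be made concrete with a finite-difference estimate of the gradient. This is a minimal sketch, assuming a hypothetical smooth function `smooth_hill` and a helper `estimate_gradient`, neither of which comes from the discussion above. The point is only that the number of observations grows with the number of dimensions.

```python
def estimate_gradient(f, point, eps=1e-6):
    """Forward-difference gradient: one extra evaluation of f per dimension."""
    base = f(point)
    gradient = []
    for i in range(len(point)):
        probe = list(point)
        probe[i] += eps  # take a tiny step along dimension i only
        gradient.append((f(probe) - base) / eps)
    return gradient

calls = 0

def smooth_hill(p):
    """Hypothetical smooth function with its peak at the origin."""
    global calls
    calls += 1  # count every observation of the function
    return -sum(x * x for x in p)

grad = estimate_gradient(smooth_hill, [1.0] * 1000)
print(calls)  # 1001
```

In 1000 dimensions the estimate takes 1001 observations (one at the current point, plus one per dimension). In a million dimensions it would take a million and one.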

Cutting Down on Dimensions, Maybe

One thing about the high-dimensionality of musical "sound space" is that almost anything we think we know about music can reduce the number of dimensions, quite significantly.

For example, if we know that music consists of notes from a scale, occurring on beats as defined by a discrete time signature, then the number of possibilities is much reduced. The number of dimensions is still large, but we might have reduced it from the millions down to the hundreds or thousands.
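
As a rough check on "hundreds or thousands", the sketch below counts one note choice per time slot. The tempo, the piece length and the idea of a fixed number of slots per beat are assumptions for illustration, as is the function name.

```python
def note_grid_dimensions(seconds, beats_per_minute, slots_per_beat=1):
    """One choice of pitch (or rest) per time slot on a regular beat grid."""
    return seconds * beats_per_minute * slots_per_beat // 60

# Assumed example: a 3-minute piece at 120 beats per minute.
print(note_grid_dimensions(180, 120))                    # 360
print(note_grid_dimensions(180, 120, slots_per_beat=4))  # 1440
```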

Of course for any supposed rule of music which reduces dimensionality, we don't actually know that said rule is a necessary rule of music. Given our general ignorance of what music is, it is entirely possible that musical items exist which are radically different to the music we already know, and such musical items may not follow whatever rules we think are the rules of music.

It follows that our existing knowledge of music results from a rather ad-hoc search of a very, very tiny portion of a very, very large space of possible values, and until we know more about what music actually is, we should not be at all surprised to discover new kinds of music which are quite different from and/or much "better" than all the music that we know about already.

Sub-Space Factoring

Another feature of the dimensionality of music is "aspects". A simple example of distinct musical aspects would be melody and rhythm (we might also consider there to be more specific aspects contained within these two general aspects, but for the purpose of the following explanation it will be enough to consider just melody and rhythm).

Inasmuch as these aspects can be optimised separately, we can consider our optimisation problem to be broken up into several distinct sub-problems.

Even if the separation is only approximate, this can still allow for search strategies that involve more than observation of the musicality function's behaviour in a local neighbourhood.

For example, if melody A combines well with rhythm B, and melody C combines well with rhythm D, then it's certainly worth considering A + D and C + B as good starting points in the search for new musical items.

If such an analogy applied to searching for high mountains, it would be like taking the longitudes and latitudes of different high mountains, and combining them to discover likely locations of other high mountains. (But this strategy won't work in practice, because mountains don't form lattice-like patterns.)
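
The melody and rhythm recombination described above can be sketched in a few lines. The labels and the function name `recombine` are hypothetical, and the sketch only proposes starting points. Whether a new pairing is any good still has to be judged by listening.

```python
from itertools import product

def recombine(known_pairs):
    """Return melody/rhythm pairings not yet tried, built from known good pairs."""
    melodies = [m for m, _ in known_pairs]
    rhythms = [r for _, r in known_pairs]
    tried = set(known_pairs)
    return [pair for pair in product(melodies, rhythms) if pair not in tried]

print(recombine([("A", "B"), ("C", "D")]))
# [('A', 'D'), ('C', 'B')]
```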