The Reason Why Computers Can't Compose Good Music

8 February, 2021
Machine learning algorithms cannot currently learn to compose good music.
Why not? Because they don't have access to real-time perception of the musical quality of a composition-in-progress.

Machine Learning Music Composition, from Examples

Machine learning is a way of writing computer programs in which you don't write a program that solves your problem directly. Instead you write a learning program, and when you run the learning program on your computer, it learns how to solve the problem that you want to solve.

When a learning program runs, it has to learn from something – and this is critical to how the program works, and whether it will work at all.

As far as I know, most attempts to apply machine learning to musical composition are example-based.

That is, the information fed into the learning algorithm consists of examples of what are presumably good quality items of music.
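To make "example-based" concrete, here is a minimal Python sketch of one very simple learner of this kind: it counts which note follows which in a set of example melodies, and then generates a new melody by sampling from those counts. This is only an illustration of the general idea, not a description of any particular music-composition system. The example melodies, the use of MIDI note numbers, and the function names are all assumptions made up for the sketch.

```python
import random
from collections import defaultdict


def learn_transitions(examples):
    """Record, for each note, every note that followed it in the examples."""
    transitions = defaultdict(list)
    for melody in examples:
        for current, following in zip(melody, melody[1:]):
            transitions[current].append(following)
    return transitions


def generate(transitions, start, length, rng):
    """Build a melody by repeatedly sampling a successor seen in the examples."""
    melody = [start]
    while len(melody) < length:
        successors = transitions.get(melody[-1])
        if not successors:
            break  # this note never had a successor in the examples
        melody.append(rng.choice(successors))
    return melody


# Hypothetical "good" examples, written as MIDI note numbers (60 = middle C).
examples = [
    [60, 62, 64, 65, 67],
    [60, 64, 67, 64, 60],
]

transitions = learn_transitions(examples)
melody = generate(transitions, start=60, length=8, rng=random.Random(0))
print(melody)
```

Note that nothing in this program ever hears its own output. It only knows what the examples did, which is the limitation that the rest of this article is concerned with.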

We don't yet live in a world where all our music is composed by computer. The implication is that this example-based learning approach doesn't work, and perhaps there is something about musical composition that is extra difficult to do by machine learning.

How Humans Compose Good Music

How do humans compose good music?

There are people who compose good music, and we know this, because good music exists, and human composition is the only known process for composing good music.

But just because some people compose good music, some of the time, that doesn't mean we know a whole lot about how such music is composed.

Composers can tell us about how they might have composed music, after the fact, but even if we assume an honest recollection of what happened during the composition of particular items of music, it is difficult to reliably verify any such recollection.

I would like to suggest that one reason why computers cannot learn music composition from examples of good music is that not even human composers can learn composition that way.

In particular, we can observe that the world is full of people who have been exposed to many examples of good music, usually hearing the best examples many times over. Quite a lot of those people are also musicians, who can play one or more instruments and can reliably play good versions of those examples. And yet most of those musicians cannot compose new original examples of good music.

The Importance of Real-Time Feedback

In my own personal experience, knowing multiple examples of good music and being able to play those examples on an instrument does not get one anywhere near being able to compose new examples of good music. (I will not claim any degree of virtuosity in my instrument playing, but it's probably good enough for alleged examples of good quality original compositions to be recognisable as such.)

What I have found, by personal experience, is that one can learn to compose music, but this requires exposure to musical compositions in the process of being composed (see ../melodies/index.html for a few examples of my own compositions).

This is where the computer is at a total disadvantage, because musical compositions in the process of being composed are of varying quality, and the learning process necessarily depends on the human composer having real-time perception of the quality of the musical composition in progress, as it is being played on the instrument.

It's not that a computer program is necessarily incapable of learning to compose, given information about suitable inputs and outputs. It's that it can't learn this without access to real-time information about the quality of the current state of a musical composition, which might be very musical, or just somewhat musical, or maybe not very musical at all.

This, then, is why computers can't yet compose good music, even though it's something that people can do.

How to Fix the Problem

Of course if the problem is that computers can't learn to compose good music, because they lack the necessary real-time feedback, then we can immediately think of a possible solution, which is to find some way to give real-time feedback to a music-composition learning program.

Unfortunately we don't (yet) have any reliable means of objectively measuring the perception of musical strength by a human music listener.

The best we can do is perhaps ask a listener to consciously give some indication of the perceived musicality of the output of a music-composition learning program running on a computer.

And we might be able to improve on this slightly by asking a suitable number of listeners to give real-time feedback to the same output, for example via Amazon Mechanical Turk, or some other method of recruiting large numbers of people online to assist in a machine-learning project. Involving large numbers of people to give feedback would help to smooth out any irregularities and inconsistencies in feedback that would be received from a single listener (especially if we're expecting the learning process to carry on for a protracted period of time).
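As a rough sketch of what "smoothing out" feedback from several listeners might look like, the Python below combines the ratings that each listener gave at each moment into a single rating per moment. Taking the median is my own assumption here, chosen because one wildly inconsistent rating does not drag it far. The listener names, the rating scale and the numbers are invented for illustration, and no real crowdsourcing service is involved.

```python
from statistics import median


def aggregate_ratings(ratings_by_listener):
    """Combine per-listener rating sequences into one rating per time step.

    Each listener is assumed to have rated the same output at the same
    sequence of time steps. The median is used so that a single outlying
    rating has little effect on the combined value.
    """
    per_step = zip(*ratings_by_listener.values())
    return [median(step) for step in per_step]


# Hypothetical ratings (say, on a 1 to 10 scale) at four successive moments.
ratings_by_listener = {
    "listener_a": [3, 4, 2, 5],
    "listener_b": [4, 4, 1, 5],
    "listener_c": [9, 3, 2, 4],  # the first rating disagrees with the others
}

combined = aggregate_ratings(ratings_by_listener)
print(combined)
```

The combined sequence is what a learning program would receive as its feedback signal, in place of the raw ratings from any one listener.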

And Another Thing

There is a second thing we could do to improve the chances of success, and it relates to something you might have noticed if you went and listened to my own musical compositions that I linked to above. I did not actually compose the entire musical items. Rather, I composed melodies against existing backing tracks, and I composed them using a screen-based keyboard which pre-constrained the notes to those of a chosen scale.

This makes the problem easier, because it's a simpler, more constrained problem, i.e. just find the optimal melody from the notes on a chosen scale against a pre-chosen backing track.
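Here is a minimal Python sketch of what such a constrained search could look like. The melody is only ever built from the notes of one scale, and a change to the melody is kept only if a rating function does not score it lower. Everything named here is an assumption made for illustration: the scale is one octave of C major as MIDI note numbers, the search is plain hill climbing, the backing track is not modelled at all, and `placeholder_rating` is just a stand-in for a listener. It prefers small steps between notes and is in no way a measure of musicality.

```python
import random

# One octave of C major as MIDI note numbers (an assumed choice of scale).
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]


def compose(scale, length, rate, steps, rng):
    """Hill-climb over melodies that use only notes from `scale`.

    `rate` is any function that takes a melody and returns a number,
    with higher meaning better. In the scenario discussed in this
    article it would be backed by feedback from listeners.
    """
    melody = [rng.choice(scale) for _ in range(length)]
    best = rate(melody)
    for _ in range(steps):
        candidate = list(melody)
        candidate[rng.randrange(length)] = rng.choice(scale)
        score = rate(candidate)
        if score >= best:  # keep the change unless it rates worse
            melody, best = candidate, score
    return melody, best


def placeholder_rating(melody):
    """Stand-in for a listener: penalises large jumps. NOT real musicality."""
    return -sum(abs(a - b) for a, b in zip(melody, melody[1:]))


melody, score = compose(
    C_MAJOR, length=8, rate=placeholder_rating, steps=200, rng=random.Random(1)
)
print(melody, score)
```

The point of the sketch is the shape of the loop: the constraint to a scale shrinks the space of candidates, and the only thing the search needs beyond that is a rating of each candidate as it is produced.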

It's also an approach that doesn't work well with example-based learning, because for any one particular backing track that one finds on YouTube, the number of known musical hits based on that backing track might be a very small number, or even zero.

But with an approach that relies on real-time feedback about the quality of the composition, as the composition is in the process of being composed, the lack of any pre-existing good examples does not matter.