Online Music Perception Tests and the Perception of Musicality

23 July, 2008

I compare various online music perception tests to what I believe would be the most scientifically interesting music perception test: a musicality perception test.

by Philip Dorrell

The Tests

Some of the better known music perception tests on the web are:

The Amusia Online Test by Isabelle Peretz
The Tonedeaf Test by Jake Mandell
The Delosis Music Listening Test

The tests on these websites test different aspects of musical "ability". The tests can be summarized as follows:

Amusia Online Test: detection of out-of-time or out-of-tune notes
Tonedeaf: melody discrimination, pitch discrimination, rhythm discrimination and structure recognition
Delosis Music Listening Test: melody discrimination

Each of these tests measures some component of music perception. I would claim, however, that none of them is the most direct possible test of music perception, because music perception is primarily the perception of musicality.

Testing the Perception of Musicality

Musicality is a property of music consisting of how "good" or how musical a piece of music is. A simple and straightforward design for a test of musicality perception is the following:

Present pairs of musical items
Ask the subject to choose which item of each pair is the more musical
Mark the subject according to how many times they choose the "right" answer

One obvious problem with this test is that it is in danger of becoming a "test" of musical taste, and musical taste is known to vary from person to person. If I present Beethoven's Moonlight Sonata and Deep Purple's Smoke on the Water as a pair, which one is the "correct" answer?

A second, more technical difficulty is that some amusic subjects may "cheat". For example, if a familiar tune is presented together with an unfamiliar tune composed by the test designer, then a reasonable guess is that the familiar tune is the "better" tune, because it is very unlikely that a random music scientist can compose music that is better than well-known popular music.

These problems can be dealt with by making the pairs of tunes as similar as possible in all respects other than their musicality. In particular, for each pair of tunes:

the tunes should be played on the same instruments by the same performers,
the tunes should belong to the same "genre",
the tunes should have similar structure, similar rhythm, and similar chord sequences,
the tunes should be composed by the same person.

It may be necessary to adopt a specific methodology for creating the "good" and "bad" tunes in each pair. It may be difficult to find anyone both willing and able to construct commercial-quality music solely for the purpose of a scientific experiment. However, there are many people with sufficient musical talent to improvise effectively on a given rhythm and/or chord sequence, and this may be sufficient for the purpose of designing a musicality perception test.

A "bad" tune can be derived from a "good" tune by taking advantage of the ineffectiveness of existing theories of music:

Compose the "good" tune by a combination of improvisation and refinement.
Write down the "good" tune in musical notation.
Get someone else familiar with music theory to analyse the composed music, and compose a similar tune purely by notating music, without making any attempt to "hear" either the "good" tune or the new one they are composing.
Compare performances of the two tunes to verify their degree of similarity in all aspects other than their musicality.

Scoring

The simplest scoring method for a musicality perception test is to give 1 for the "right" answer and 0 for the "wrong" answer for each pair of tunes.

However there is a risk of spurious bias due to the specific musical taste of the test designer.

For the most part we judge other people as "understanding" music if they perceive the musicality in the same tunes that we do. This can be developed into a scoring method by comparing each subject's choice with that of everyone else. With a large enough sample of subjects, this will remove any dependence of the scoring on arbitrary choices made by the test designer as to what is a "good" tune or what is a "bad" tune.

Scores can be adjusted according to the ratio of choices for each pair of tunes by calculating the log-odds, i.e. log(p/(1-p)), for each tune choice, where p is the proportion of test subjects who made the same choice as the subject being scored. This means that where there is no strong agreement about which tune is the better of a pair (i.e. p is close to 0.5), the score difference between the "right" answer and the "wrong" answer will be very small, whereas if there is very strong agreement about which of two tunes is the better, then a subject will be penalised by a greater amount for choosing the "wrong" answer.
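As a rough illustration of this adjusted scoring, here is a minimal Python sketch that sums log(p/(1-p)) over all pairs for each subject. The function name consensus_scores, the 0/1 encoding of choices, and the input layout are hypothetical. Two further assumptions are mine rather than part of the scheme described above: the subject being scored is counted when calculating p, and a small pseudo-count is added so that p is never exactly 0 or 1 (which would make the logarithm infinite when everyone agrees).

```python
import math

def consensus_scores(choices, smoothing=0.5):
    """Score each subject by summing log(p / (1 - p)) over all pairs of tunes.

    choices: dict mapping subject id -> list of 0/1 picks, one per pair
             (0 = first tune chosen, 1 = second tune chosen).
    smoothing: pseudo-count added to each option so that p stays strictly
               between 0 and 1 (an assumption of this sketch).
    """
    subjects = list(choices)
    n_subjects = len(subjects)
    n_pairs = len(next(iter(choices.values())))
    scores = {s: 0.0 for s in subjects}
    for i in range(n_pairs):
        # Number of subjects who chose the second tune of pair i.
        picked_second = sum(choices[s][i] for s in subjects)
        for s in subjects:
            # Number of subjects (including s) who made the same choice as s.
            if choices[s][i] == 1:
                same = picked_second
            else:
                same = n_subjects - picked_second
            p = (same + smoothing) / (n_subjects + 2 * smoothing)
            scores[s] += math.log(p / (1 - p))
    return scores

# Hypothetical data: five subjects, two pairs of tunes.
example = {
    "a": [0, 0],
    "b": [0, 1],
    "c": [0, 0],
    "d": [0, 1],
    "e": [1, 0],
}
for subject, score in consensus_scores(example).items():
    print(subject, round(score, 3))
```

In this made-up data, subject "e" disagrees with a strong majority on the first pair and so loses far more on that pair than anyone gains or loses on the second pair, where opinion is nearly evenly split.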

The Search for "True" Amusia

All of the tests mentioned above have some intention of identifying amusia (especially the "Amusia Online Test"). As I have explained in my previous blog article, a deficit in some perceptual ability which is a component of music perception is not the same thing as a deficit in music perception per se.

A test for musicality perception would be the most direct test of music perception, and would have the potential to tell us more about the mystery of music than any other kind of music perception test.

Furthermore, tests of more specific perceptual abilities could be used as counter-tests. In other words, the people that music scientists should be most interested in studying are those people who fail the musicality perception test, but who still pass tests of such things as pitch discrimination, melody recognition, rhythm recognition and structure recognition.
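The counter-test idea can be sketched as a simple filter over test results. This is only an illustration: the function name select_candidates, the test names, and all cutoff values are hypothetical, and choosing sensible pass marks for real tests would be a separate problem.

```python
def select_candidates(results, musicality_cutoff, component_cutoffs):
    """Return subjects who fail the musicality test but pass every counter-test.

    results: dict mapping subject id -> dict of test name -> score; each
             subject must have a "musicality" score and a score for every
             test named in component_cutoffs.
    musicality_cutoff: scores below this count as failing the musicality test.
    component_cutoffs: dict mapping counter-test name -> minimum passing score.
    """
    selected = []
    for subject, scores in results.items():
        fails_musicality = scores["musicality"] < musicality_cutoff
        passes_components = all(
            scores[test] >= cutoff for test, cutoff in component_cutoffs.items()
        )
        if fails_musicality and passes_components:
            selected.append(subject)
    return selected

# Hypothetical scores on a 0-100 scale.
results = {
    "s1": {"musicality": 85, "pitch": 90, "rhythm": 88},
    "s2": {"musicality": 30, "pitch": 92, "rhythm": 81},
    "s3": {"musicality": 25, "pitch": 40, "rhythm": 75},
}
print(select_candidates(results, 50, {"pitch": 70, "rhythm": 70}))
```

Here only "s2" would be selected: "s1" passes the musicality test, and "s3" fails it but also fails pitch discrimination, so that deficit could be explained by a more specific perceptual problem.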