Commentary on "The Distance Geometry of Music"

30 May, 2007

A commentary on the paper "The Distance Geometry of Music" (by E Demaine et al) in relation to the super-stimulus theory of music

by Philip Dorrell

"The Distance Geometry of Music" is a paper by E Demaine et al which discusses the relationship between the Euclidean algorithm and the evenness properties of music, in particular the evenness of musical scales and musical rhythms.

Analogies and Symmetries

Most of Demaine et al's paper considers the evenness of repeated rhythmic patterns, but it also considers the evenness of musical scales, and the two are linked by an analogy between the repetitive circularity of time within a bar and the repetitive circularity of a scale within so-called pitch space.

My own theory places importance on the symmetries of music (or more precisely the symmetries of music perception), and it also identifies symmetries between symmetries. The analogy between pitch in an octave and time in a bar given by Demaine et al can be considered an analogy between two symmetries: pitch-translation invariance and time-translation invariance.

However, the super-stimulus theory identifies a different analogy, between pitch-translation invariance and time-scaling invariance. Within this analogy there is no corresponding analogy between circularities: pitch-translation invariance has an associated circularity (pitch modulo octaves), whereas time-scaling invariance doesn't. (This difference can help explain why the two symmetries differ in exactness: musicality is less changed by pitch translation, because the pitch scale considered modulo octaves has no natural "edge", whereas the non-circular time scale has "edges" consisting of time scales either too short or too long to contribute to the perception of rhythm.)

It should be emphasised that there is not necessarily any conflict between the two analogies. For example, analogy A might represent a correspondence between cortical region P_A responsible for an aspect of pitch perception and cortical region R_A responsible for an aspect of rhythm perception, whereas analogy B might represent a correspondence between different cortical regions P_B and R_B responsible for different aspects of pitch and rhythm perception respectively.

It is nevertheless worth setting out why I regard the analogy between pitch translation invariance and time scaling invariance as significant:

Both symmetries require a non-trivial implementation in the brain.
Both symmetries reflect required invariances of speech perception which arise out of natural variations in normal speech, i.e. people have higher and lower voices, and people can speak faster or slower.
Each symmetry requires some kind of calibration, and in each case the most obvious mechanism of calibration involves relationships that are emphasised in music. For pitch translation invariance, the calibrating relationship is consonance, and for time scaling invariance it is the occurrence of N repeated beat periods within a longer beat period, where N is generally a small number such as 2 or 3.
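Both calibrating relationships in the last point come down to recognising that one quantity is close to a small-integer ratio of another: a frequency ratio near 3:2 for consonance, and a period ratio near 2 or 3 for nested beat periods. The sketch below shows only that arithmetic test, not a model of how calibration happens in the brain; the function name `simple_ratio`, the denominator limit of 4 and the 1% tolerance are illustrative assumptions.

```python
from fractions import Fraction
from typing import Optional

def simple_ratio(x: float, max_den: int = 4, tol: float = 0.01) -> Optional[Fraction]:
    """Hypothetical helper: return p/q with the smallest q <= max_den lying
    within relative tolerance `tol` of x, or None if there is no such ratio."""
    for q in range(1, max_den + 1):
        p = round(x * q)                  # best numerator for this denominator
        if p > 0 and abs(p / q - x) <= tol * x:
            return Fraction(p, q)
    return None

# Time scaling: a 1.5 s period containing 0.5 s beats, i.e. N = 3.
print(simple_ratio(1.5 / 0.5))        # 3
# Pitch: 330 Hz against 220 Hz, a just perfect fifth.
print(simple_ratio(330 / 220))        # 3/2
# An equal-tempered fifth (about 1.4983) is still within 1% of 3/2.
print(simple_ratio(2 ** (7 / 12)))    # 3/2
```

Trying denominators in increasing order means the simplest matching ratio wins, which matches the idea that the calibrating relationships are the ones involving small numbers.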

Time translation invariance does not quite fit the same pattern. We can regard it as a "requirement" of speech perception, in the sense that the same speech can occur at different times (i.e. earlier or later), but it is less obvious that its implementation is non-trivial, because the only requirement is that the perceptual systems do not retain state indefinitely. Also there is no calibration requirement, because time translation invariance is a basic law of physics, whereas pitch translation and time scaling invariances apply only in certain special situations. (Physically, pitch translation actually is time scaling, but perceptually it is different, because the relevant time scales differ by a large factor.)
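The remark that physically pitch translation is time scaling can be checked on a pure tone. Playing a signal back c times faster multiplies every frequency by c, and on a logarithmic pitch scale measured in octaves that is a constant shift of log2 c, i.e. a pitch translation. A short derivation, assuming a single sinusoid for simplicity (the same holds for each component of a sum of sinusoids):

```latex
s(t) = \sin(2\pi f t)
\;\Longrightarrow\;
s(ct) = \sin\bigl(2\pi (cf)\, t\bigr),
\qquad
\log_2 (cf) = \log_2 f + \log_2 c .
```

With c = 2, every component moves up exactly one octave; applied to a rhythm, the same c would halve every beat period.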

It is also worth noting that the analogy between pitch circularity modulo octaves and time circularity modulo bars is not exact, because the two circularities arise from different causes: circularity modulo octaves is an intrinsic feature of human pitch perception, whereas circularity modulo bar length is caused by the construction of the music. Bar length can be varied, whereas the size of an octave is fixed.

However, this difference does not rule out the possibility that both types of circularity contribute to musicality for analogous reasons, and indeed the super-stimulus theory can give a plausible reason for why this might happen.

Euclidean Rhythms and Neural Activity Patterns

The major topic of discussion in Demaine et al's paper is Euclidean rhythms. Here a bar is divided into a fixed number n of equally spaced possible beat positions, a repeating rhythm is a choice of which of those positions are actually sounded, and a Euclidean rhythm is one whose sounded beats have the property of maximum evenness, i.e. they are spread around the bar as evenly as possible.
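One well-known way to generate such rhythms is Bjorklund's algorithm, which groups onsets and rests by repeated remainders in the manner of Euclid's algorithm. The following is my own minimal transcription of that procedure, not code from the paper; the function name `euclidean_rhythm` and the 'x'/'.' notation for onsets and rests are illustrative choices.

```python
def euclidean_rhythm(k: int, n: int) -> str:
    """Spread k onsets over n pulses as evenly as possible (Bjorklund's
    algorithm). Returns e.g. 'x..x..x.' where 'x' = onset, '.' = rest."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if k == 0:
        return "." * n
    # Start with k onset groups and n - k rest groups.
    front = [[1] for _ in range(k)]
    back = [[0] for _ in range(n - k)]
    # Append one remainder group to each front group, Euclid-style,
    # until at most one remainder group is left over.
    while len(back) > 1:
        m = min(len(front), len(back))
        paired = [front[i] + back[i] for i in range(m)]
        leftover = front[m:] if len(front) > m else back[m:]
        front, back = paired, leftover
    flat = [bit for group in front + back for bit in group]
    return "".join("x" if bit else "." for bit in flat)

print(euclidean_rhythm(3, 8))   # x..x..x.  (the Cuban tresillo)
print(euclidean_rhythm(5, 8))   # x.xx.xx.
```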

One particular criterion of evenness considered is that of deepness, or Winograd-deepness. To quote the paper:

A musical scale or rhythm is Winograd-deep if every distance 1, 2, ..., ⌊n/2⌋ has a unique number of occurrences (called the multiplicity of the distance).
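The quoted definition can be transcribed directly, assuming that the n pulses (or semitones) form a cycle and that the distance between two onsets is measured the shorter way round it. The function names are hypothetical; the diatonic scale is used as a test case because its six interval classes all occur a different number of times.

```python
from collections import Counter

def distance_multiplicities(onsets, n):
    """Multiplicity of each circular distance 1..floor(n/2) among the onsets
    of an n-pulse cycle (distance measured the shorter way round)."""
    pts = sorted({o % n for o in onsets})
    counts = Counter()
    for i, a in enumerate(pts):
        for b in pts[i + 1:]:
            d = b - a
            counts[min(d, n - d)] += 1
    return [counts[d] for d in range(1, n // 2 + 1)]

def is_winograd_deep(onsets, n):
    """True if every distance 1..floor(n/2) has a different multiplicity."""
    mult = distance_multiplicities(onsets, n)
    return len(set(mult)) == len(mult)

major_scale = [0, 2, 4, 5, 7, 9, 11]             # pitch classes in a 12-semitone octave
print(distance_multiplicities(major_scale, 12))  # [2, 5, 4, 3, 6, 1]
print(is_winograd_deep(major_scale, 12))         # True
print(is_winograd_deep([0, 3, 6], 8))            # False: distances 1 and 4 both occur 0 times
```

Note that a multiplicity of zero counts as a value under this definition, which is why the tresillo positions {0, 3, 6} in 8 pulses fail the test.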

Assuming, as the authors deduce from the evidence of many different musical cultures, that this deepness property creates musicality, a simple question arises: why does deepness create musicality?

Following the strategy of analysis I used in the development of my super-stimulus theory, we can look for the simplest possible explanation, which is that there is a cortical region which calculates and responds to this measure of multiplicity. This would require basically two steps of calculation:

One cortical map calculates the distances between perceived beats (or notes) and, using a place encoding of distance, responds at each distance in proportion to the number of times that distance occurs.
A second cortical map responds to the degree of that response (i.e. to each distance's count), and generates a place encoding of the counted number.
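The two steps can be mocked up with plain lists standing in for the two cortical maps. This is a toy, discrete illustration under stated assumptions, not a neural model: beats sit on an integer pulse grid, distances are not wrapped around a cycle, and the function names, the cutoff `max_distance` and the map size `max_count` are all hypothetical.

```python
def distance_map(onsets, max_distance):
    """Step 1 (toy): one 'place' per distance 1..max_distance;
    activity = how many onset pairs are that far apart."""
    activity = [0] * (max_distance + 1)
    for i, a in enumerate(onsets):
        for b in onsets[i + 1:]:
            d = abs(b - a)
            if 1 <= d <= max_distance:
                activity[d] += 1
    return activity[1:]

def multiplicity_map(distance_activity, max_count):
    """Step 2 (toy): one 'place' per count 0..max_count; a place is active
    if some distance occurred exactly that many times."""
    active = [False] * (max_count + 1)
    for count in distance_activity:
        if count <= max_count:
            active[count] = True
    return active

# Tresillo onsets over two bars of 8 pulses each.
acts = distance_map([0, 3, 6, 8, 11, 14], max_distance=8)
print(acts)                                  # [0, 1, 4, 0, 2, 2, 0, 3]
print(multiplicity_map(acts, max_count=6))   # [True, True, True, True, True, False, False]
```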

This can be considered a brief explanation of the "how"; the next question to ask is "why", i.e. why does the brain calculate and respond to distance multiplicity in this fashion?

One likely explanation, which relates to the symmetries I discussed above, is that it gives rise to a translation invariant representation of the incoming data, i.e. each beat (or note) is encoded by a measure of its relative multiplicity, and, given a couple of extra assumptions (which I will discuss shortly), this encoding is time-scaling or pitch-translation invariant.

The super-stimulus theory requires us to ask a third question: how does this apply to normal speech, given that normal speech lacks the regularity and discreteness of music? It is easier to consider the case of rhythm first, given that speech rhythm is still intrinsically discrete, even though it lacks the regularity of musical rhythm.

If we consider the multiplicities of distances between irregular beats, then we will find that no distance is exactly the same as any other distance, and the issue of counting is potentially problematic. Some kind of smoothness or fuzziness is required. As it happens, we get this automatically in the brain, because information about perceptual quantities is almost always population encoded. Thus, the occurrence of a value X counts as f(X,Y) occurrences of Y, where f is some function which is larger for "close" values of X and Y, and smaller for "distant" pairs of values.
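A minimal sketch of this fuzzy counting, assuming a Gaussian for f (the text only requires f to be larger for close values and smaller for distant ones). Each unit of a hypothetical population prefers one distance y, and every observed distance x adds f(x, y) to it; the function name, the preferred values and the width `sigma` are illustrative.

```python
import math

def fuzzy_distance_map(beat_times, preferred, sigma):
    """Population-coded distance counts: each observed inter-beat distance x
    adds f(x, y) = exp(-(x - y)**2 / (2 * sigma**2)) to the unit preferring y."""
    activity = [0.0] * len(preferred)
    for i, a in enumerate(beat_times):
        for b in beat_times[i + 1:]:
            x = b - a
            for j, y in enumerate(preferred):
                activity[j] += math.exp(-((x - y) ** 2) / (2 * sigma ** 2))
    return activity

# Slightly irregular beats (seconds): no two distances are exactly equal.
beats = [0.0, 0.49, 1.02, 1.50]
acts = fuzzy_distance_map(beats, preferred=[0.5, 1.0, 1.5], sigma=0.05)
print([round(a, 2) for a in acts])   # [2.74, 1.9, 1.0]
```

Although the three distances near half a second (0.49, 0.53 and 0.48) all differ, the unit preferring 0.5 s registers close to three occurrences, which is the kind of approximate counting the argument needs.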

At this point I can raise one of the extra assumptions required, which is that the function f itself should ideally be translation invariant under the relevant translation. (And the second assumption has to do with "cutoff", i.e. there should be some limit to the values entered into the original distance calculation, and the mechanism of limitation must itself be a function of the input data in a translation-invariant manner. A plausible candidate in the case of rhythm might be to ignore all "distances" greater than a bar length, where bar length can be robustly identified as the shortest and most frequent distance occurring between the strongest beats.)
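Both assumptions can be illustrated for time scaling. In the sketch below the kernel compares distances by their ratio (a Gaussian in log-distance, so it is invariant under scaling), and distances longer than a bar are ignored, with the bar length estimated from the data. Two substitutions are mine and should be read as assumptions: the strongest beats are simply supplied as input, and the median gap between them stands in for the "shortest and most frequent distance" described above. Because every step is scale-covariant, stretching all times by 1.25 leaves the map unchanged.

```python
import math
from statistics import median

def scale_invariant_map(beat_times, strong_beats, preferred_fractions, sigma_log):
    """Fuzzy distance counts made invariant to tempo (time scaling):
    - the kernel compares log(x / bar) with log(y), so only ratios matter;
    - distances longer than one bar are cut off, and the bar length is
      estimated from the input (stand-in: median gap between strong beats)."""
    bar = median(b - a for a, b in zip(strong_beats, strong_beats[1:]))
    activity = [0.0] * len(preferred_fractions)
    for i, a in enumerate(beat_times):
        for b in beat_times[i + 1:]:
            x = b - a
            if x <= 0 or x > bar:          # skip duplicates and distances beyond a bar
                continue
            for j, y in enumerate(preferred_fractions):
                z = math.log(x / bar) - math.log(y)
                activity[j] += math.exp(-z * z / (2 * sigma_log ** 2))
    return activity

beats = [0.0, 0.75, 1.5, 2.0, 2.75, 3.5, 4.0]   # tresillo (3+3+2 pulses of 0.25 s), 2 s bars
strong = [0.0, 2.0, 4.0]                         # assumed strongest beats
prefs = [0.25, 0.375, 0.5, 0.75, 1.0]            # preferred distances as fractions of a bar
original = scale_invariant_map(beats, strong, prefs, 0.1)
stretched = scale_invariant_map([1.25 * t for t in beats], [1.25 * t for t in strong], prefs, 0.1)
print(all(math.isclose(p, q, rel_tol=1e-9) for p, q in zip(original, stretched)))   # True
```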

Given that the brain is measuring multiplicity in this way, it is not hard to see how this gives rise to the "constant activity patterns" which, according to the super-stimulus theory, are the prerequisite for perceived musicality. In particular, given that there is a finite number of multiplicities, there will be an active zone in the "multiplicity map" for each occurring multiplicity, and inactive zones elsewhere. By maximising the number of different multiplicities, the number of active zones is maximised, which maximises the extent of the border regions between the active zones and the inactive zones, which, according to the super-stimulus theory, maximises the perceived musicality.
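The link between distinct multiplicities and border length can be made concrete with a toy one-dimensional multiplicity map. The assumptions here are illustrative only: each occurring (non-zero) multiplicity activates the cells within a tuning width narrower than the spacing between integer counts, so different counts form separate zones, and the cell resolution is a hypothetical parameter. The example compares the diatonic scale, whose distance multiplicities for distances of 1 to 6 semitones are 2, 5, 4, 3, 6, 1, with the whole-tone scale, whose multiplicities are 0, 6, 0, 6, 0, 3.

```python
def zones_and_borders(multiplicities, cells_per_unit=4, width=0.2):
    """Toy place-coded multiplicity map: returns (active zones, active/inactive borders)."""
    # Only multiplicities that actually occur (m > 0) activate the map.
    occurring = sorted({m for m in multiplicities if m > 0})
    if not occurring:
        return 0, 0
    # Cells cover counts 0 .. max+1, leaving an inactive margin at each end.
    n_cells = (occurring[-1] + 1) * cells_per_unit + 1
    active = [any(abs(c / cells_per_unit - m) <= width for m in occurring)
              for c in range(n_cells)]
    zones = sum(1 for c in range(1, n_cells) if active[c] and not active[c - 1])
    borders = sum(1 for c in range(1, n_cells) if active[c] != active[c - 1])
    return zones, borders

print(zones_and_borders([2, 5, 4, 3, 6, 1]))   # (6, 12)  diatonic: six distinct multiplicities
print(zones_and_borders([0, 6, 0, 6, 0, 3]))   # (2, 4)   whole-tone: only two
```

In this toy setting every separate zone contributes two borders, so the border count grows directly with the number of distinct multiplicities.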