The Distance Geometry of Music is a paper by
E Demaine *et al* which discusses the relationship between the Euclidean algorithm
and *evenness* properties of music, in particular evenness of musical scales and musical
rhythms.

The purpose of this commentary is not so much to review all the theories and ideas explored in the paper, but rather to consider its possible relationship to my own super-stimulus theory of music.

## Analogies and Symmetries

Most of Demaine *et al*'s paper considers the evenness of repeated rhythmic patterns, but there is also
consideration of the evenness of musical scales, and these two are linked by an analogy
between the repetitive circularity of time within a bar, and the repetitive circularity
of a scale within so-called *pitch space*.

My own theory places importance on the *symmetries* of music (or more precisely
the symmetries of music perception), and it also identifies symmetries between symmetries.
The analogy between pitch in an octave and time in a bar given by Demaine *et al*
can be considered an analogy between two symmetries: *pitch-translation invariance*
and *time-translation invariance*.

However, the super-stimulus theory identifies a different analogy, namely one
between pitch-translation invariance and time-*scaling* invariance. This analogy
does not extend to circularity: pitch-translation invariance has an associated
circularity, whereas time-scaling invariance does not. (This difference
can help explain the different *exactness* of the two symmetries, i.e. musicality
is less changed by pitch translation, because the pitch scale considered modulo octaves has
no natural "edge", whereas the non-circular time scale has "edges" consisting of time scales
either too short or too long to contribute to the perception of rhythm.)

It should be emphasised that there is not necessarily any conflict between two different
analogies. For example, analogy A might represent a correspondence between cortical region
P_{A} responsible for an aspect of pitch perception and cortical region R_{A}
responsible for an aspect of rhythm perception, whereas analogy B might represent a correspondence
between *different* cortical regions P_{B} and R_{B} responsible
for *different* aspects of pitch and rhythm perception respectively.

However, it is worth considering why I regard the analogy between pitch translation invariance and time scaling invariance as significant:

- Both symmetries require a non-trivial implementation in the brain.
- Both symmetries reflect required invariances of speech perception which arise out of natural variations in normal speech, i.e. people have higher and lower voices, and people can speak faster or slower.
- Each symmetry requires some kind of *calibration*, and in each case the most obvious mechanism of calibration involves relationships that are emphasised in music. For pitch translation invariance, the calibrating relationship is *consonance*, and for time scaling invariance it is the occurrence of *N* repeated beat periods within a longer beat period, where *N* is generally a small number such as 2 or 3.

Time translation invariance does not quite fit into the same pattern. We can regard
it as a "requirement" of speech perception, in the sense that the same speech can occur at different
times (i.e. earlier or later), but it is less obvious that implementation is non-trivial, because the only requirement
is that the perceptual systems not retain state indefinitely. Also there is no calibration
requirement, because time translation invariance is a basic law of physics, whereas
pitch translation and time scaling invariances only apply in certain special situations.
(Physically, pitch translation actually *is* time scaling, but perceptually it is
different because the relevant time scale differs by a large factor.)

It is also worth noting that the analogy between pitch circularity modulo octaves and time circularity modulo bars is not exact, because the two circularities arise from different causes: circularity modulo octaves is an intrinsic feature of human pitch perception, whereas circularity modulo bar length is caused by the construction of the music. Bar length can be varied, whereas the size of an octave is fixed.

However, this difference does not rule out the possibility that both types of circularity contribute to musicality for analogous reasons, and indeed the super-stimulus theory can give a plausible reason for why this might happen.

## Euclidean Rhythms and Neural Activity Patterns

The major topic of discussion in Demaine *et al*'s paper is that of
*Euclidean rhythms*. These are repeating rhythms within a bar: the bar is divided
into a particular number of possible beat positions, and the rhythm is a set of
those positions which has a property of maximum *evenness*.
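
The idea of spreading a given number of beats as evenly as possible over a bar can be illustrated with a minimal sketch. It does not run the Euclidean algorithm itself; it uses a simple floor formula that places beat `i` at position `floor(i * n / k)`, which, as far as I understand, gives the same kind of maximally even pattern up to rotation. The function names `even_rhythm` and `as_pattern` are hypothetical, chosen only for this illustration.

```python
def even_rhythm(k, n):
    """Return k beat positions spread as evenly as possible over n pulses.

    Assumption: uses the floor formula floor(i * n / k), not the
    Euclidean algorithm discussed in the paper.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return [(i * n) // k for i in range(k)]


def as_pattern(onsets, n):
    """Render beat positions as a string: 'x' for a beat, '.' for a rest."""
    onset_set = set(onsets)
    return "".join("x" if i in onset_set else "." for i in range(n))


# Three beats in a bar of eight pulses.
print(as_pattern(even_rhythm(3, 8), 8))  # x.x..x..
```

The gaps between successive beats here are 2, 3 and 3 pulses (counting the wrap-around to the start of the next bar), which is as even as three beats in eight pulses can be.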

One particular criterion of evenness considered is that of *deepness*,
or *Winograd-deepness*. To quote the paper:

> A musical scale or rhythm is *Winograd-deep* if every distance 1, 2, ..., ⌊n/2⌋ has a unique number of occurrences (called the *multiplicity* of the distance).
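
The quoted definition can be checked mechanically. The following is a small sketch, assuming that positions are integers on a circle of `n` steps and that the distance between two positions is the shorter way around the circle. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def distance_multiplicities(positions, n):
    """Map each distance 1..floor(n/2) to the number of pairs at that distance."""
    counts = Counter()
    for a, b in combinations(sorted(set(positions)), 2):
        d = (b - a) % n
        counts[min(d, n - d)] += 1  # shorter way around the circle
    return {d: counts.get(d, 0) for d in range(1, n // 2 + 1)}


def is_winograd_deep(positions, n):
    """True if every distance 1..floor(n/2) has a different multiplicity."""
    mult = distance_multiplicities(positions, n)
    return len(set(mult.values())) == len(mult)


diatonic = [0, 2, 4, 5, 7, 9, 11]  # major scale in 12 semitones
print(distance_multiplicities(diatonic, 12))
# {1: 2, 2: 5, 3: 4, 4: 3, 5: 6, 6: 1}
print(is_winograd_deep(diatonic, 12))  # True
```

By contrast, the whole-tone scale `[0, 2, 4, 6, 8, 10]` is not deep under this check, because distances 2 and 4 both occur six times.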

Assuming, as the authors deduce from the evidence of many different musical cultures, that
this deepness property creates musicality, a simple question arises: *why* does deepness create musicality?

Following the strategy of analysis I used in the development of my super-stimulus theory, we can look for the simplest possible explanation, which is that there is a cortical region which calculates and responds to this measure of multiplicity. This would require basically two steps of calculation:

- One cortical map calculates distances between perceived beats (or notes), and responds in proportion to the number of distances, according to a *place encoding* of distance.
- A second cortical map responds to the degree of response, and generates a place encoding of the counted number.
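
The two steps above can be sketched as two list-valued "maps", purely as an illustration and not as a model of real cortical circuitry. The names `distance_map` and `multiplicity_map` are hypothetical, and the sketch assumes that a count of zero produces no activity in the second map.

```python
from collections import Counter
from itertools import combinations


def distance_map(positions, n):
    """Step 1: activity at each distance 'place' (index 0 is distance 1)
    equals the number of beat pairs separated by that distance."""
    counts = Counter()
    for a, b in combinations(sorted(set(positions)), 2):
        d = (b - a) % n
        counts[min(d, n - d)] += 1
    return [counts[d] for d in range(1, n // 2 + 1)]


def multiplicity_map(distance_activity):
    """Step 2: place encoding of the counted number. Place m is active
    if some distance occurs exactly m times (assumption: m = 0 is inactive)."""
    active = [False] * (max(distance_activity, default=0) + 1)
    for m in distance_activity:
        if m > 0:
            active[m] = True
    return active


diatonic = [0, 2, 4, 5, 7, 9, 11]
step1 = distance_map(diatonic, 12)
print(step1)                         # [2, 5, 4, 3, 6, 1]
print(sum(multiplicity_map(step1)))  # 6 active places
```

For the whole-tone scale `[0, 2, 4, 6, 8, 10]` the same calculation gives only two active places in the second map, since only the counts 3 and 6 occur.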

This can be considered a brief explanation of the *how*; the next question to ask is
*why*, i.e. why does the brain calculate and respond to distance multiplicity in this fashion?

One likely explanation, which relates to the symmetries I discussed above, is that it gives rise to a translation invariant representation of the incoming data, i.e. each beat (or note) is encoded by a measure of its relative multiplicity, and, given a couple of extra assumptions (which I will discuss shortly), this encoding is time-scaling or pitch-translation invariant.

The super-stimulus theory requires us to ask a third question: how does this apply to normal speech, given that normal speech lacks the regularity and discreteness of music? It is easier to consider the case of rhythm first, given that speech rhythm is still intrinsically discrete, even though it lacks the regularity of musical rhythm.

If we consider the multiplicities of distances between irregular beats, then we will find that
no distance is exactly the same as any other distance, and the issue of counting is potentially problematic.
Some kind of smoothness or fuzziness is required. As it happens, we get this automatically in the brain,
because information about perceptual quantities is almost always *population encoded*. Thus,
the occurrence of a value *X* counts as *f(X,Y)* occurrences of *Y*, where *f*
is some function which is larger for "close" values of *X* and *Y*, and smaller for "distant"
pairs of values.

At this point I can raise one of the extra assumptions required, which is that the
function *f* itself should ideally be translation invariant under the relevant translation.
(And the second assumption has to do with "cutoff", i.e. there should be some limit to the values
entered into the original distance calculation, and the mechanism of limitation must itself be
a function of the input data in a translation-invariant manner. A plausible candidate in the case
of rhythm might be to ignore all "distances" greater than a bar length, where bar length can be
robustly identified as the shortest and most frequent distance occurring between the strongest beats.)
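
Both assumptions can be made concrete in a small sketch. Here *f* is assumed to be a Gaussian in the logarithm of the ratio *X/Y*, so that it depends only on the ratio and is unchanged by time scaling, and the cutoff is a `bar_length` passed in by the caller rather than derived from the data. The function name, the Gaussian form and the `sigma` parameter are all assumptions made for illustration.

```python
import math
from itertools import combinations


def soft_multiplicity(onset_times, probe, bar_length, sigma=0.1):
    """Fuzzy count of the distances between onsets that lie near `probe`.

    Each distance X contributes f(X, probe) = exp(-ln(X/probe)^2 / (2 sigma^2)),
    which depends only on the ratio X / probe (assumed form of f).
    Distances longer than bar_length are ignored (the cutoff).
    """
    total = 0.0
    for a, b in combinations(sorted(onset_times), 2):
        x = b - a
        if x <= 0 or x > bar_length:
            continue  # skip coincident onsets and distances beyond the cutoff
        total += math.exp(-math.log(x / probe) ** 2 / (2 * sigma ** 2))
    return total


# Irregular, speech-like onset times in seconds (made-up illustrative values).
onsets = [0.0, 0.31, 0.58, 0.93, 1.2]
slow = [2.0 * t for t in onsets]  # the same pattern at half the speed

original = soft_multiplicity(onsets, probe=0.3, bar_length=1.0)
scaled = soft_multiplicity(slow, probe=0.6, bar_length=2.0)
print(math.isclose(original, scaled))  # True
```

Because both the kernel and the cutoff scale with the input, the fuzzy count for a given distance is the same whether the pattern is delivered quickly or slowly.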

Given that the brain is measuring multiplicity in this way, it is not hard to see how this gives rise to the "constant activity patterns" which, according to the super-stimulus theory, are the prerequisite for perceived musicality. In particular, given that there is a finite number of multiplicities, there will be an active zone in the "multiplicity map" for each occurring multiplicity, and inactive zones elsewhere. Maximising the number of different multiplicities maximises the number of active zones, which in turn maximises the extent of the border regions between the active and inactive zones, and this, according to the super-stimulus theory, maximises the perceived musicality.