According to my theory of music, music is a super-stimulus for the perception of musicality, where musicality is a perceived aspect of speech.
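An immediate consequence of this theory is that there should be analogies between aspects of speech and aspects of music. In fact we can divide the various aspects into three main groups: those found in both music and speech, those found in music but apparently not in speech, and those found in speech but not in music.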
Most of the development of my theory is concerned with reconciling differences observed between the aspects found in both music and speech, and in explaining the presence of aspects found in music but apparently not in speech. In both cases the reconciliation is achieved by supposing the existence of cortical maps that process the musical aspects, and then demonstrating that the same cortical maps have a plausible role in the processing of speech.
The issue of aspects in speech but not in music can be dealt with simply by supposing that musicality is a perceived property of speech which is a function of only some aspects of speech, and that the aspects not included in the calculation of musicality happen to include phonemes, vocabulary, syntax and semantics.
But there may be more specific reasons why these aspects do not appear in the calculation of musicality. To start with, aspects such as vocabulary, syntax and semantics may be part of "higher order" processing, whereas the aspects of speech sound relevant to musicality represent the more fundamental properties of the sounds of speech.
This still leaves phonemes to be dealt with, and in particular the distinctions between different vowels and between different consonants, which appear to have no musical significance.
One feature of phonemes is that they are automatically discretised as part of the perception of speech. We may compare this to the case for pitch values, which are discretised in music, but not in speech. In general we might expect that maximising the musicality of a speech aspect often causes the aspect to become discrete in music where it is not discrete in speech. But if an aspect (such as phonemes) is already discrete in speech, then there is no opportunity to increase the musicality of that aspect in music by making it more discrete, since it is already as discrete as it can possibly be.
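As a rough illustration of what discretisation of pitch means here, the following Python sketch snaps a continuous, speech-like pitch contour onto a fixed set of pitch values. The choice of a twelve-tone equal-tempered scale, the 440 Hz reference pitch, the function name `snap_to_semitone` and the sample contour are all assumptions made for the example; none of them comes from the theory itself.

```python
import math

A4_HZ = 440.0  # assumed reference pitch, chosen only for illustration


def snap_to_semitone(freq_hz: float) -> float:
    """Return the equal-tempered pitch (in Hz) nearest to freq_hz."""
    # Distance from the reference pitch, measured in semitones.
    semitones_from_a4 = 12 * math.log2(freq_hz / A4_HZ)
    # Round to a whole number of semitones and convert back to Hz.
    return A4_HZ * 2 ** (round(semitones_from_a4) / 12)


# A hypothetical speech-like pitch glide: continuously varying values in Hz.
speech_contour = [200.0, 207.0, 215.0, 224.0, 236.0]

# The same contour after discretisation: each value lies on the scale.
music_contour = [snap_to_semitone(f) for f in speech_contour]

for before, after in zip(speech_contour, music_contour):
    print(f"{before:7.2f} Hz -> {after:7.2f} Hz")
```

Applying the function a second time changes nothing, which mirrors the point made above: once an aspect is already discrete, there is no further discretisation to be had.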
There is another possible explanation for the irrelevance of phonemes to musicality, which comes from consideration of the evolutionary history of language. When the perception of musicality evolved, phonemes may not have been a component of human language. The perception of musicality depended on those components of speech sounds which existed at the time, and those components corresponded to what we now call melody and rhythm.
Note that this explanation does not imply that early language consisted of music. Music at that time, as it is now, was the super-stimulus for perception of musicality of speech, so the perceived musicality of speech was (as it is now) much less than that of music.
We must also presume that the evolution of the perception of musicality preceded the development of music (since you cannot create music until you know subjectively what is and what is not musical).
Recent studies on whistling languages may be relevant to consideration of this explanation. The most famous example of such a language is "Silbo Gomero" (see also the CNN story), which is "spoken" by a very small population of users on Gomera Island in the Canary Islands. Silbo Gomero (or "Silbo" for short) is derived from Spanish by replacing vowels and consonants with particular whistling sounds. It would appear from the description that Silbo is an alternative representation of phonemes as whistles, and not a truly independent language.
A recent fMRI study has demonstrated that the brains of Silbo practitioners respond to Silbo similarly to how most people respond to spoken language, and differently from how non-practitioners respond to Silbo (non-practitioners process it using areas not specifically involved in language perception, probably areas relating just to the perception of sound in general). This study demonstrates that the human brain is certainly capable of responding to a melodic language. Of course we should not be too surprised at this, given that humans can respond to language forms significantly different from speech, such as sign language and written language.
A point worth noting is that Silbo does not sound particularly musical, emphasising the distinction between music and speech even for a language all of whose components are also components of music. (There is a sample available at http://www.agulo.net/silbo/silbo1.htm, and also a larger MP3 sample, which appears to have gone offline, perhaps as a result of over-popularity.)
Other more general evidence for the plausibility of a prehistoric melodic/rhythmic language comes from tone languages, such as Mandarin, where pitch values are components of individual words.
It is not too implausible that the first human languages were tone languages which defined words entirely by pitch values. Music contains analogs of voiced sounds in the melody and analogs of consonants in the percussion, which suggests that these early languages contained one vowel and one consonant, and that they had syllables whose boundaries were defined by occurrences of the consonant. Such languages may have had a smaller vocabulary than modern languages; as a need for more words and greater speed of transmission arose, vowels and consonants evolved so that more information could be packed into a given length of sentence.