Music is a superstimulus for speech.
But it is not a positive superstimulus. It is a negative superstimulus: it causes at least one step in the normal processing of conversational speech to be completely suppressed.
This suppression, indirectly, results in the apparent positive effects of music on the listener's brain.
Two suggestions for how to apply deep learning to music more effectively:
1. Solve an easier problem, with more limited scope: select a single backing track, and create a dataset of improvisations against that backing track.
2. Start with a hypothesis about what music is, or might be, and use that to drive the deep learning strategy.
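The first suggestion can be made concrete with a small sketch. Everything here is an illustrative assumption (the file layout, the `ImprovExample` name, the `.wav` extension); the point is only that fixing a single backing track collapses the task to learning one mapping, from a fixed harmonic and rhythmic context to plausible improvisations over it.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ImprovExample:
    backing_track: str   # path to the single shared backing track
    improvisation: str   # path to one improvised take over that track

def build_dataset(backing_track: str, improv_dir: str) -> list[ImprovExample]:
    """Pair every improvised take with the one shared backing track.

    Because the backing track is fixed, a model trained on these pairs
    only has to learn improvisation within one musical context, a much
    narrower problem than open-ended music generation.
    """
    takes = sorted(Path(improv_dir).glob("*.wav"))
    return [ImprovExample(backing_track, str(take)) for take in takes]
```

A dataset like this trades generality for tractability, which is exactly the point of limiting the scope.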
Music is a superstimulus for the perception that a speaker's utterance lacks conversational value, and therefore the listener should not bother making any immediate effort to evaluate the truth value of that utterance.
The effect of music on a listener is to interrupt the normal sequence of information processing steps that apply to conversational speech.
This interruption leaves the listener's brain in an intermediate state: the meaning of the utterance has been determined, and its hypothetical emotional significance has been determined, but no attempt has been made to evaluate the truth value of the utterance.
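The claimed sequence of processing steps can be sketched as a toy pipeline. The step names and their ordering come from the text above; the representation (a dict, the `process_utterance` function, the flag name) is purely illustrative, not a model of any real cognitive machinery.

```python
def process_utterance(utterance: str, is_music: bool) -> dict:
    """Toy model of the claimed processing sequence for conversational speech."""
    state = {
        "meaning": f"parsed({utterance})",         # step 1: meaning is determined
        "emotional_significance": "hypothesised",  # step 2: emotional weight is determined
        "truth_value": None,                       # step 3: not yet evaluated
    }
    # Per the hypothesis, music suppresses exactly this final step,
    # leaving the listener in the intermediate state built above.
    if not is_music:
        state["truth_value"] = "evaluated"
    return state
```

Running the same utterance through both branches shows the difference the hypothesis predicts: speech reaches truth evaluation, music stops one step short.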