After digging into the code and fixing an error that caused the network's output to devolve into white noise, then talking with some other folks about what had worked for them, along with a whole bunch of hyper-parameter optimization, I've had some encouraging results!
For these tests, I've restricted training to a single speaker. Each successive test represents one round of hyper-parameter optimization, and for the last one I switched the optimizer from Adam (with normalization) to SGD with momentum.
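For reference, the optimizer swap itself is a one-line change. Here's a minimal sketch assuming a PyTorch setup; the placeholder model and the learning-rate/momentum values are illustrative, not the exact ones I used:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the actual network.
model = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=2),
    nn.ReLU(),
    nn.Conv1d(32, 1, kernel_size=2),
)

# Before: Adam (illustrative learning rate).
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# After: plain SGD with momentum.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
```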
It's also very interesting that the most successful test, test 7, used the smallest network of the group and trained for the shortest time: only 26,000 iterations, versus 50,000, 100,000, and 150,000 for tests 6a, 6b, and 6c. My next test will be to continue training this network with a reduced learning rate to see if I can get it even better, but I'm really happy with these results.
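As a sketch of that next step, resuming from a saved checkpoint with a lower learning rate might look something like this (again assuming PyTorch; the checkpoint file name, keys, and scale factor are all illustrative):

```python
import torch
import torch.nn as nn

# Same placeholder model and optimizer as in the sketch above.
model = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=2),
    nn.ReLU(),
    nn.Conv1d(32, 1, kernel_size=2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Restore weights and optimizer state from the previous run
# ("checkpoint.pt" is an illustrative file name).
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])

# Drop the learning rate before continuing training.
for param_group in optimizer.param_groups:
    param_group["lr"] *= 0.1  # e.g. 1e-2 down to 1e-3
```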
My eventual goal is to get this running with some of my music to see what it spits out.