Abstract:
In this talk, I will present two contrasting approaches to building generative models for audio using deep learning techniques:
1) In the TimbreTron system (developed in collaboration with students and faculty at U of Toronto and Vector), we learn to manipulate the timbre of a sound sample from one instrument to match that of another while preserving musical content such as pitch and rhythm. I will describe how we do this by combining a CycleGAN architecture, appropriate spectral representations, and a conditional WaveNet synthesizer (a minimal sketch of the CycleGAN cycle-consistency idea follows this list).
2) In PerformanceRNN (developed in collaboration with researchers at Google, and with subsequent developments in collaboration with undergraduate students at Dalhousie FCS), we work directly with MIDI data rather than raw audio, which allows us to treat music generation as a language-modeling problem (a second sketch below illustrates this framing). We use a conditional LSTM to generate solo piano music based on a dataset of human performances.
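
For readers unfamiliar with CycleGAN, here is a minimal PyTorch sketch of its central cycle-consistency term. This is not the TimbreTron implementation: the generators G and F, the convolutional stand-ins, and the spectrogram shapes are all illustrative assumptions.

    # A minimal sketch of CycleGAN's cycle-consistency loss -- not TimbreTron
    # code. G maps instrument-A spectrograms to instrument B; F maps back.
    import torch
    import torch.nn as nn

    def cycle_consistency_loss(G: nn.Module, F: nn.Module,
                               x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """L1 penalty for failing to recover an input after a round trip.

        This term is what lets CycleGAN train on unpaired data: F(G(x))
        should reproduce x, and G(F(y)) should reproduce y, even though
        no aligned (x, y) pairs exist.
        """
        l1 = nn.L1Loss()
        return l1(F(G(x)), x) + l1(G(F(y)), y)

    # Toy usage with convolutional stand-ins for the real generators,
    # operating on (batch, channels, freq, time) spectrogram patches.
    G = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # instrument A -> instrument B
    F = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # instrument B -> instrument A
    x = torch.randn(4, 1, 128, 64)                 # domain-A spectrograms
    y = torch.randn(4, 1, 128, 64)                 # domain-B spectrograms
    cycle_consistency_loss(G, F, x, y).backward()  # gradients reach both generators
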
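And here is a minimal sketch of the language-modeling framing of MIDI generation, assuming a PerformanceRNN-style event vocabulary (NOTE_ON, NOTE_OFF, TIME_SHIFT, and VELOCITY tokens). The vocabulary size, layer widths, and the unconditional (rather than conditional) LSTM are simplifying assumptions, not the actual system.

    # A minimal sketch of MIDI generation as language modeling: an LSTM
    # predicts the next event token given the events so far.
    import torch
    import torch.nn as nn

    VOCAB = 388  # illustrative: NOTE_ON + NOTE_OFF + TIME_SHIFT + VELOCITY events

    class EventLM(nn.Module):
        """Autoregressive language model over MIDI event tokens."""
        def __init__(self, vocab: int = VOCAB, hidden: int = 512):
            super().__init__()
            self.embed = nn.Embedding(vocab, hidden)
            self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            h, _ = self.lstm(self.embed(tokens))   # (batch, seq, hidden)
            return self.head(h)                    # next-event logits per position

    model = EventLM()
    events = torch.randint(0, VOCAB, (1, 64))      # a dummy performance sequence
    logits = model(events)                         # (1, 64, VOCAB)
    # Standard next-token objective: predict event t+1 from events up to t.
    loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, VOCAB),
                                 events[:, 1:].reshape(-1))
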
I will also provide overviews as needed throughout the talk of concepts related both to audio generation (e.g. "What is MIDI? What are spectral representations?") and to deep learning techniques (e.g. "What is CycleGAN?").
Brief Bio:
After completing his undergraduate degree in mathematics & CS at Dalhousie University in Halifax, Sageev went on to study at the University of Toronto, where he completed his Master’s and PhD in Computer Science under the supervision of Geoffrey Hinton. Sageev’s research interests include deep learning, with a focus on generative models and computational creativity. As a pianist, Sageev has performed as a soloist with symphony orchestras and at jazz festivals across the country. As both a scientist and a musician, Sageev is also interested in the interaction between human performance and adaptive systems. From 2016 to 2018, Sageev was a Visiting Research Scientist at Google Brain, working on the Magenta team. In February 2018 he joined both the Vector Institute (Toronto) and Dalhousie University as a faculty member.
Host: Sageev Oore (sageev@dal.ca)