Abstract:
Music is a universal component of human culture. Music scenes are carefully crafted by composers and performers, containing rich patterns and structures that make music enjoyable. Music is also a multimodal art form: humans integrate audio, visual, and score information to parse a musical scene. In this talk, I will present our work on designing algorithms to automatically analyze music scenes through multiple modalities. From the audio modality alone, we design automatic music transcription systems that convert piano music audio into music notation with high accuracy. When scores are available, we design systems that separate sound sources from the music mixture in real time by leveraging the score information. When visual information is available, we propose methods that associate sound sources with players in the visual scene and integrate audio, visual, and score information for music performance analysis.
Bio:
Zhiyao Duan is an assistant professor and director of the Audio Information Research (AIR) lab in the Department of Electrical and Computer Engineering at the University of Rochester. He received his B.S. in Automation and M.S. in Control Science and Engineering from Tsinghua University, China, in 2004 and 2008, respectively, and his Ph.D. in Computer Science from Northwestern University in 2013. His research interests lie in the broad area of computer audition, i.e., designing computational systems that are capable of understanding sounds, including music, speech, and environmental sounds. Specific problems he has worked on include automatic music transcription, audio-score alignment, source separation, speech enhancement, sound retrieval, and audio-visual analysis of music. He has published 40 peer-reviewed journal and conference papers. He co-presented a tutorial on automatic music transcription at the ISMIR conference in 2015. His research is funded by the National Science Foundation.