Lisa Nolder <[log in to unmask]>
Tue, 25 Mar 2014 10:37:25 -0400
*_Notice and Invitation_*
Oral Defense of Doctoral Dissertation
The Volgenau School of Engineering, George Mason University

*Yun-Sheng Wang*
Bachelor of Science, Feng Chia University, 1991
Master of Science, George Mason University, 2002


*Unsupervised Bayesian Musical Key and Chord Recognition*

Wednesday, 04/09/2014, 1:00pm
Room 4801, Engineering Building

All are invited to attend.

Dr. Harry Wechsler, Chair
Dr. Jim Chen
Dr. Jessica Lin
Dr. Andrew Loerch
Dr. Pearl Wang


Many tasks in Music Information Retrieval can be approached through
data abstraction. Raw music signals can be abstracted and represented
by some combination of melody, harmony, and rhythm for musical
structural analysis, emotion or mood projection, and efficient search
of large music collections. In this dissertation, we focus on two
tasks: analyzing the tonality and the harmony of music signals. Our
approach transcribes Western popular music into its tonal and harmonic
content directly from the audio signals. While the majority of
proposed methods adopt a supervised approach, which requires scarce
manually transcribed training data, our approach is unsupervised:
model parameters for tonality and harmony are estimated directly from
the target audio data. First, raw audio signals in the time domain are
transformed using the undecimated wavelet transform, which serves as
the basis for an enhanced 12-dimensional pitch class profile (PCP) in
the frequency domain; the PCPs are the features of the target music
piece. Second, a bag of local keys is extracted from the
frame-by-frame PCPs using an infinite Gaussian mixture, which allows
the audio data to "speak for itself" without pre-setting the number of
Gaussian components used to model the local keys. Third, the bag of
local keys is used to adjust the energy levels in the PCPs for chord
extraction.

Experimental results demonstrate that our approach, which is much
simpler than most existing methods, performs as well as or better than
many of the far more complex models on both tasks without using any
training data.
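
As a rough illustration of the pitch class profile mentioned in the
abstract, the sketch below folds spectral energy into 12 pitch-class
bins. It is not the dissertation's method: it skips the undecimated
wavelet transform and the "enhanced" PCP entirely, and it assumes
equal temperament with A4 = 440 Hz, which the abstract does not state.
The function names and the (frequency, energy) input format are
hypothetical.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; not given in the abstract


def pitch_class(freq_hz):
    """Map a frequency to a pitch class 0..11 (0 = C, 9 = A).

    Assumes 12-tone equal temperament relative to A4_HZ.
    """
    midi_note = 69 + 12 * math.log2(freq_hz / A4_HZ)
    return int(round(midi_note)) % 12


def pitch_class_profile(spectrum):
    """Fold (frequency_hz, energy) pairs into a normalized 12-bin PCP.

    `spectrum` is a hypothetical input format: any iterable of
    (frequency in Hz, non-negative energy) pairs for one audio frame.
    """
    pcp = [0.0] * 12
    for freq_hz, energy in spectrum:
        if freq_hz > 0:  # skip DC and invalid bins
            pcp[pitch_class(freq_hz)] += energy
    total = sum(pcp)
    # Normalize so the bins sum to 1; an empty frame stays all zeros.
    return [e / total for e in pcp] if total > 0 else pcp


# One frame containing a C major triad (C4, E4, G4) with equal energy.
frame = [(261.63, 1.0), (329.63, 1.0), (392.00, 1.0)]
profile = pitch_class_profile(frame)
```

Here the energy lands in bins 0, 4, and 7 (C, E, and G). A sequence of
such per-frame profiles is the kind of feature that the key and chord
models described in the abstract would consume.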

A copy of this doctoral dissertation is on reserve at the Johnson Center