[Amath-seminars] A signal processing talk -- Feb 26

Eric Shea-Brown etsb at washington.edu
Fri Feb 21 11:46:42 PST 2014

Scattering Invariants for Audio Classification
Joakim Andén

February 26, 2014, 10:30 – 11:20 A.M., EE Building Room 303

To obtain efficient feature representations for audio classification, it is desirable to have invariance to time-shift and stability to time-warping. The commonly used Mel-frequency cepstral coefficients (MFCCs) satisfy these criteria, but are unsuitable for modeling large-scale temporal structure. The scattering transform extends this representation through a convolutional network of wavelet transforms and modulus operators, capturing structures at larger time scales. Additional invariance to frequency transposition with stability to frequency-warping is obtained by applying a second scattering transform along the log-frequency axis. Using these representations, we obtain state-of-the-art results on tasks such as phone segment classification and musical genre classification on the TIMIT and GTZAN datasets, respectively.
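
For readers who want a concrete picture of the cascade described in the abstract, here is a minimal NumPy sketch of a two-layer scattering computation. It is an illustration under stated assumptions, not the speaker's implementation: the function names (gabor_bank, scattering), the Gaussian band-pass filters standing in for wavelets, the filter spacing, and the use of a global time average in place of a local low-pass window are all hypothetical choices made for brevity.

```python
import numpy as np

def gabor_bank(n, num_filters):
    """Hypothetical filter bank: Gaussian band-pass filters in the Fourier
    domain, with center frequencies halving from one filter to the next."""
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    bank = []
    for j in range(num_filters):
        center = 0.4 / 2 ** j   # assumed octave spacing
        width = center / 4      # assumed constant relative bandwidth
        bank.append(np.exp(-0.5 * ((freqs - center) / width) ** 2))
    return np.array(bank)

def scattering(x, num_filters=6):
    """Return first- and second-order coefficients of a 1-D signal.

    Each layer is a filtering step followed by a modulus. Averaging over
    the whole signal (an assumption made here) plays the role of the
    low-pass filter that produces time-shift invariance.
    """
    bank = gabor_bank(len(x), num_filters)
    x_hat = np.fft.fft(x)

    # Layer 1: envelope of each band, U1[j] = |x * psi_j| (circular convolution).
    u1 = np.abs(np.fft.ifft(x_hat * bank, axis=1))
    s1 = u1.mean(axis=1)

    # Layer 2: filter each envelope with lower-frequency filters only,
    # capturing modulation structure at larger time scales.
    s2 = []
    for j1 in range(num_filters):
        u1_hat = np.fft.fft(u1[j1])
        for j2 in range(j1 + 1, num_filters):
            u2 = np.abs(np.fft.ifft(u1_hat * bank[j2]))
            s2.append(u2.mean())
    return s1, np.array(s2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(1024)
    s1, s2 = scattering(signal)
    s1_shift, s2_shift = scattering(np.roll(signal, 37))
    # With circular convolution and global averaging, a circular shift
    # leaves the coefficients unchanged up to floating-point error.
    print(np.allclose(s1, s1_shift), np.allclose(s2, s2_shift))
```

Because the sketch averages over the entire signal, the invariance to circular shifts is exact up to rounding; a local averaging window would instead give invariance only up to the window length.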

Joakim Andén is a Ph.D. candidate in applied mathematics at École Polytechnique in Paris, France, under the supervision of Prof. Stéphane Mallat. Previously, he studied engineering physics and mathematics at the Royal Institute of Technology in Stockholm, Sweden, and fundamental mathematics at Université Pierre et Marie Curie in Paris, France, from which he received an M.Sc. in 2010. His research focuses on invariant signal representations and their applications to classification and similarity estimation for speech, music, and environmental sounds, as well as medical signals.

Prof. Les Atlas
Bloedel Research Scholar
Department of Electrical Engineering
