In recent years, automatic speech recognition (ASR) technology has achieved remarkable results, mainly thanks to increased training data and computational resources. However, an ASR system trained on thousands of hours of annotated speech can still perform poorly when training and testing conditions differ (e.g., different acoustic environments). This is usually referred to as the mismatch problem.
In this challenge, participants will build a speaker-dependent phone recognition system that will be evaluated on mismatched speech rates. While the training data consist of read speech in which the speaker was required to keep a constant speech rate, the test data range from slow, hyper-articulated speech to fast, hypo-articulated speech.
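The announcement does not name the scoring metric, but phone recognition systems are conventionally scored by phone error rate (PER): the Levenshtein (edit) distance between the reference and hypothesized phone sequences, normalized by the reference length. A minimal sketch, assuming PER is the metric used here:

```python
def phone_error_rate(ref, hyp):
    """PER: edit distance (substitutions + insertions + deletions)
    between reference and hypothesized phone sequences, divided by
    the number of reference phones."""
    m, n = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting i reference phones
    for j in range(n + 1):
        d[0][j] = j  # inserting j hypothesis phones
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match/substitution
    return d[m][n] / m

# One substitution (ae -> eh) and one deletion (s) over 4 reference phones
print(phone_error_rate(["p", "ae", "t", "s"], ["p", "eh", "t"]))  # → 0.5
```

The phone labels above are illustrative; the actual phone set will be defined by the task materials.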
The training dataset contains simultaneous recordings of audio and vocal tract (i.e., articulatory) movements captured with an electromagnetic articulograph. Participants are encouraged to also use the training articulatory data to improve the generalization performance of their recognition system.
Detailed guidelines, task materials and data sets for development, training and testing will be made available on the Task Website.
- Leonardo Badino (CTNSC, Istituto Italiano di Tecnologia)
- Francesco Cutugno (PRISCA Lab, Università degli Studi di Napoli Federico II)
- Bertrand Higy (iCub Facility, Istituto Italiano di Tecnologia)
Leonardo Badino, leonardo.badino[AT]iit.it