Forced Alignment on Spontaneous Speech
In this task, systems are required to align the audio of spoken dialogues (map task) with the corresponding transcriptions provided. The task is speaker-independent. Two subtasks are defined, and participants may take part in either or both:
- phone segmentation;
- word segmentation.
Two modalities are allowed:
- closed: only the distributed data may be used for training and tuning the system;
- open: participants may use any type of data for system training, provided that they declare and describe their setup in the final report.
The evaluation is based on Unit Boundary Positioning Accuracy and follows the methodology described in the documentation of the NIST SCLite evaluation tool. Training and development material extracted from wide-band corpora (sampled at 22050 Hz) will be provided.
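To give an intuition for a boundary positioning score, the sketch below counts the fraction of reference boundaries that have a hypothesis boundary within a fixed tolerance. It is not the official scoring procedure, which follows the SCLite methodology. The function name `boundary_accuracy` and the 20 ms default tolerance are hypothetical choices for illustration. Matching is also simplified: one hypothesis boundary may be credited to several nearby reference boundaries.

```python
from bisect import bisect_left
from typing import Sequence


def boundary_accuracy(ref: Sequence[float], hyp: Sequence[float],
                      tolerance: float = 0.02) -> float:
    """Hypothetical metric: fraction of reference boundaries (in seconds)
    that have at least one hypothesis boundary within `tolerance` seconds.

    Simplification: matching is not one-to-one, so a single hypothesis
    boundary can count for several reference boundaries.
    """
    if not ref:
        return 0.0
    hyp_sorted = sorted(hyp)
    hits = 0
    for b in ref:
        # Locate where b would be inserted, then inspect the neighbours
        # on both sides of that point. These are the closest candidates.
        i = bisect_left(hyp_sorted, b)
        candidates = hyp_sorted[max(i - 1, 0):i + 1]
        if candidates and min(abs(b - h) for h in candidates) <= tolerance:
            hits += 1
    return hits / len(ref)


if __name__ == "__main__":
    reference = [0.10, 0.50, 1.00]
    hypothesis = [0.11, 0.58, 1.005]
    # Two of the three reference boundaries fall within 20 ms of a hypothesis.
    print(boundary_accuracy(reference, hypothesis))
```

A tolerance-window score like this is common in segmentation evaluation because exact sample-level agreement is unrealistic. The tolerance value strongly affects the result, so it should be reported alongside any score computed this way.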
Training data: about 15 map task dialogues, each recorded by a pair of speakers, covering a wide range of varieties of Italian. Dialogue length ranges from about 7-8 to 15-20 minutes. Participants are responsible for splitting these data into training and development (DEV) subsets. For each dialogue, the following files will be provided:
- full-dialogue audio files: PCM-encoded stereo WAV files, with the two speakers recorded on separate channels;
- full-dialogue manual transcriptions;
- single-turn audio files: PCM-encoded mono WAV files, each named so that it can be matched to the corresponding turn in the full transcription;
- single-turn phonetic labelling (.PHN extension, ASCII, TIMIT format);
- single-turn word labelling (.WRD extension, ASCII, TIMIT format).
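The sketch below reads the single-turn label files. It assumes they follow the usual TIMIT layout of one segment per line, written as "start_sample end_sample label". It also assumes the sample indices refer to the 22050 Hz audio. The names `parse_timit_labels`, `Segment` and `boundary_times` are hypothetical helpers, not part of any distributed tooling.

```python
from typing import List, NamedTuple

# Assumption: label sample indices refer to the 22050 Hz wide-band audio.
SAMPLE_RATE = 22050


class Segment(NamedTuple):
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    label: str    # phone (.PHN) or word (.WRD) label


def parse_timit_labels(text: str, sample_rate: int = SAMPLE_RATE) -> List[Segment]:
    """Parse TIMIT-style label text ('start end label' per line) into
    segments with times in seconds. Blank lines are skipped."""
    segments: List[Segment] = []
    for line_no, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 'start end label', got {line!r}")
        start, end = int(parts[0]), int(parts[1])
        if end < start:
            raise ValueError(f"line {line_no}: end sample precedes start sample")
        segments.append(Segment(start / sample_rate, end / sample_rate, parts[2]))
    return segments


def boundary_times(segments: List[Segment]) -> List[float]:
    """Sorted, de-duplicated boundary times: every segment start and end."""
    return sorted({s.start for s in segments} | {s.end for s in segments})


if __name__ == "__main__":
    phn = "0 2205 sil\n2205 4410 k\n4410 6615 a\n"
    segs = parse_timit_labels(phn)
    for s in segs:
        print(f"{s.start:.3f}-{s.end:.3f}s {s.label}")
    print(boundary_times(segs))
```

Converting sample indices to seconds early makes phone and word boundaries directly comparable with aligner output, whatever frame rate the aligner uses internally.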
Test data: samples from various dialogues, for a total length of about 8-10 minutes.
The data are released under some form of public license.
Organizers: Francesco Cutugno (Naples), Dino Seppi (Leuven), Antonio Origlia (Naples)