Forced Alignment on Children Speech (FACS)

Following the EVALITA 2011 task on “Forced Alignment on Spontaneous Speech”, which focused on adult speech, this year we propose the same task on children's speech. Systems are required to align audio sequences of children's speech to the corresponding transcriptions provided. The task is to be considered speaker independent. Two subtasks are defined, and applicants may choose which of them to participate in: phone segmentation and word segmentation.
Two modalities are allowed:

  • Closed: only the distributed data may be used to train and tune the system
    • Closed-A: training on ADULT speech + adaptation on CHILDREN speech
    • Closed-B: training on CHILDREN speech
  • Open: participants may use any type of data for system training, provided that they declare and describe the proposed setup in the final report.

The training data includes both adult and children's speech. For adult speech, the training material consists of 15 map task dialogues from the CLIPS corpus (see EVALITA 2011), recorded by pairs of speakers covering a wide range of Italian varieties. Dialogue length ranges from 7-8 minutes to 15-20 minutes. It is up to participants to split these data into training and development subsets. For each dialogue, the following files are provided:

  • Manual transcription of the full dialogue
  • Single turn audio files: PCM-encoded mono WAV files (16 kHz). Each file is linked by its name to the corresponding turn in the full transcription
  • Single turn phonetic labeling
  • Single turn word labeling
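
Since the audio is described as PCM-encoded mono WAV at 16 kHz, a quick header check before training can catch files that were converted or resampled by mistake. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical and is not part of the task materials.

```python
import wave


def check_wav_format(path, expected_rate=16000):
    """Return a list of format problems for one WAV file (empty if none).

    Hypothetical helper: checks only what the task description states,
    namely PCM encoding, a single channel and a 16 kHz sampling rate.
    """
    try:
        with wave.open(path, "rb") as wav:
            problems = []
            if wav.getnchannels() != 1:
                problems.append(
                    f"expected mono, found {wav.getnchannels()} channels"
                )
            if wav.getframerate() != expected_rate:
                problems.append(
                    f"expected {expected_rate} Hz, found {wav.getframerate()} Hz"
                )
            return problems
    except wave.Error as exc:
        # The wave module only reads uncompressed PCM, so other encodings
        # (or files that are not WAV at all) end up here.
        return [f"not a readable PCM WAV file: {exc}"]
```

Calling `check_wav_format` on each distributed file and printing any non-empty result is usually enough to confirm that a local copy of the data matches the stated format.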

For children's speech, the training material consists of about 40 sentences read by 20 female and 20 male children. Sentence length ranges from 2-3 seconds to 5-6 seconds. It is up to participants to split these data into training and development subsets.
For each sentence, the following files are provided:

  • Automatically produced transcriptions of the full sentences
  • Audio files: PCM-encoded mono WAV files (16 kHz). Each file is linked by its name to the corresponding entry in the full transcription
  • Phonetic labeling
  • Word labeling
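
Because the task is speaker independent and the train/development split is left to participants, one reasonable option is to split by speaker rather than by file, so that no child appears in both subsets. The sketch below assumes the participant has already built a mapping from each audio file to a speaker identifier; how that identifier is obtained (for example from the file names) depends on the distributed data and is not specified here. The function name `split_by_speaker` and the 20% development fraction are illustrative assumptions.

```python
import random


def split_by_speaker(file_to_speaker, dev_fraction=0.2, seed=0):
    """Split files into train and development lists with disjoint speakers.

    file_to_speaker: dict mapping a file name to a speaker identifier
    (hypothetical input, to be built by the participant from the data).
    """
    speakers = sorted(set(file_to_speaker.values()))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(speakers)

    # Hold out at least one speaker for development.
    n_dev = max(1, round(len(speakers) * dev_fraction))
    dev_speakers = set(speakers[:n_dev])

    train = sorted(f for f, s in file_to_speaker.items() if s not in dev_speakers)
    dev = sorted(f for f, s in file_to_speaker.items() if s in dev_speakers)
    return train, dev
```

Splitting at the speaker level keeps development results closer to what can be expected on the unseen test speakers; a file-level random split would let the same voices appear on both sides.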

The test data consists of unpublished sentences of children's speech.

Detailed guidelines, task materials and data sets for development, training and testing will be made available on the Task Website.

Organizers

  • Piero Cosi (ISTC-CNR, UOS Padova, Italy)
  • Francesco Cutugno (University of Naples “Federico II”, Napoli, Italy)
  • Vincenzo Galatà (ISTC-CNR, UOS Padova, Italy)
  • Antonio Origlia (University of Naples “Federico II”, Napoli, Italy)

Contact

Piero Cosi piero.cosi[at]pd.istc.cnr.it