Automatic Speech Recognition – Large Vocabulary Transcription

In the ASR task, systems are required to transcribe audio recordings of Italian Parliament sessions. Two subtasks are defined, and participants may take part in either or both:

  • transcription
  • constrained transcription, using the accompanying minutes

Two modalities are allowed:

  • closed: only distributed data are allowed for training and tuning the system
  • open: the participant can use any type of data for system training, declaring and describing the proposed setup in the final report

The evaluation is based on Word Accuracy, computed from the minimum edit distance between the recognizer output and the reference annotation. Training and development material extracted from wide-band (16 kHz) corpora will be provided.
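As an illustration of the metric, the following is a minimal sketch of a word-level accuracy computation based on minimum edit distance. It is not the official scoring tool: the function names `edit_distance` and `word_accuracy`, the whitespace tokenization, and the normalization (one minus the number of edits divided by the number of reference words) are assumptions, since the exact scoring procedure and text normalization are not specified here.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (minimum number of edits / reference length)."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

For example, with the reference "la seduta è aperta" and the hypothesis "la seduta aperta", one word is deleted out of four, so this sketch gives a Word Accuracy of 0.75. Under this definition the value can become negative when the hypothesis contains many insertions.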


  • Training data: about 30 hours of audio from parliament sessions with the related (automatic) transcriptions; one year of minutes of parliament sessions; a lexicon covering the acoustic data and part of the language model data
  • Dev data: a 1-hour parliament session with minutes and reference transcription
  • Test data: about 1 hour of audio sequences from parliament sessions

Data distribution

Test data [04/10/2011] – Training data are available. Please contact: Marco Matassoni, matasso[at]

Distributed data may be used only in the context of Evalita; no fee is required.

Task materials

Detailed Guidelines [22/08/2011]


  • Fabio Brugnara (FBK-irst, Trento)
  • Roberto Gretter (FBK-irst, Trento)
  • Marco Matassoni (FBK-irst, Trento)