Emotion Recognition Task (ERT)

The EVALITA Emotion Recognition Task is the first evaluation campaign for Italian emotional speech. The task aims to evaluate the performance of automatic emotion recognition systems and to investigate two main topics, each covered by a separate subtask:

  • cross language, open database task
  • Italian only, closed database task

First, we would like to estimate the performance that can be obtained on Italian using emotional speech corpora in other languages. We would also like to verify to what extent it is possible to build a model of emotional speech starting from a single professional speaker portraying the discrete set of emotions defined by Ekman (1992) (anger, disgust, fear, joy, sadness, surprise, and neutral).

In this first evaluation of emotional speech recognition systems on Italian, the material consists of acted speech elicited by means of a narrative task. It is extracted from two emotional speech corpora that contain similar material and share basic characteristics:

  • the E-Carini corpus
  • the €motion corpus

For both subtasks, the test set will contain material from both corpora mentioned above. For the second subtask, the goal of the evaluation is to establish how much information can be extracted from material produced by a single professional speaker whose explicit task is to portray emotions, and whether the resulting models can generalize to unseen subjects.

The objective measure for evaluation will be the F-measure. To stimulate discussion during the workshop, the originality of the proposed approaches will also be evaluated by a scientific committee composed of the organizers and at least two other natural language processing experts.
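As an illustration of the objective measure, the following minimal sketch computes a per-class F-measure and its unweighted (macro) average over the seven emotion labels. The choice of macro averaging, the balanced F1 variant, and the function names `f_measure_per_class` and `macro_f_measure` are assumptions made for this example; the official scoring procedure is the one described in the task guidelines.

```python
from collections import Counter

# The 6+1 emotion labels named in the task description.
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]


def f_measure_per_class(gold, predicted, labels=EMOTIONS):
    """Return a dict mapping each label to its F1 score.

    Assumption: balanced F1 (harmonic mean of precision and recall).
    A label with no gold and no predicted instances scores 0.0.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed

    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f_measure(gold, predicted, labels=EMOTIONS):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f_measure_per_class(gold, predicted, labels)
    return sum(scores.values()) / len(labels)
```

For example, with gold labels `["anger", "joy", "joy", "neutral"]` and predictions `["anger", "joy", "neutral", "neutral"]`, anger scores 1.0 while joy and neutral each score 2/3; classes that never occur contribute 0.0 to the macro average.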

Participants will be provided with a development set taken from the €motion corpus, so that they can obtain reference results for the test material while preparing their systems.

The training set provided for the closed database task consists of 1 hour and 17 minutes of speech recorded by a professional actor reading a story while portraying the 6+1 basic emotions. The material consists of PCM-encoded WAV files sampled at 16000 Hz. Details on the development and test sets will be published on the task website along with the guidelines.
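Before feature extraction, it can be useful to confirm that each file matches the stated format. The sketch below uses Python's standard `wave` module to read the header of a PCM WAV file and check the 16000 Hz sampling rate; the function name `describe_wav` and the returned dictionary layout are hypothetical and not part of the task materials.

```python
import wave

# Sampling rate stated in the task description.
EXPECTED_RATE_HZ = 16000


def describe_wav(path, expected_rate=EXPECTED_RATE_HZ):
    """Return basic header information for a PCM WAV file.

    Raises ValueError if the sampling rate differs from expected_rate.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != expected_rate:
            raise ValueError(
                f"{path}: expected {expected_rate} Hz, found {rate} Hz"
            )
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            # Duration in seconds = number of frames / frames per second.
            "duration_s": wav.getnframes() / rate,
        }
```

Summing `duration_s` over all training files gives a quick sanity check against the stated total of 1 hour and 17 minutes.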

Detailed guidelines, task materials and data sets for development, training and testing will be made available on the Task Website.


  • Vincenzo Galatà (ISTC-CNR, UOS Padova, Italy)
  • Antonio Origlia (University of Naples “Federico II”, Napoli, Italy)


Antonio Origlia antori[at]gmail.com