Human and Machine Dialect Identification from Natural Speech and Artificial Stimuli (HMDI)

Since its origins, Automatic Language Identification (LID) has had to contend with the problems raised by dialectal variation and with the difficult task of accent identification (AID).

The aim of the present task is to test human and machine performance by offering:

a set of 18 sound samples (max 10 s) uttered by female speakers of different languages of the world (18 languages/speakers for testing + 18 longer samples for each language + 8 variants by other speakers);
a set of 20 speech samples in different dialects: 10 speech files in dialects of Italy and 10 speech files of similar sentences in 5 foreign languages, each represented by two different speakers;
a set of 20 synthetic stimuli (generated from selected prosodic information, namely temporal and pitch cues) based on speech samples used in the other test sets.

Detailed guidelines, task materials and data sets for development, training and testing will be made available on the Task Website.

Organizers

Antonio Romano (University of Turin)
Claudio Russo (University of Turin)

Contact

Antonio Romano antonio.romano[at]unito.it