The Parsing Task is one of the “historical” tasks of EVALITA. It has been organized on the basis of progressively larger datasets extracted from different CoNLL-compliant releases of two dependency treebanks developed for Italian, i.e. the Turin University Treebank (TUT) and the ISST-TANL Treebank, and it has always been among the tasks with the highest participation in every edition of the contest (from 2007 onwards). The results achieved in these shared tasks define the state of the art for Italian dependency parsing (LAS 88.76 and UAS 93.55), which turned out to be quite close to that reported for English.
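For readers less familiar with the metrics quoted above: LAS (Labeled Attachment Score) is the percentage of tokens that receive both the correct head and the correct dependency label, while UAS (Unlabeled Attachment Score) only requires the correct head. The minimal Python sketch below computes both from gold and predicted data in the 10-column CoNLL-X format. It assumes the two inputs contain exactly the same tokens in the same order and it scores every token, punctuation included, so it is not necessarily identical to the official EVALITA scorer. The function names and the toy sentence are hypothetical and only illustrate the computation.

```python
from typing import List, Tuple


def read_conll(text: str) -> List[Tuple[str, str]]:
    """Return the (HEAD, DEPREL) pair of every token line in a CoNLL-X string."""
    pairs = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line = sentence boundary; '#' = comment
        cols = line.split("\t")
        # CoNLL-X columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        pairs.append((cols[6], cols[7]))
    return pairs


def attachment_scores(gold_text: str, pred_text: str) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages over all tokens."""
    gold = read_conll(gold_text)
    pred = read_conll(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data must contain the same tokens")
    if not gold:
        raise ValueError("no tokens to evaluate")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))  # correct head only
    both_ok = sum(g == p for g, p in zip(gold, pred))        # correct head and label
    n = len(gold)
    return 100.0 * both_ok / n, 100.0 * head_ok / n


def conll_row(idx: int, form: str, head: int, rel: str) -> str:
    """Build a CoNLL-X token line, leaving unused columns as '_' (hypothetical helper)."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), rel, "_", "_"])


# Toy sentence "Il gatto dorme ." (illustrative labels, not taken from ISDT).
GOLD = "\n".join([
    conll_row(1, "Il", 2, "det"),
    conll_row(2, "gatto", 3, "nsubj"),
    conll_row(3, "dorme", 0, "root"),
    conll_row(4, ".", 3, "punct"),
])
# The prediction attaches every token correctly but mislabels "Il".
PRED = "\n".join([
    conll_row(1, "Il", 2, "amod"),
    conll_row(2, "gatto", 3, "nsubj"),
    conll_row(3, "dorme", 0, "root"),
    conll_row(4, ".", 3, "punct"),
])

if __name__ == "__main__":
    las, uas = attachment_scores(GOLD, PRED)
    print(f"LAS {las:.2f}  UAS {uas:.2f}")  # prints "LAS 75.00  UAS 100.00"
```

Because a wrong label still counts as a correct attachment for UAS, UAS is always greater than or equal to LAS, which is consistent with the gap between the two figures reported above.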
Thanks to the availability of the newly developed “Italian Stanford Dependency Treebank” (ISDT), it is now possible to organize a new edition of the dependency parsing task with three main novelties: the size of the dataset, which is much bigger than that of the resources used in previous EVALITA campaigns; the annotation scheme, which is compliant with de facto standards at the level of both the representation format (CoNLL) and the adopted tagset (Stanford Dependency scheme); and its explicit orientation towards supporting Information Extraction tasks, a feature inherited from the Stanford Dependency scheme. Combined, these novelties significantly improve and extend the usability of the resource, increasing the number of tasks for which it can be exploited and allowing a wider variety of tools to be applied.
We expect that a Dependency Parsing Task organized around ISDT will attract the attention of both the national and international dependency parsing communities, drawing a wider group of researchers who share an interest in the Stanford Dependency framework.
The proposed Parsing Task will be organized into two subtasks:
- a basic task focusing on standard dependency parsing of Italian texts, with a double evaluation track aimed at testing both the performance of parsing systems and their suitability for Information Extraction tasks;
- a pilot task focusing on cross-lingual transfer parsing where, as suggested by McDonald et al. (2013), a parser trained on the universal version of the “Italian Stanford Dependency Treebank” is applied to test sets of other languages (not necessarily typologically related to Italian).
Detailed guidelines, task materials and datasets for development, training and testing will be made available on the Task Website.
Cristina Bosco (University of Torino)
Felice Dell’Orletta (Istituto di Linguistica Computazionale “Antonio Zampolli” – CNR, Pisa)
Simonetta Montemagni (Istituto di Linguistica Computazionale “Antonio Zampolli” – CNR, Pisa)
Manuela Sanguinetti (University of Torino)
Maria Simi (University of Pisa)
Roberta Montefusco (University of Pisa)