In the Parsing task, systems must syntactically annotate all sentences in a sample. Participants in the parsing task will be provided with a development corpus of about 2,000 sentences (about 58,000 tokens) extracted from the Turin University Treebank (TUT), annotated in both a dependency-based (TUT native) and a constituency-based (Penn-like) format.
The task therefore consists of two distinct subtasks, dependency parsing and constituency parsing; participants may enter either one or both.
Evaluation will use standard measures: Labelled Attachment Score (LAS) for dependency parsing (see e.g. CoNLL-X) and the bracketing scores computed by EVALB for constituency parsing.
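As an illustration of the dependency measure, LAS is the fraction of tokens whose predicted head and predicted dependency label both match the gold annotation. The following is a minimal sketch only, not the official scorer: the function name, the (head, label) data layout, and the labels in the example are assumptions made for illustration and are not taken from the TUT format.

```python
from typing import List, Tuple

# Hypothetical representation: one (head index, dependency label) pair per token.
Arc = Tuple[int, str]


def labelled_attachment_score(gold: List[Arc], predicted: List[Arc]) -> float:
    """Return the fraction of tokens with both head and label correct."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # A token counts as correct only if head AND label match the gold arc.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy three-token sentence; the labels are illustrative, not TUT relations.
gold = [(2, "SUBJ"), (0, "TOP"), (2, "OBJ")]
pred = [(2, "SUBJ"), (0, "TOP"), (2, "ARG")]
las = labelled_attachment_score(gold, pred)
```

In this toy case two of the three tokens have the correct head and label, so the score is 2/3. A real evaluation would aggregate over all tokens of the test set and would need to settle details such as whether punctuation tokens are counted.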
Furthermore, to better represent the current state of parsing for Italian, different parsed outputs on the same (PoS-tagged) test set will be accepted as results and evaluated appropriately. Each result will be evaluated taking into account, separately, various features of the annotation that are not yet standardized, e.g. null elements, punctuation marks, amalgams, and non-projective structures.
PLEASE NOTE: The deadline for the distribution of results has been postponed to June 25th.
- Cristina Bosco (Dipartimento di Informatica, Università di Torino – bosco[at]di.unito.it)
- Alessandro Mazzei (Dipartimento di Informatica, Università di Torino – mazzei[at]di.unito.it)
- Vincenzo Lombardo (Dipartimento di Informatica, Università di Torino – vincenzo[at]di.unito.it)
- Fabio Massimo Zanzotto (Università di Roma Tor Vergata – zanzotto[at]info.uniroma2.it)
Development data consist of about 2,000 sentences annotated in both TUT and TUT-Penn formats. The data are freely available at no charge and can be downloaded from the TUT web site (see the EVALITA section).