Parsing

The new deadline for the submission of results is strictly Friday, 21st October 2011.

The parsing task of EVALITA aims to define and extend the current state of the art in parsing of Italian by encouraging the application of existing models to this language. As in Evalita 2007 and 2009, both statistical and rule-based approaches are allowed, and, in order to account for different parsing paradigms, the task will be articulated into two different tracks, i.e. dependency and constituency parsing.

Dependency parsing track

The dependency parsing track will use as its development set the Turin University Treebank (TUT), developed by the University of Torino. The part of the resource used in previous Evalita campaigns has recently been updated, reorganized into corpora according to text genre, and extended with new legal texts and a new corpus from Wikipedia. All the data will be made available in both native TUT and CoNLL format. Comparing the results achieved on different corpora should give insight into whether and how text genre influences parser performance.
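As a rough illustration of working with the CoNLL version of the data, the sketch below reads sentences from tab-separated text. It assumes the ten-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with blank lines between sentences; the task guidelines remain the authority on the actual layout. The `Token` type, the `read_conll` function, and the tags and relation labels in the sample are hypothetical and are not taken from TUT.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    idx: int      # position of the token in the sentence (1-based)
    form: str     # word form
    head: int     # index of the head token, 0 for the root
    deprel: str   # dependency relation to the head


def read_conll(text: str) -> List[List[Token]]:
    """Split CoNLL-X style text into sentences of Token tuples.

    Assumes tab-separated columns with ID in column 1, FORM in column 2,
    HEAD in column 7 and DEPREL in column 8.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Invented sample sentence; tags and labels are placeholders, not TUT's.
sample = (
    "1\tIl\til\tART\tART\t_\t2\tdet\t_\t_\n"
    "2\tcane\tcane\tNOUN\tNOUN\t_\t3\tsubj\t_\t_\n"
    "3\tdorme\tdormire\tVERB\tVERB\t_\t0\troot\t_\t_\n"
)

sentences = read_conll(sample)
for token in sentences[0]:
    print(token.idx, token.form, token.head, token.deprel)
```

Only the four fields needed for attachment scoring are kept here; a fuller reader would also retain the lemma, part-of-speech and feature columns.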

Constituency parsing track

The constituency parsing track will be based on the same corpora of the Turin University Treebank used in the dependency parsing track, but annotated in a Penn-like format (TUT-Penn), also developed by the University of Torino. The results of Evalita 2007 and 2009 showed a larger distance from the state of the art for constituency parsing than for dependency parsing of Italian. This confirmed the hypothesis, known in the literature, that dependency structures are better suited to the representation of Italian, even though the constituency task is based on the Penn Treebank format, which is the most widespread and most frequently parsed format in the world. The constituency parsing track will again give community members the opportunity to test their parsers across text genres (as in the dependency track) and across paradigms, since this track uses the same linguistic data as the dependency parsing track, annotated in a constituency-based format.

The quantitative evaluation of the different kinds of results will be performed according to the standard measures, i.e. LAS (labelled attachment score) for the dependency track and EvalB (crossing bracket measure) for the constituency track.
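To make the dependency measure concrete, here is a minimal sketch of how a labelled attachment score can be computed: the fraction of tokens whose predicted head and dependency label both match the gold standard. The function name `attachment_scores` and the sample labels are hypothetical. The sketch also returns the unlabelled score (UAS, heads only), which is not mentioned above but is commonly reported alongside LAS. It scores every token; whether the official evaluation excludes punctuation or applies other conventions is not stated here, so treat this as an approximation.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # UAS counts tokens with the correct head; LAS also requires the label.
    heads_correct = sum(1 for g, p in zip(gold, pred) if g[0] == p[0])
    arcs_correct = sum(1 for g, p in zip(gold, pred) if g == p)
    total = len(gold)
    return arcs_correct / total, heads_correct / total


# Invented three-token example: one label is wrong, all heads are right.
gold = [(2, "det"), (3, "subj"), (0, "root")]
pred = [(2, "det"), (3, "obj"), (0, "root")]

las, uas = attachment_scores(gold, pred)
print(f"LAS = {las:.3f}, UAS = {uas:.3f}")
```

In this example two of the three tokens have both head and label correct, so LAS is 2/3, while all three heads are correct, so UAS is 1.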

Task materials

Detailed Guidelines [13/07/2011]

Data Distribution

All TUT data are covered by a Creative Commons license “Attribution-NonCommercial-ShareAlike 2.5 Italy”. Test and training data for both the dependency parsing and the constituency parsing tracks can be downloaded from the dedicated web page.

If you have any problems, please contact: Cristina Bosco, bosco[at]di.unito.it

Organizers

Cristina Bosco and Alessandro Mazzei (Dipartimento di Informatica, Università di Torino)