In the Entity Recognition task, systems are required to recognize the entities occurring in a text. The task consists of two subtasks, because the Annotation of Person Attributes subtask, which had initially been announced, has been canceled. Participants may take part in either subtask or in both.
Data Distribution
I-CAB is freely available for research purposes upon acceptance of a license agreement.
Named Entity Recognition (NER)
In the Named Entity Recognition subtask, systems are required to recognize only the Named Entities occurring in a text; more specifically Person, Organization, Location and Geo-Political Entities (see the annotation report for more details). As in the previous edition of EVALITA, the evaluation will be based on the Italian Content Annotation Bank (I-CAB) where Named Entities are annotated in the IOB format (where “B-begin” and “I-inside” denote the tokens belonging to Named Entities and “O-outside” is used for all other tokens).
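To make the IOB encoding concrete, the following minimal sketch groups a tag sequence into entity spans. It assumes tags of the form B-TYPE, I-TYPE and O, with B marking the first token of an entity; the labels PER and GPE, the sample sentence and the function name iob_to_spans are illustrative assumptions and are not taken from the task guidelines.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB tags into (type, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        # The open entity continues only on an I- tag of the same type.
        continues = prefix == "I" and current == etype
        if current is not None and not continues:
            spans.append((current, start, i))
            start, current = None, None
        # A B- tag always opens an entity; an I- tag with no open entity
        # is treated leniently as the start of one (an assumption).
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, etype
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Illustrative sentence, not drawn from I-CAB.
tokens = ["Elvis", "Presley", "canta", "a", "Trento"]
tags = ["B-PER", "I-PER", "O", "O", "B-GPE"]
for etype, start, end in iob_to_spans(tags):
    print(etype, " ".join(tokens[start:end]))
```

Each span is reported with token offsets, so the surface string can be recovered by slicing the token list.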
The dataset used for the NER task at EVALITA 2007 (525 news stories) will be distributed as the development set, while the test set will consist of completely new data.
All the data will also be annotated with Part of Speech information using the Elsnet tagset for Italian.
Task materials – Detailed Guidelines – Annotation Report – Trial examples: input sample and output sample – Download the CoNLL 2002 scorer from the CoNLL website
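As a rough illustration of the kind of measure the CoNLL 2002 scorer reports, the sketch below computes exact-match precision, recall and F1 over entity spans. It is not the official scorer; the function name prf1, the (type, start, end) span representation and the sample spans are assumptions made for this example.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token)


def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    # A predicted span counts as correct only if type and both
    # boundaries match a gold span exactly.
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative spans: the second prediction has the wrong type.
gold = {("PER", 0, 2), ("GPE", 4, 5)}
predicted = {("PER", 0, 2), ("ORG", 4, 5)}
print(prf1(gold, predicted))  # (0.5, 0.5, 0.5)
```

Official results should always be computed with the distributed scorer, since details such as the handling of malformed tag sequences may differ from this sketch.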
Local Entity Detection and Recognition (LEDR)
The Local Entity Detection and Recognition (LEDR) task requires that entities (i.e. persons, organizations, geo-political entities and geographical locations) mentioned in source texts be detected, and that selected information about these entities be recognized. The information comprises the attributes and the mentions of entities within each document, following the ACE-LDC standards with all the modifications needed to adapt them to the specific morphosyntactic features of Italian.
In this task, each document is processed separately and entities that are mentioned in different documents are treated as different entities.
In the nomenclature we use, an entity provides a representation of an object in the world, while an entity mention provides information about any textual references to that object. For instance, if “Elvis Presley” is mentioned in two different sentences of a text as “il cantante/the singer” and as “egli/he”, these two expressions are considered as two co-referring entity mentions (i.e. two mentions of the same entity).
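The distinction between entities and mentions, and the fact that entities are scoped to a single document, can be sketched with a small data structure. This is only an illustration: the class names, the fields, the identifiers such as doc1 and e1, and the PER label are hypothetical and do not reproduce the ACE-LDC or APF format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Mention:
    text: str       # the textual reference as it appears in the document
    sentence: int   # index of the sentence containing it (hypothetical field)


@dataclass
class Entity:
    doc_id: str
    entity_type: str
    mentions: List[Mention] = field(default_factory=list)


# Entities are keyed by (document id, local entity id), so the same
# real-world object in two documents yields two distinct entities.
EntityIndex = Dict[Tuple[str, str], Entity]


def add_mention(index: EntityIndex, doc_id: str, local_id: str,
                entity_type: str, text: str, sentence: int) -> Entity:
    """Attach a mention to its entity, creating the entity if needed."""
    entity = index.setdefault((doc_id, local_id), Entity(doc_id, entity_type))
    entity.mentions.append(Mention(text, sentence))
    return entity


index: EntityIndex = {}
# Two co-referring mentions of the same entity within one document.
add_mention(index, "doc1", "e1", "PER", "il cantante", 0)
add_mention(index, "doc1", "e1", "PER", "egli", 1)
# A mention in another document belongs to a separate entity.
add_mention(index, "doc2", "e1", "PER", "Elvis Presley", 0)

print(len(index))                            # 2
print(len(index[("doc1", "e1")].mentions))   # 2
```

Keying on the document identifier mirrors the rule that each document is processed separately, with no cross-document coreference.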
The evaluation will be based on the Italian Content Annotation Bank (I-CAB).
Detailed Guidelines [27/03/2009] – Annotation Report [09/07/2009] – Trial examples [27/03/2009] – ACE-2008 scorer [09/07/2009] – APF DTD [14/07/2009]
- Valentina Bartalesi Lenzi (CELCT, Trento)
- Manuela Speranza (FBK, Trento)
- Rachele Sprugnoli (CELCT, Trento)