In the Named Entity Recognition task, systems are required to recognize the Named Entities occurring in a text. The task focuses on four types of Named Entities: Person (PER), Organization (ORG), Location (LOC) and Geo-Political Entity (GPE); see the annotation report for more details.
The evaluation will be based on the Italian Content Annotation Bank (I-CAB) Version 4.1, an annotated corpus developed in the context of the Ontotext Project. I-CAB 4.1 is annotated with Named Entities in the IOB format, where B (begin) and I (inside) mark the tokens belonging to a Named Entity and O (outside) marks all other tokens. Upon accepting the agreement terms for a free licence, participants will be provided with development data (the development part of I-CAB 4.1, i.e. 335 news stories, for a total of 113,000 words). The test data on which the official evaluation will be performed consist of the test part of I-CAB 4.1 (i.e. 190 news stories, for a total of 69,000 words).
All the data we provide will also be annotated with Part-of-Speech information using the ELSNET tagset for Italian.
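As an illustration of the IOB scheme described above, the following minimal Python sketch groups a sequence of IOB tags into entity spans. The function name `iob_to_spans`, the example sentence and its tags are assumptions made for this illustration only; they are not taken from I-CAB or from any official tool, and the exact column layout of the distributed files should be checked against the input sample.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group IOB tags into (entity_type, start, end) spans, end exclusive.

    Hypothetical helper: an I- tag that does not continue an open entity
    of the same type is treated leniently as the start of a new entity.
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        # Close the open entity unless this tag continues it (I- of same type).
        if current is not None and (prefix != "I" or etype != current):
            spans.append((current, start, i))
            start, current = None, None
        # Open a new entity on B-, or on an I- with no entity open.
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, etype
    # Close an entity that runs to the end of the sequence.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Invented example sentence and tags, for illustration only.
tokens = ["Mario", "Rossi", "lavora", "a", "Trento", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-GPE", "O"]
for etype, start, end in iob_to_spans(tags):
    print(etype, " ".join(tokens[start:end]))
```

Representing entities as typed spans rather than per-token tags makes it straightforward to compare a system's output with the reference annotation at the entity level.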
Manuela Speranza (FBK, Trento, Italy – manspera[at]fbk.eu)
The I-CAB Corpus Version 4.1 (used for EVALITA 2007) is freely available for research purposes upon acceptance of a license agreement: obtain I-CAB 4.1. Contact: Manuela Speranza (manspera[at]fbk.eu)
input sample and output sample
Download the CoNLL 2002 scorer from the CoNLL website.
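For quick sanity checks before running the official scorer, entity-level precision, recall and F1 can be approximated as in the sketch below. It assumes exact-match scoring, where a predicted entity counts as correct only if both its type and its boundaries match the reference. The function name `precision_recall_f1` and the example spans are hypothetical; official results should always be computed with the scorer itself.

```python
from typing import Set, Tuple

# An entity is (type, start_token, end_token), with the end exclusive.
Span = Tuple[str, int, int]


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level scores; hypothetical helper, not the official scorer."""
    correct = len(gold & pred)  # same type and same boundaries
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented spans: the system labels one GPE as LOC, so that entity is wrong.
gold = {("PER", 0, 2), ("GPE", 4, 5), ("ORG", 7, 9)}
pred = {("PER", 0, 2), ("LOC", 4, 5), ("ORG", 7, 9)}
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this example two of the three predicted entities match the reference exactly, so precision, recall and F1 all come out at two thirds.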