Cross-document Coreference Resolution of Named Person Entities (NePS)

The News People Search (NePS) task evaluates cross-document coreference resolution of named person entities in Italian news. Cross-document coreference of a person entity occurs when the same person is mentioned in more than one text source. It can be defined as a clustering problem, which in principle requires clustering the name occurrences in a corpus according to the persons they refer to. In this task, we consider clusters of the documents that contain the name occurrences. Cross-document coreference involves two problems: (i) resolving ambiguities between people who have the same name (i.e. when identical mentions refer to distinct persons) and, conversely, (ii) recognizing when different names refer to the same person.

The NePS task consists of clustering a set of Italian newspaper articles that mention a person name according to the different people sharing that name (i.e. one cluster of documents for each distinct person). More specifically, for each ambiguous person name, systems receive as input a set of newspaper articles, and the expected output is a clustering of the documents in which each cluster contains all and only those documents that refer to the same individual. The NePS task is limited to documents in which the entities are mentioned by name, and it takes name variability into account. Different kinds of name variants are considered, such as complete names (Paolo Rossi, Rossi Paolo), abbreviations (P. Rossi, Paolo R.), first names only (Paolo), last names only (Rossi), nicknames (Pablito), and misspellings (Paalo Rossi).

The NePS task is closely related to Word Sense Disambiguation, which is generally formulated as the task of deciding which sense a word has in a given context. In both cases, the problem addressed is the resolution of ambiguity in a natural language expression. More precisely, the NePS task can be viewed as a case of Word Sense Discrimination, because the number of “senses” (actual people) is not known a priori.

The NePS task is structured along the same lines as the Web People Search evaluation exercise (WePS), which in 2010 was in its third edition. The main differences are that the WePS task addresses the English language, does not take name variability into account, and uses a corpus of web pages instead of newspaper articles. NePS evaluation will be carried out using the official WePS-2 scorer and metrics (BCubed precision and recall, combined with Van Rijsbergen’s F measure).
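To make the evaluation metrics more concrete, the following is a minimal sketch of BCubed precision and recall combined with Van Rijsbergen’s F measure. It is not the official WePS-2 scorer: it assumes that every document belongs to exactly one cluster in both the system output and the gold standard, and the function name bcubed, the dictionary-based input format, and the default weight alpha = 0.5 are illustrative assumptions rather than part of the task definition.

```python
from collections import defaultdict


def bcubed(system, gold, alpha=0.5):
    """Return (precision, recall, F) for one ambiguous person name.

    system, gold: dicts mapping each document id to a cluster label.
    Assumption: every document has exactly one label in each clustering.
    """
    docs = list(gold)
    if not docs:
        raise ValueError("no documents to evaluate")

    # Group the documents by cluster label in each clustering.
    sys_clusters = defaultdict(set)
    gold_clusters = defaultdict(set)
    for doc in docs:
        sys_clusters[system[doc]].add(doc)
        gold_clusters[gold[doc]].add(doc)

    precision = 0.0
    recall = 0.0
    for doc in docs:
        same_system = sys_clusters[system[doc]]
        same_gold = gold_clusters[gold[doc]]
        # Documents correctly grouped with this one (includes the document itself).
        correct = len(same_system & same_gold)
        precision += correct / len(same_system)
        recall += correct / len(same_gold)

    precision /= len(docs)
    recall /= len(docs)

    # Van Rijsbergen's F: weighted harmonic mean of precision and recall.
    f_measure = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    return precision, recall, f_measure


# Hypothetical example: four articles mentioning "Paolo Rossi",
# referring to two different people (A and B) in the gold standard.
gold = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
system = {"d1": "x", "d2": "x", "d3": "x", "d4": "y"}
print(bcubed(system, gold))
```

In this sketch, precision drops when a system cluster mixes documents about different people, and recall drops when documents about the same person are split across clusters. With alpha = 0.5 the F measure reduces to the usual harmonic mean of the two.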

References

WePS-3 website: http://nlp.uned.es/weps/weps-3

Task materials

Data Distribution

  • Test data sent to participants [04/10/2011]
  • The CRIPCO corpus is freely available for research purposes upon acceptance of a license agreement.

For further information please contact: Alessandro Marchetti, amarchetti[at]celct.it

Organizers

  • Luisa Bentivogli (FBK-irst, CELCT, Trento)
  • Alessandro Marchetti (CELCT, Trento)
  • Emanuele Pianta (CELCT, Trento)