Lexical Substitution

Word Sense Disambiguation (WSD) is a fundamental step towards Natural Language Understanding. Given its importance, WSD has featured in the major evaluation campaigns of recent years (e.g. Senseval, SemEval, EVALITA). WSD has typically been evaluated by asking systems to select the correct sense of each word from the senses listed in a computational lexicon (most often WordNet (Fellbaum 1998)). The main problem with this setup is that such resources are too fine-grained: while these fine distinctions might be useful for human users, they are not necessary for many computer applications (Ide and Wilks 2006).

An alternative way to evaluate WSD is Lexical Substitution (McCarthy 2002, McCarthy and Navigli 2007). Here, given a word in a specific context, participants are asked to provide the synonyms that best fit that context. An important aspect of Lexical Substitution is that it has no predefined sense inventory, which opens the task to unsupervised approaches.
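To make the input and output of the task concrete, here is a minimal Python sketch of an unsupervised baseline. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) that lists candidate substitutes for a target word, each paired with a few context words typical of that meaning, and ranks the candidates with a simplified Lesk-style overlap count. The names `Instance` and `propose_substitutes` and all the data are illustrative only; they are not part of the task's data format or of any released resource.

```python
import re
from dataclasses import dataclass


@dataclass
class Instance:
    """One hypothetical lexical substitution item: a target word in context."""
    item_id: str
    target: str
    context: str


# Hypothetical toy lexicon: each candidate substitute for a target word is
# paired with a few words that tend to co-occur with that meaning.
TOY_LEXICON = {
    "brillante": {
        "intelligente": {"risposta", "idea", "studente", "mente"},
        "luminoso": {"luce", "sole", "colore", "stella"},
        "lucido": {"superficie", "pavimento", "metallo"},
    },
}


def propose_substitutes(inst, lexicon, k=3):
    """Return up to k candidate substitutes for inst.target, ranked by how
    many of their signature words occur in the context (a simplified
    Lesk-style overlap)."""
    context_words = set(re.findall(r"\w+", inst.context.lower()))
    candidates = lexicon.get(inst.target, {})
    scored = [
        (len(signature & context_words), candidate)
        for candidate, signature in candidates.items()
    ]
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:k]]


print(propose_substitutes(
    Instance("1", "brillante", "Ha dato una risposta brillante alla domanda."),
    TOY_LEXICON, k=1))
```

With `k=1`, the context "Ha dato una risposta brillante alla domanda." yields `["intelligente"]`, while "Una stella brillante nel cielo." yields `["luminoso"]`. A real system would draw its candidates from a much larger resource (for instance ItalWordNet) and use a far richer model of the context.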

The task consists of finding synonyms for a set of words appearing in different contexts. Systems will be evaluated by comparing the synonyms they propose against those provided by human annotators, using several measures. Participants are not required to rely on computational lexicons, but those who wish to use one can obtain ItalWordNet or PAROLE-SIMPLE-CLIPS from ELDA.
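The measures are not listed here. As an illustration only, the sketch below assumes the "best" measure of the English task (McCarthy and Navigli 2007): for each attempted item, every proposed substitute earns the number of annotators who gave it, and that sum is divided by the number of substitutes proposed and by the total number of annotator responses for the item. Precision averages this credit over the attempted items, recall over all items. The function name `best_scores` and its input layout are hypothetical.

```python
from collections import Counter


def best_scores(system, gold):
    """Precision and recall under the 'best' measure of SemEval-2007 Task 10
    (assumed here for illustration; the EVALITA measures are not given).

    system: item id -> list of substitutes proposed (items may be skipped)
    gold:   item id -> Counter of substitutes given by the annotators
    """
    total = 0.0
    attempted = 0
    for item_id, answers in system.items():
        human = gold.get(item_id)
        if not answers or not human:
            continue  # unanswered or unknown items earn no credit
        attempted += 1
        h_size = sum(human.values())  # |H_i|: all annotator responses
        # Each answer earns its annotator frequency (0 if nobody gave it);
        # the sum is divided by the number of answers and by |H_i|.
        credit = sum(human[s] for s in answers)
        total += credit / len(answers) / h_size
    precision = total / attempted if attempted else 0.0
    recall = total / len(gold) if gold else 0.0
    return precision, recall


gold = {
    "1": Counter({"intelligente": 3, "acuto": 2}),
    "2": Counter({"luminoso": 4, "splendente": 1}),
}
system = {"1": ["intelligente"], "2": ["luminoso", "lucido"]}
print(best_scores(system, gold))
```

In this toy example the system earns 3/1/5 = 0.6 on item 1 and 4/2/5 = 0.4 on item 2, so precision and recall are both 0.5; answering only item 1 would give precision 0.6 but recall 0.3. The same task's "out-of-ten" (oot) measure allows up to ten substitutes per item and does not divide by their number.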

References

  • Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press.
  • Ide, N. and Wilks, Y. (2006). Making Sense About Sense. In Agirre, E., Edmonds, P. (Eds.), Word Sense Disambiguation: Algorithms and Applications, Springer.
  • McCarthy, D. (2002). Lexical Substitution as a Task for WSD Evaluation. Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions. Association for Computational Linguistics.
  • McCarthy, D. and Navigli, R. (2007). SemEval-2007 Task 10: English Lexical Substitution Task. Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007). Association for Computational Linguistics.

Task materials

The trial data and test data can be downloaded free of charge.
Optionally, participants can obtain “ItalWordNet for EVALITA” and “PAROLE-SIMPLE-CLIPS PISA Italian Lexicon for EVALITA” from ELDA (Evaluations and Language resources Distribution Agency) by contacting Ms Valerie Mapelli at mapelli[at]elda.org, who will provide information on the licensing and delivery procedure.

Organizer

Antonio Toral (ILC-CNR, Pisa)