Aline Villavicencio is a Professor at the Department of Computer Science, University of Sheffield (UK) and a Reader at the Institute of Informatics, Federal University of Rio Grande do Sul (Brazil). She is a member of the Neurocomputational and Natural Language Processing Laboratory at UFRGS and of the Natural Language Processing Group at Sheffield. Her research interests focus on lexical semantics, multilinguality, and cognitively motivated NLP. Her work includes techniques for treating Multiword Expressions using statistical methods and distributional semantic models, as well as applications such as Text Simplification and Question Answering, for languages such as English and Portuguese.
Multiword Expressions Under the Microscope
Ranging from idioms (make ends meet), light verb constructions (take a shower) and verb particle constructions (shake up) to noun compounds (loan shark), Multiword Expressions (MWEs) have provided new challenges and opportunities for natural language processing. Their integration in tasks and applications such as parsing, information retrieval and machine translation has brought improvements for language technology, providing a degree of precision, naturalness and fluency. In this talk I will present an overview of advances in the identification of MWEs, which often capitalize on the various degrees of idiosyncrasy they display, including lexical, syntactic, semantic and statistical. I will concentrate on techniques for identifying their degree of idiomaticity and approximating their meaning, since interpreting them often requires more knowledge than can be gathered from their individual components and the way these combine. The aim is to differentiate combinations whose meaning can be (partly) inferred from their parts (such as apple juice: juice made of apples) from those whose meaning cannot (such as dark horse: an unknown candidate who unexpectedly succeeds).