Research in natural language processing (NLP) has seen striking advances in recent years, mainly driven by large pretrained language models. However, most of these successes have been achieved in English and a small set of other high-resource languages. In this talk, I will highlight methods that enable us to scale NLP models to more of the world's 7,000 languages, along with open challenges and promising future directions.
- Multilingual models are necessary to democratise access to language technology.
- Large language models trained on unlabelled data in many languages learn a surprising amount of cross-lingual information.
- Nevertheless, there are many open challenges such as generalisation to languages with limited data.
Sebastian Ruder is a research scientist in the Language team at DeepMind, London. He completed his PhD in Natural Language Processing and Deep Learning at the Insight Research Centre for Data Analytics, while working as a research scientist at Dublin-based text analytics startup AYLIEN. Previously, he studied Computational Linguistics at the University of Heidelberg, Germany and at Trinity College, Dublin.