Andrea Zugarini (University of Florence)
Nov 4, 2020, 11:00–11:45 AM
Conference Meeting
Description
In the last decade, most advances in NLP have been achieved by learning textual representations through unsupervised language-modeling tasks on large corpora.
Recently, this principle was pushed even further with transformers, where dataset and model sizes have grown by orders of magnitude with respect to previous state-of-the-art sequence-to-sequence recurrent models. Since their introduction, transformers have surpassed the previous state of the art on almost every NLP problem.
In this seminar I present how these models work, how they differ from RNNs in terms of computational complexity and scalability, and finally their limitations.
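As a minimal illustration of the self-attention mechanism at the core of the transformer (and of the complexity trade-off with RNNs mentioned above), here is a short NumPy sketch; it is not taken from the seminar materials, and all names and shapes are illustrative.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over n token vectors.

    X:  (n, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    The (n, n) score matrix makes the cost O(n^2 * d_k) per layer, but all
    positions are computed in parallel, whereas an RNN needs n sequential steps.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # (n, d_k) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n, n) pairwise attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                         # (n, d_k) context vectors

# Toy usage: 5 tokens, 8-dimensional embeddings, 4-dimensional projections.
rng = np.random.default_rng(0)
n, d_model, d_k = 5, 8, 4
X = rng.normal(size=(n, d_model))
out = self_attention(X,
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)),
                     rng.normal(size=(d_model, d_k)))
print(out.shape)  # (5, 4)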
Main References
Vaswani et al. (2017). Attention Is All You Need.
Devlin et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
Radford et al. (2019). Language Models are Unsupervised Multitask Learners.
Kaplan et al. (2020). Scaling Laws for Neural Language Models.