Sep 23, 2020 – 11:45 – 12:30 AM
The text-mining field is currently in the eye of a technological storm: dozens of novel (and effective!) algorithms and architectures have recently been released, mainly by global AI players such as Google, OpenAI, and Microsoft. From a methodological point of view, we observe that researchers and global companies typically try to improve NLU capabilities by increasing the size of network architectures, adding more computational power, and investing more time and money in unsupervised pre-training and supervised fine-tuning. While this is a fascinating line of investigation, we have been experimenting with a completely different approach: our goal is to leverage the semantic information extracted by an NLP engine (Cogito, in our case) to train lightweight models that (un)surprisingly can challenge the accuracy of deep network architectures. We will present our experiments on the text categorization task, applied to two different domains: news articles and insurance contracts.
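To make the idea concrete, here is a minimal sketch of the general pattern described above: semantic annotations produced by an NLP engine feed a lightweight statistical classifier. The concept tags and category labels below are invented for illustration (Cogito's actual output format is not shown in this post), and a simple Naive Bayes model stands in for whatever lightweight model the experiments actually used.

```python
from collections import Counter, defaultdict
import math

# Hypothetical output of a semantic NLP engine: each document is reduced to a
# list of concept tags plus a category label. In the setting described above,
# the tags would come from Cogito; hand-written tags stand in here.
train = [
    (["economy", "stock_market", "inflation"], "news"),
    (["central_bank", "interest_rate"],        "news"),
    (["policy", "premium", "deductible"],      "insurance"),
    (["claim", "coverage", "premium"],         "insurance"),
]

class ConceptNB:
    """Multinomial Naive Bayes over semantic concept tags.

    A deliberately lightweight model: training is a couple of counting
    passes, and prediction is a handful of log-probability additions.
    """

    def fit(self, docs):
        self.class_counts = Counter(label for _, label in docs)
        self.concept_counts = defaultdict(Counter)
        self.vocab = set()
        for concepts, label in docs:
            self.concept_counts[label].update(concepts)
            self.vocab.update(concepts)
        return self

    def predict(self, concepts):
        n_docs = sum(self.class_counts.values())

        def log_score(label):
            total = sum(self.concept_counts[label].values())
            denom = total + len(self.vocab)  # Laplace smoothing
            score = math.log(self.class_counts[label] / n_docs)
            for c in concepts:
                score += math.log((self.concept_counts[label][c] + 1) / denom)
            return score

        return max(self.class_counts, key=log_score)

model = ConceptNB().fit(train)
print(model.predict(["premium", "claim"]))  # → insurance
```

The point of the sketch is the division of labor: the heavy lexical and semantic lifting happens once, inside the NLP engine, so the downstream classifier can stay small enough to train in milliseconds on a laptop.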