Valerio Basile (University of Turin)
Feb 12, 2020 – 11:40 AM
DIISM, Artificial Intelligence laboratory (room 201), Siena SI
Today’s Web represents a huge repository of human knowledge, not only about facts, people, places and so on (encyclopedic knowledge), but also about the everyday beliefs that average human beings are expected to hold (commonsense knowledge). Automated agents such as domestic robots and virtual assistants need to be equipped with this kind of knowledge in order to operate autonomously. However, the majority of the commonsense knowledge on the Web is expressed in natural language, rather than in structured formats ready to be processed by machines.
Semantic Parsing and Word Sense Disambiguation are two well-studied NLP tasks that aim at extracting the structure and the lexical semantics of natural language, respectively. During my postdoc at Inria Sophia Antipolis on the EU project ALOOF (Autonomous Learning of the Meaning of Objects), I worked on combining the two tasks in order to “read” a large quantity of text on the Web and collect many instances of structured grounded knowledge, under the common framework of Frame Semantics. After creating a corpus and parsing it with KNEWS (Knowledge Extraction With Semantics), the pipeline I developed, we used clustering techniques to filter out the noise and distill the most prototypical knowledge about common concepts, particularly objects, locations and actions. The final result is a language-neutral Linked Data dataset, a subset of the commonsense knowledge base DeKO (Default Knowledge about Objects).
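To give a flavor of the distillation step described above, here is a toy sketch in Python. The triples, relation names, and support threshold are invented for illustration; the actual system clusters frame instances extracted by KNEWS, whereas this sketch uses a crude frequency cutoff as a stand-in.

```python
from collections import Counter

# Hypothetical (concept, relation, value) triples, as might be extracted
# from Web text by a semantic parsing + WSD pipeline such as KNEWS.
triples = [
    ("mug", "locatedAt", "kitchen"),
    ("mug", "locatedAt", "kitchen"),
    ("mug", "locatedAt", "garage"),   # noisy extraction
    ("mug", "usedFor", "drinking"),
    ("mug", "usedFor", "drinking"),
    ("book", "locatedAt", "shelf"),
    ("book", "locatedAt", "shelf"),
    ("book", "locatedAt", "oven"),    # noisy extraction
]

def prototypical(triples, min_support=2):
    """Keep only triples seen at least `min_support` times,
    standing in for the clustering-based noise filtering."""
    counts = Counter(triples)
    return {t for t, c in counts.items() if c >= min_support}

kept = prototypical(triples)
# Frequent triples survive; one-off noisy extractions are dropped.
```

The surviving triples would then be serialized as language-neutral Linked Data statements.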