[Dec 12th 2018] Evolutionary Coresets and Machine Learning Epistemology

Pietro Barbiero (DISMA, Politecnico di Torino)

Dec 12, 2018 – 11:00 AM
DIISM, Artificial Intelligence laboratory (room 201), Siena SI
Description

In recent years, machine learning research has produced effective algorithms with a large impact on applications ranging from medicine to self-driving cars. Although these models are powerful and accurate, they have serious weaknesses, such as poor interpretability and a lack of abstraction or integration with deductive reasoning. To obtain more comprehensible models, researchers often focus on feature selection and dimensionality reduction techniques. However, in the era of big data, the transposed problem (i.e., dealing with an unmanageable number of samples rather than of features) is emerging.

In optimization, a coreset can be defined as a subset of the input samples such that a good approximation to the optimization problem can be obtained by solving it directly on the coreset instead of on the whole original input. In machine learning, coresets are exploited for purposes ranging from speeding up training to helping humans understand the fundamental properties of a class by considering only a few meaningful samples. Given a dataset and an application, the problem of discovering a coreset can be defined as identifying the minimal number of samples that does not lower the application's performance with respect to its performance on the whole dataset. The specialized literature offers several approaches to finding coresets, but these algorithms often disregard the application or explicitly ask the user for the desired number of points.

Starting from the observation that finding coresets is intuitively a multi-objective problem, since minimizing the number of points conflicts with maintaining the original performance, we propose a multi-objective evolutionary approach to identifying coresets for classification. The proposed approach is tested on classical machine learning classification benchmarks, using 6 state-of-the-art classifiers and comparing against 7 coreset discovery algorithms.
Results show not only that the proposed approach is able to find coresets representing different compromises between compactness and performance, but also that different coresets are identified for different classifiers, reinforcing the assumption that coresets might be closely linked to the specific application.
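To make the two-objective framing concrete, here is a minimal Python sketch. It is not the speaker's algorithm, whose details this abstract does not give; every choice in it is an assumption made for illustration. A candidate coreset is a boolean mask over the samples, scored on two objectives to minimize: its size and the error that a 1-nearest-neighbour classifier (standing in for "the application") trained on the coreset makes on the whole dataset. A simple (mu+lambda) loop with bit-flip mutation and Pareto-front peeling then returns a set of non-dominated trade-offs. The names `objectives`, `dominates` and `evolve_coresets` are hypothetical.

```python
import random


def nn_predict(cX, cy, x):
    """Label of the nearest coreset point (squared Euclidean distance)."""
    best = min(range(len(cX)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(cX[i], x)))
    return cy[best]


def objectives(mask, X, y):
    """Objectives to minimise: (coreset size, 1-NN error on the full dataset).

    Assumption: the classifier is 1-NN; the real approach uses other classifiers.
    """
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        # Empty coreset: worst values, so any non-empty mask dominates it.
        return (float("inf"), 1.0)
    cX, cy = [X[i] for i in idx], [y[i] for i in idx]
    wrong = sum(nn_predict(cX, cy, x) != t for x, t in zip(X, y))
    return (len(idx), wrong / len(X))


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def nondominated(scored):
    """(mask, objectives) pairs not dominated by any other pair in the list."""
    return [(m, f) for m, f in scored if not any(dominates(g, f) for _, g in scored)]


def select(pool, X, y, k):
    """Keep k masks by peeling successive Pareto fronts.

    The last front is truncated in list order (no crowding distance).
    """
    scored = [(m, objectives(m, X, y)) for m in pool]
    survivors = []
    while scored and len(survivors) < k:
        front = nondominated(scored)
        survivors.extend(m for m, _ in front)
        scored = [s for s in scored if s not in front]
    return survivors[:k]


def evolve_coresets(X, y, pop_size=20, generations=30, p_flip=0.05, seed=0):
    """Return the non-dominated (mask, (size, error)) pairs after a (mu+lambda) loop."""
    rng = random.Random(seed)
    n = len(X)
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Bit-flip mutation: each sample is added or removed with probability p_flip.
        children = [[(not g) if rng.random() < p_flip else g for g in parent]
                    for parent in population]
        population = select(population + children, X, y, pop_size)
    return nondominated([(m, objectives(m, X, y)) for m in population])


if __name__ == "__main__":
    # Toy data: two well-separated 2-D clusters labelled 0 and 1.
    rng = random.Random(1)
    X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(15)] + \
        [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(15)]
    y = [0] * 15 + [1] * 15
    for mask, (size, err) in sorted(evolve_coresets(X, y), key=lambda e: e[1]):
        print(f"{size:2d} points  error={err:.2f}")
```

Instead of returning one "best" subset, the sketch returns a whole front of compromises between compactness and error, which is the point of treating coreset discovery as a multi-objective problem. Swapping `nn_predict` for a different classifier would generally change which masks end up on the front.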
