Pietro Barbiero (DISMA, Politecnico di Torino)
Dec 12, 2018 – 11:00 AM
DIISM, Artificial Intelligence laboratory (room 201), Siena SI
In recent years, machine learning research has produced effective algorithms with a huge impact on applications ranging from medicine to self-driving cars. Although these models are powerful and accurate, they show serious weaknesses, such as poor interpretability and a lack of abstraction or integration with deductive reasoning. To provide more comprehensible models, researchers often focus on feature selection and dimensionality reduction techniques. However, in the era of big data, the transpose problem (i.e. dealing with an unmanageable number of samples) is emerging. In optimization, a coreset can be defined as a subset of the input samples such that a good approximation to the optimization problem can be obtained by solving it directly on the coreset instead of on the whole original input. In machine learning, coresets are exploited for purposes ranging from speeding up training to helping humans understand the fundamental properties of a class by considering only a few meaningful samples. Given a dataset and an application, the problem of discovering coresets can be defined as identifying the minimal number of samples that does not lower the application's performance with respect to its performance on the whole dataset. The specialized literature offers several approaches to finding coresets, but such algorithms often disregard the application or explicitly ask the user for the desired number of points. Starting from the consideration that finding coresets is intuitively a multi-objective problem, as minimizing the number of points conflicts with maintaining the original performance, we propose a multi-objective evolutionary approach to identifying coresets for classification. The proposed approach is tested on classic machine learning classification benchmarks, using 6 state-of-the-art classifiers and comparing against 7 algorithms for coreset discovery.
Results show that the proposed approach not only finds coresets representing different compromises between compactness and performance, but also identifies different coresets for different classifiers, reinforcing the assumption that coresets might be closely linked to the specific application.
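To make the two competing objectives concrete, here is a minimal Python sketch. It is not the method presented in the talk: the evolutionary search is omitted, and the 1-nearest-neighbour classifier, the toy one-dimensional dataset, the hand-picked candidate subsets, and all function names are assumptions made for illustration only. Each candidate coreset is scored by its size and by the error of a classifier trained on it and evaluated on the whole dataset; only the non-dominated candidates are kept.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, class label)


def nn_predict(train: List[Sample], x: Sequence[float]) -> int:
    """Label of the training sample closest to x (1-NN, squared Euclidean)."""
    nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return nearest[1]


def objectives(mask: Sequence[int], data: List[Sample]) -> Tuple[int, float]:
    """Return (coreset size, error rate on the whole dataset); both are minimized."""
    subset = [s for keep, s in zip(mask, data) if keep]
    if not subset:
        return (0, 1.0)  # an empty coreset cannot classify anything
    errors = sum(nn_predict(subset, x) != y for x, y in data)
    return (len(subset), errors / len(data))


def dominates(a: Tuple[int, float], b: Tuple[int, float]) -> bool:
    """True if a is no worse than b on every objective and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front(candidates: List[Sequence[int]], data: List[Sample]):
    """Keep the candidates whose scores are not dominated by any other candidate."""
    scored = [(m, objectives(m, data)) for m in candidates]
    return [(m, s) for m, s in scored if not any(dominates(t, s) for _, t in scored)]


# Toy dataset (assumed): two well-separated classes on a line.
data = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0), ((1.0,), 1), ((1.1,), 1), ((1.2,), 1)]

# Hand-picked candidates (assumed); an evolutionary algorithm would generate these.
candidates = [
    [1, 1, 1, 1, 1, 1],  # whole dataset
    [0, 1, 0, 0, 1, 0],  # one sample per class
    [1, 0, 0, 0, 0, 0],  # a single sample
]

front = pareto_front(candidates, data)
for mask, (size, error) in front:
    print(mask, size, error)
```

In this toy case the whole dataset is dominated by the two-sample subset, which reaches the same error with fewer points, while the single-sample subset survives as a more compact but less accurate compromise. Swapping in a different classifier for `nn_predict` may change which subsets end up on the front.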