Matteo Tiezzi (DIISM, University of Siena)
Sept 6, 2018 – 9:30 AM
DIISM, Artificial Intelligence laboratory (room 201), Siena SI
Adversarial examples are defined as “inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake”.
Indeed, in the computer vision scenario it has been shown that well-crafted perturbation to input images can induce classification errors, such as confusing a cat with a computer.
In general, adversarial attacks are designed to degrade the performances of models or to prompt the prediction of specific output classes.
In this recent work, the authors introduce a framework in which the goal of adversarial attacks is to reprogram the target model to perform a completely new task.
This is accomplished by optimizing for a single adversarial perturbation that can be added to all test-time inputs.
In such a way, the target model performs a task chosen by the adversary when processing these inputs—even if it was not trained on this task.