Computer Vision

Learning in Visual Environments. We aim to develop intelligent agents with visual skills that operate in a given environment. A continuous stream of data (a video signal) is presented to the agent, which is expected to learn from the processed information, progressively developing its skill in making predictions over the "pixels" of the observed data stream. The principle of Least Cognitive Action, which parallels the laws of mechanics, is exploited to devise the lifelong online learning laws that drive the behaviour of the agent. Motion invariance allows the agent to develop robust features; this is further extended with the idea of invariance to certain categories of eye movements, where the notion of focus of attention is introduced to reduce the information overload typical of commonly observed scenes.
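
To make the mechanical analogy concrete, the block below sketches one consistent instantiation of such an action functional. The symbols are illustrative assumptions, not necessarily the exact form adopted in the Least Cognitive Action formulation: w(t) denotes the agent's weights, V a loss measured on the incoming frames, mu a "mass", and theta a dissipation rate.

```latex
% A minimal sketch, assuming weights w(t), a frame-wise loss V(w,t),
% a "mass" mu and a dissipation rate theta; the exact functional used
% in the Least Cognitive Action papers may differ.
A(w) = \int_0^T e^{\theta t}
       \left( \frac{\mu}{2}\,\lVert \dot w(t) \rVert^2
              - V\bigl(w(t), t\bigr) \right) dt
% Stationarity (Euler--Lagrange) yields damped second-order learning laws:
\mu\, \ddot w(t) + \mu\theta\, \dot w(t) + \nabla_w V\bigl(w(t), t\bigr) = 0
```

In the strongly damped limit the second-order law collapses to a first-order update proportional to the negative gradient of V, i.e., plain online gradient descent on the video stream, which is how the parallel with mechanics produces a lifelong learning rule.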

Developmental Visual Agents (DVA) are intelligent agents that aim to learn to see like children. They are conceived as general-purpose systems capable of continuously processing video information, according to the never-ending learning philosophy, and they are designed to interact with users, who provide supervision on the objects in the scene. DVA are organized in a hierarchical architecture, which first extracts scale- and rotation-invariant features at the bottom levels and then processes these features to identify regions and recognize objects within the framework of Support Constraint Machines.
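
The sketch below illustrates the shape of this pipeline: invariant low-level features feed a recognition stage that accumulates user supervision. All names are hypothetical and do not come from the actual DVA codebase; the toy rotation-invariant feature (a radially averaged magnitude spectrum) and the nearest-prototype classifier merely stand in for the learned bottom layers and the Support Constraint Machines of the real system.

```python
import numpy as np

def invariant_features(patch: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Toy stand-in for the bottom layers: the radially averaged magnitude
    spectrum of a patch is invariant to in-plane rotation and, after
    normalization, insensitive to global brightness scaling."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
    h, w = spec.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    feat = np.array([spec[bins == b].mean() if np.any(bins == b) else 0.0
                     for b in range(n_bins)])
    return feat / (np.linalg.norm(feat) + 1e-8)

class DVASketch:
    """Hypothetical skeleton of the interaction loop described above."""
    def __init__(self):
        self.prototypes = []  # (feature vector, label) pairs from supervision

    def supervise(self, patch: np.ndarray, label: str) -> None:
        # A user labels an object region: store its invariant signature.
        self.prototypes.append((invariant_features(patch), label))

    def recognize(self, patch: np.ndarray):
        # Nearest prototype in feature space; the real system uses
        # Support Constraint Machines rather than nearest-neighbour.
        if not self.prototypes:
            return None
        f = invariant_features(patch)
        dists = [np.linalg.norm(f - p) for p, _ in self.prototypes]
        return self.prototypes[int(np.argmin(dists))][1]
```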

Eye Movement Laws (EYMOL) is a differential model of attentional scanpaths. We devise variational laws of eye movement that rely on a generalized view of the Least Action Principle in physics. The potential energy captures details as well as peripheral visual features, while the kinetic energy corresponds to its classic interpretation in analytic mechanics. In addition, the Lagrangian contains a brightness-invariance term, which significantly characterizes the scanpath trajectories. The differential equations of visual attention are obtained as the stationary point of the generalized action. The model is evaluated on tasks of saliency prediction and scanpath prediction.
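
As a minimal sketch of what "differential equations of visual attention" look like when integrated, the code below simulates a gaze trajectory under damped Newtonian dynamics attracted to a saliency-derived potential. This is an assumption-laden simplification: the actual EYMOL Lagrangian includes the brightness-invariance term and different coefficients, none of which are reproduced here.

```python
import numpy as np

def scanpath(potential, x0, v0, eta=0.25, dt=0.1, steps=500):
    """Integrate x'' = -eta * x' - grad V(x) on a precomputed potential map
    (e.g. negated saliency, so that gaze is attracted to salient regions),
    using a semi-implicit Euler scheme. NOT the actual EYMOL equations."""
    gy, gx = np.gradient(potential)
    h, w = potential.shape
    x = np.array(x0, dtype=float)  # gaze position (row, col)
    v = np.array(v0, dtype=float)  # gaze velocity
    path = [x.copy()]
    for _ in range(steps):
        i = int(np.clip(x[0], 0, h - 1))
        j = int(np.clip(x[1], 0, w - 1))
        grad = np.array([gy[i, j], gx[i, j]])
        v += dt * (-eta * v - grad)  # damping + pull toward minima of V
        x += dt * v
        x[0] = np.clip(x[0], 0, h - 1)
        x[1] = np.clip(x[1], 0, w - 1)
        path.append(x.copy())
    return np.array(path)

# Usage: a single Gaussian blob of saliency attracts the simulated gaze.
yy, xx = np.mgrid[0:64, 0:64]
saliency = np.exp(-((yy - 40) ** 2 + (xx - 24) ** 2) / 60.0)
trajectory = scanpath(-saliency, x0=(5.0, 5.0), v0=(0.0, 0.0))
```

The damping term eta plays the role that dissipation plays in the mechanical analogy: without it the simulated gaze would oscillate around salient regions indefinitely instead of settling on them.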