Abstract
Saliency and visual attention have been studied in a computational context for decades, mostly with the aim of predicting spatial topographical saliency maps or simulated heatmaps. In humans, however, spatial selection by an attentive mechanism is inherently a sequential sampling process. Recent efforts have analyzed and modeled scanpaths, but there is as yet no universal agreement on which metrics should be used to measure scanpath similarity or the quality of a scanpath predicted by a computational model. Many similarity measures have been suggested in different contexts, and little is known about their behavior or properties. This paper brings together a review of these metrics, an axiomatic analysis of gaze metrics for scanpaths, and a careful analysis of the discriminative power of different metrics, in order to provide a roadmap for future analysis. This is accompanied by experiments based on classic modeling strategies that simulate sequential selection from traditional representations of saliency, and on deep neural networks that produce sequences by construction. The experiments provide strong support for the necessity of sequential analysis of attention, and support for certain metrics, including a family of metrics introduced in this paper that is motivated by the notion of scanpath plausibility.