Learning-Based Saliency Model with Depth Information
Most previous studies on visual saliency have focused on 2D scenes. With the rapid growth of 3D video applications, it is highly desirable to know how depth information affects human visual attention. In this paper, we first conduct eye-fixation experiments on 3D images. Our fixation dataset comprises 475 3D images viewed by 16 subjects, whose eye movements were recorded with a Tobii TX300 eye tracker. In addition, the database contains 475 computed depth maps. Owing to the scarcity of public-domain 3D fixation data, this dataset should be useful to the 3D visual attention research community. We then design a learning-based visual attention model to predict human fixations. In addition to popular 2D features, we include the depth map and its derived features. The results indicate that the extra depth information improves saliency estimation accuracy, particularly for close-up objects hidden in complex-textured backgrounds. We also examine the effectiveness of various low-, mid-, and high-level features for saliency prediction. Compared with both 2D and 3D state-of-the-art saliency estimation models, our method performs better on the 3D test images. The eye-tracking database and the MATLAB source code for the proposed saliency model and evaluation methods are available on our website.
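The core idea of the model — feeding depth-derived features alongside 2D features into a supervised learner that predicts per-pixel saliency — can be sketched as follows. This is only a minimal illustration with synthetic features and a plain logistic-regression learner; the actual feature set, learner, and MATLAB implementation are the paper's own and differ from this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-pixel training data: each row is one pixel's feature
# vector. In the paper these would be low-, mid-, and high-level 2D cues
# plus the depth map and its derived features.
n = 1000
feat_2d = rng.random((n, 3))          # stand-ins for 2D cues (e.g., color, orientation)
feat_depth = rng.random((n, 1))       # stand-in for depth-derived cues
X = np.hstack([feat_2d, feat_depth])  # concatenate 2D and depth features

# Toy ground truth: pixels that are salient in 2D AND close to the
# viewer (small depth) are labeled as fixated. This is fabricated for
# illustration, not the paper's eye-tracking data.
y = ((feat_2d[:, 0] > 0.5) & (feat_depth[:, 0] < 0.5)).astype(float)

# Logistic regression trained by gradient descent as the learner;
# any supervised learner could combine the features the same way.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted saliency in [0, 1]
    w -= lr * (X.T @ (p - y)) / n
    b -= lr * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = np.mean(pred == y)
print(f"training accuracy: {accuracy:.2f}")
```

Because the toy labels depend jointly on a 2D cue and depth, a learner given only the 2D columns would do measurably worse, which mirrors the paper's finding that depth helps for close-up objects.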