Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences

Salient Object Detection Makes the Computer Vision Smarter

Data：17-12-2020 | 【 A A A 】 | 【Print】【Close】

Salient object detection aims at simulating the visual characteristics of human beings and extracts the most attractive regions from images or videos. The content in these saliency areas is what we call salient objects.

Recently, deep learning methods based on Convolutional Neural Network (CNN) have successfully broken through the limitations of conventional methods due to their powerful feature extraction capabilities. They have been widely used in the field of computer vision and have even been successfully used for salient object detection.

A research team leading by Prof. DONG Yongsheng from Xi'an Institute of Optics and Precision Mechanics (XIOPM) of the Chinese Academy of Sciences (CAS) proposed a novel edge information-guided hierarchical feature fusion network to accomplish accurate salient object detection.

The proposed method employs the deep learning method to set up the salient detection strategy. First, the low-level edge information is utilized to guide saliency map generation. Then, a one-to-one hierarchical supervision strategy is adopted to generate high-level semantic information and low-level edge information. Finally, the hierarchical feature information is fused to accomplish the accurate salient object detection. The results were published in IEEE TRANSACTIONS ON IMAGE PROCESSING.

Fig.1 Visual comparisons of ablation studies between the state-of-the-art methods. (Image by XIOPM)

The visual comparisons of ablation studies are shown in fig.1. VGG means only use the VGG backbone, Traditional Fusion means use the commonly used VGG multi-layer feature fusion method, Hierarchical Fusion means proposed VGG network layered fusion method without edge supervision, our proposed HFFNet means add Laplacian edge constraints to low-level information based on proposed hierarchical feature fusion.

Since saliency detection is a relatively basic task that can increase computational efficiency, it has played an important role in many fields of computer vision, such as foreground extraction, visual tracking, scene classification, semantic segmentation, video summarization, and image retrieval. Over the years, researchers pay more attention to this researching field due to the explosion of artificial intelligence.