How to Efficiently Achieve Cross-Modal Biometric Matching?

Date: 12-07-2021

Neuroscientists have shown that people can be identified by their faces or by their voices, because both carry identity information unique to each person. This has great potential for security and surveillance systems. For example, a masked criminal could be recognized from voice samples alone, without needing to see his face.

Unlike unimodal recognition, cross-modal biometric matching (CMBM) tries to bridge the gap between human faces and voices. However, the large number of variations in both modalities makes bridging this gap extremely challenging. Is there a method that can achieve CMBM efficiently?

A research team led by Dr. ZHENG XiangTao from the Xi'an Institute of Optics and Precision Mechanics (XIOPM) of the Chinese Academy of Sciences (CAS) has proposed a disentangled representation learning method for CMBM. The results were published in IEEE Transactions on Multimedia.

The framework of the proposed method. (Image by XIOPM)

According to the researchers, rather than conditioning the features of both modalities on a single shared latent space, the method uses the variational autoencoder (VAE) framework to disentangle alignable latent identity factors from non-alignable, modality-dependent factors. The VAE suits this purpose because it can partition the set of explanatory factors within a latent space.
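To make the idea of partitioning a latent space concrete, here is a minimal Python sketch, assuming a VAE encoder with a diagonal-Gaussian posterior whose latent vector is split into an identity part and a modality-dependent part. The function names, the split size `id_dim`, and the fixed noise passed in for reproducibility are illustrative assumptions, not details of the published model.

```python
import math
from typing import List, Tuple

Vector = List[float]

def reparameterize(mu: Vector, logvar: Vector, eps: Vector) -> Vector:
    # Reparameterization trick: z = mu + sigma * eps, where sigma = exp(0.5 * logvar).
    # In training, eps would be drawn from N(0, I); here it is passed in explicitly.
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]

def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def split_latent(z: Vector, id_dim: int) -> Tuple[Vector, Vector]:
    # Hypothetical partition: the first id_dim coordinates are treated as identity
    # factors (alignable across faces and voices); the rest are modality-dependent.
    return z[:id_dim], z[id_dim:]

if __name__ == "__main__":
    mu = [0.5, -0.2, 1.0, 0.0]
    logvar = [0.0, 0.0, -1.0, 0.0]
    eps = [0.1, -0.3, 0.2, 0.0]  # fixed noise instead of a random draw
    z = reparameterize(mu, logvar, eps)
    z_id, z_mod = split_latent(z, id_dim=2)
    print("identity factors:", z_id)
    print("modality factors:", z_mod)
    print("KL term:", kl_to_standard_normal(mu, logvar))
```

In a full cross-modal model, only the `z_id` parts produced from a face and from a voice would be encouraged to agree, while each `z_mod` stays free to absorb variation specific to its own modality.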

The proposed method consists of two main steps: 1) feature extraction and 2) disentangled representation learning. First, an image feature extraction network obtains face features, and a voice feature extraction network learns voice features. Second, the method learns a disentangled latent variable that separates the latent identity factors shared across the two modalities from the modality-dependent factors.
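The two-step structure can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: the two "extractors" are hypothetical stand-ins for the image and voice networks (which in practice are deep networks), the linear encoders and their weights are made up, and the squared-distance alignment term applied only to identity factors is a simplification, not the objective used in the paper.

```python
from typing import List, Tuple

Vector = List[float]

def face_feature_extractor(image: List[List[float]]) -> Vector:
    # Hypothetical stand-in for the image feature extraction network:
    # averages each row of a grayscale image instead of running a CNN.
    return [sum(row) / len(row) for row in image]

def voice_feature_extractor(waveform: Vector, n_segments: int) -> Vector:
    # Hypothetical stand-in for the voice feature extraction network:
    # mean absolute amplitude of each equal-length segment of the waveform.
    seg = len(waveform) // n_segments
    return [sum(abs(x) for x in waveform[i * seg:(i + 1) * seg]) / seg
            for i in range(n_segments)]

def linear(weights: List[Vector], x: Vector) -> Vector:
    # Matrix-vector product: one output value per row of `weights`.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def disentangle(features: Vector, w_id: List[Vector],
                w_mod: List[Vector]) -> Tuple[Vector, Vector]:
    # Step 2: map one modality's features to identity factors (shared across
    # modalities) and modality-dependent factors via separate linear encoders.
    return linear(w_id, features), linear(w_mod, features)

def identity_alignment_loss(face_id: Vector, voice_id: Vector) -> float:
    # Assumed alignment term: squared Euclidean distance between the identity
    # factors of a face and a voice from the same person. The modality-dependent
    # factors are deliberately excluded from this term.
    return sum((a - b) ** 2 for a, b in zip(face_id, voice_id))

if __name__ == "__main__":
    # Step 1: feature extraction for one person's face image and voice clip.
    face_feat = face_feature_extractor([[0.2, 0.4], [0.6, 0.8]])
    voice_feat = voice_feature_extractor([0.1, -0.3, 0.5, -0.7], n_segments=2)
    # Step 2: disentangle each modality with its own (made-up) encoder weights.
    face_id, face_mod = disentangle(face_feat, [[1.0, 0.0]], [[0.0, 1.0]])
    voice_id, voice_mod = disentangle(voice_feat, [[0.5, 0.5]], [[1.0, -1.0]])
    print("alignment loss on identity factors:",
          identity_alignment_loss(face_id, voice_id))
```

Keeping separate encoders per modality while aligning only the identity outputs reflects the article's point that identity factors are alignable across modalities and the remaining factors are not.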

The results indicate that, using the disentangled latent identity factors, the method achieves state-of-the-art performance on cross-modal verification, 1:N matching, and retrieval between faces and voices, outperforming previous approaches.
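The three evaluation tasks can be sketched in Python as operations on identity vectors. This assumes the vectors have already been produced by trained encoders, scores pairs by cosine similarity, and uses a hypothetical verification threshold of 0.5; the scoring function and threshold used in the paper may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    # Cosine of the angle between two identity vectors (0.0 if either is zero).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def verify(face_id: Vector, voice_id: Vector, threshold: float = 0.5) -> bool:
    # Verification: decide whether this face and this voice belong to the same
    # person. The default threshold is an illustrative assumption.
    return cosine_similarity(face_id, voice_id) >= threshold

def match_one_to_n(probe: Vector, gallery: List[Vector]) -> int:
    # 1:N matching: index of the gallery item (other modality) most similar to the probe.
    scores = [cosine_similarity(probe, g) for g in gallery]
    return max(range(len(gallery)), key=scores.__getitem__)

def retrieve(probe: Vector, gallery: List[Vector], k: int) -> List[int]:
    # Retrieval: indices of the k most similar gallery items, best match first.
    scores = [cosine_similarity(probe, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)[:k]

if __name__ == "__main__":
    voice_probe = [1.0, 0.0]                           # identity factors from a voice
    face_gallery = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]  # identity factors from faces
    print("same person as face 1?", verify(face_gallery[1], voice_probe))
    print("best matching face:", match_one_to_n(voice_probe, face_gallery))
    print("top-2 faces:", retrieve(voice_probe, face_gallery, k=2))
```

Because all three tasks reuse the same similarity score, improving how well the identity factors align across modalities benefits verification, matching, and retrieval at once.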

The proposed disentangled representation learning method for CMBM has promising applications in security and surveillance systems.


Contact:
SHE Jiangbo
Xi'an Institute of Optics and Precision Mechanics
E-mail: shejb@opt.ac.cn
Date of Publication: 2021-04-12