Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis

Abstract

Improving robustness against missing data has become one of the core challenges in Multimodal Sentiment Analysis (MSA), which aims to judge speaker sentiment from language, visual, and acoustic signals. In current research, translation-based methods and tensor-regularization methods have been proposed for MSA with incomplete modality features. However, both fail to cope with randomly missing modality features in non-aligned sequences. In this paper, a transformer-based feature reconstruction network (TFR-Net) is proposed to improve the robustness of models against random missing features in non-aligned modality sequences. First, intra-modal and inter-modal attention-based extractors are adopted to learn robust representations for each element in the modality sequences. Then, a reconstruction module is proposed to generate the missing modality features. Supervised by a SmoothL1Loss between the generated and complete sequences, TFR-Net is expected to learn semantic-level features corresponding to the missing ones. Extensive experiments on two public benchmark datasets show that our model performs well under data missing across various missing-modality combinations and missing degrees.
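The sketch below illustrates the reconstruction supervision described in the abstract. It is not the authors' implementation: the module sizes (d_model, n_heads, n_layers), the zero-masking corruption, and the miss_rate parameter are all illustrative assumptions. It shows the core idea of randomly dropping frames from a complete modality sequence, reconstructing the sequence with a transformer encoder, and applying SmoothL1Loss at the dropped positions.

```python
# Minimal sketch (not the authors' code) of reconstruction supervision:
# features at randomly missing positions are zeroed out, a transformer
# encoder reconstructs the sequence, and SmoothL1Loss between the
# reconstructed and complete sequences supervises the missing positions.
import torch
import torch.nn as nn


class FeatureReconstructor(nn.Module):
    """Reconstructs a complete modality sequence from one with missing frames."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.encoder(x))


def reconstruction_loss(model, complete, miss_rate=0.3):
    """Randomly drop frames, reconstruct, and penalize error at dropped positions."""
    batch, seq_len, _ = complete.shape
    # keep is 1 where the frame is present, 0 where it is "missing"
    keep = (torch.rand(batch, seq_len, 1) > miss_rate).float()
    corrupted = complete * keep
    reconstructed = model(corrupted)
    # SmoothL1 between generated and complete sequences, on missing frames only
    return nn.functional.smooth_l1_loss(
        reconstructed * (1 - keep), complete * (1 - keep)
    )


if __name__ == "__main__":
    model = FeatureReconstructor()
    features = torch.randn(8, 50, 64)  # batch of complete modality sequences
    print(reconstruction_loss(model, features).item())
```

In practice this loss would be added to the main sentiment-prediction objective, so the extractors learn representations that remain informative when modality features are randomly missing.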

Publication
Proceedings of the 29th ACM International Conference on Multimedia
Ziqi Yuan
Ph.D. Student

My research direction is multimodal machine learning.

Hua Xu
Tenured Associate Professor, Associate Editor of Expert Systems with Applications, Ph.D. Supervisor
Wenmeng Yu
Master’s Degree

My research directions are multimodal learning, facial expression recognition, and multi-task learning.