CH-SIMS v2.0, a Fine-grained Multi-label Chinese Sentiment Analysis Dataset, is an enhanced and extended version of the CH-SIMS dataset. We re-labeled all instances in CH-SIMS at a finer granularity, and the video clips as well as the pre-extracted features have been remade. We also extended the dataset to a total of 14,563 instances. The new dataset contains videos collected from a much wider range of scenarios, as shown in the banner image.
As shown in the figure below, CH-SIMS v2.0 contains 4,402 supervised instances, denoted CH-SIMS v2.0 (s), and 10,161 unsupervised instances, denoted CH-SIMS v2.0 (u). The supervised instances share similar properties with the original CH-SIMS dataset. The unsupervised instances show a much more diverse distribution of video durations, which better simulates real-world scenarios. The text of the unsupervised instances is taken from ASR transcripts without manual correction and therefore contains noise, which also better matches real-world conditions.
We split the data into train, valid, and test sets at a ratio of roughly 9:2:3. The regression labels range from -1 to 1. The classification labels are Negative (NEG), Weakly Negative (WNEG), Neutral (NEU), Weakly Positive (WPOS), and Positive (POS). The label distribution is shown in the figure below. The test set is speaker-independent from the train/valid sets.
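For readers who want to derive the five classification labels from the regression scores, here is a minimal sketch. The bin edges below are illustrative, evenly spaced thresholds over [-1, 1], not the official CH-SIMS v2.0 annotation boundaries; check the released label files for the exact mapping.

```python
def to_five_class(score: float) -> str:
    """Map a regression label in [-1, 1] to one of the five sentiment
    classes. NOTE: the thresholds below are assumed for illustration,
    not taken from the official CH-SIMS v2.0 annotation scheme."""
    if score < -0.6:
        return "NEG"
    elif score < -0.1:
        return "WNEG"
    elif score <= 0.1:
        return "NEU"
    elif score <= 0.6:
        return "WPOS"
    else:
        return "POS"

# Example: a strongly negative score falls into NEG,
# a score near zero into NEU.
print(to_five_class(-0.8))  # NEG
print(to_five_class(0.0))   # NEU
```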
The baseline experiments are conducted on the MMSA platform. The results are reported HERE on GitHub.
| Dataset | Links |
| --- | --- |
| CH-SIMS v2.0 (s) | [Google Drive] [Baiduyun Drive] |
| CH-SIMS v2.0 (u) | [Google Drive] [Baiduyun Drive] |
| CH-SIMS | [Google Drive] [Baiduyun Drive] |
Please cite us if you find our work useful.
```bibtex
@misc{liu2022make,
  title={Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module},
  author={Yihe Liu and Ziqi Yuan and Huisheng Mao and Zhiyun Liang and Wanqiuyue Yang and Yuanzhe Qiu and Tie Cheng and Xiaoteng Li and Hua Xu and Kai Gao},
  year={2022},
  eprint={2209.02604},
  archivePrefix={arXiv},
  primaryClass={cs.MM}
}
```