Dual-Channel Attention-Based Multimodal Sentiment Analysis Model Integrating Text and Image Features

Tamsin Reuel

Abstract

With the rise of multimodal data in social media, sentiment analysis based solely on text has become insufficient to capture the richness of human emotion. To address this limitation, this paper proposes a dual-channel multimodal sentiment analysis model based on attention mechanisms, named ACMSA (Attention Channel Multimodal Sentiment Analysis). The model integrates textual and visual features to improve emotional understanding and classification accuracy. Text features are extracted using the BERT model and processed through a CNN-BiGRU-Attention dual-channel architecture to capture both local and global semantic dependencies. Image features are obtained via ResNet152 and enhanced by a Channel-Spatial Attention Module (CSAM) that adaptively emphasizes salient regions. The multimodal features are fused through a Co-Attention mechanism, enabling fine-grained interaction between textual and visual representations. Experimental evaluations on the MVSA-Single and MVSA-Multi Twitter datasets demonstrate that ACMSA outperforms state-of-the-art baselines, achieving accuracies of 77.08% and 74.42%, respectively. The results verify that attention-guided dual-channel modeling effectively strengthens cross-modal correlation and interpretability. This framework provides a robust and extensible solution for sentiment analysis in multimedia-rich environments, offering valuable implications for emotion recognition, social media monitoring, and intelligent interaction systems.
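The Co-Attention fusion described in the abstract can be sketched in simplified form as follows. This is an illustrative NumPy sketch under assumptions, not the paper's implementation: the learned bilinear weight matrices, and the BERT and ResNet152 feature extractors that produce the inputs, are omitted, leaving only the core idea that each modality attends over the other via a shared affinity matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text_feats, img_feats):
    """Simplified co-attention between text tokens and image regions.

    text_feats: (n_tokens, d)  -- e.g. BERT token features (assumed shape)
    img_feats:  (n_regions, d) -- e.g. ResNet region features (assumed shape)
    Returns text-attended image features and image-attended text features.
    """
    # affinity matrix: similarity of every text token to every image region
    affinity = text_feats @ img_feats.T          # (n_tokens, n_regions)
    # text-guided image attention: each token distributes weight over regions
    attn_t2v = softmax(affinity, axis=1)         # rows sum to 1
    # image-guided text attention: each region distributes weight over tokens
    attn_v2t = softmax(affinity.T, axis=1)
    attended_img = attn_t2v @ img_feats          # (n_tokens, d)
    attended_txt = attn_v2t @ text_feats         # (n_regions, d)
    return attended_txt, attended_img

# toy usage with random stand-in features
rng = np.random.default_rng(0)
txt = rng.normal(size=(4, 8))   # 4 tokens, feature dim 8
img = rng.normal(size=(5, 8))   # 5 regions, feature dim 8
att_txt, att_img = co_attention(txt, img)
```

In the full model these attended representations would be concatenated or further transformed before the sentiment classifier; here the sketch only shows the cross-modal interaction step.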
