Image Classification via Joint Multi-Scale Convolution and Global Attention Modeling
Abstract
This study addresses the task of image classification by proposing a fusion method that combines multi-scale convolution with global attention, targeting the limitations of traditional approaches in local feature extraction and global dependency modeling. The method first employs a multi-scale convolution module to extract image features under different receptive fields, capturing both fine-grained details and macro-structural information. A global attention mechanism then dynamically assigns weights at the feature level, strengthening dependencies across regions and supporting global semantic understanding. Through this joint design of local and global modeling, the method achieves a more comprehensive feature representation in complex image scenarios. In the fusion and aggregation stage, the outputs of the multi-scale convolution and global attention branches are combined before the final classification layer. Comparative experiments and sensitivity analyses on public datasets show that the proposed method outperforms common baselines such as MLP, CNN, LSTM, and Transformer models in terms of AUC, ACC, Precision, and Recall, demonstrating the benefit of combining multi-scale convolution with global attention for classification performance. Further hyperparameter sensitivity experiments indicate that the number of attention heads, batch size, and noise level all influence model performance, underscoring the importance of proper configuration for stable results. Overall, the method exhibits strong accuracy and robustness, validating the effectiveness of fusing multi-scale convolution with global attention.
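The pipeline described in the abstract (parallel convolutions at several receptive fields, a global self-attention stage, residual fusion, then a classification head) can be sketched as follows. This is an illustrative PyTorch sketch, not the paper's exact architecture: the kernel sizes (3/5/7), embedding dimension, pooling step, and class names/counts are assumptions chosen for readability.

```python
import torch
import torch.nn as nn


class MultiScaleConv(nn.Module):
    """Parallel convolution branches with different receptive fields (assumed 3x3/5x5/7x7)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Same-padding keeps spatial size identical across branches so they can be concatenated.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)
        )
        self.proj = nn.Conv2d(3 * out_ch, out_ch, 1)  # fuse concatenated branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.proj(feats)


class MSConvGlobalAttn(nn.Module):
    """Hypothetical joint model: local multi-scale features + global self-attention."""

    def __init__(self, in_ch: int = 3, dim: int = 64, heads: int = 4, num_classes: int = 10):
        super().__init__()
        self.local = MultiScaleConv(in_ch, dim)
        self.pool = nn.MaxPool2d(2)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.pool(self.local(x))                # local multi-scale features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        g, _ = self.attn(tokens, tokens, tokens)    # global dependencies across regions
        fused = tokens + g                          # residual fusion of local and global paths
        return self.head(fused.mean(dim=1))         # global average pooling -> class logits


model = MSConvGlobalAttn()
logits = model(torch.randn(2, 3, 32, 32))
print(tuple(logits.shape))
```

Concatenating the branch outputs followed by a 1x1 projection is one common way to aggregate multi-scale features; the residual sum of convolutional tokens with their attention output mirrors the local/global fusion the abstract describes.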