Multi-Scale Attention-Enhanced Vision Transformer for Automated Tumor Segmentation in Medical Imaging
Abstract
Accurate tumor segmentation is a critical step in computer-aided diagnosis, treatment planning, and disease progression monitoring. However, medical images often contain tumors with irregular shapes, heterogeneous textures, and blurred boundaries, making precise segmentation challenging for conventional convolutional neural networks. This paper proposes a multi-scale attention-enhanced Vision Transformer framework for automated tumor segmentation in medical imaging. The proposed model integrates local convolutional feature extraction with global self-attention modeling to capture both fine-grained anatomical details and long-range contextual dependencies. A multi-scale feature fusion module is designed to improve the representation of tumors with varying sizes, while an attention-guided boundary refinement mechanism is introduced to enhance segmentation accuracy around ambiguous lesion margins. Experimental evaluation on public medical imaging datasets demonstrates that the proposed method improves Dice coefficient, Intersection over Union, and boundary accuracy compared with baseline CNN-based and transformer-based models. The results suggest that combining multi-scale representation learning with attention-based global modeling can provide a more reliable and clinically useful approach for tumor segmentation in complex medical imaging scenarios.
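The core mechanisms the abstract names, global self-attention over token features and fusion of features from multiple scales, can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the function names, the single-head unprojected attention, and the simple averaging fusion are illustrative assumptions standing in for the proposed modules.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    # Scaled dot-product self-attention where each token attends to all
    # tokens (queries = keys = values; no learned projections, for clarity).
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, tokens))
                    for i in range(d)])
    return out

def fuse_scales(fine, coarse_upsampled):
    # Toy multi-scale fusion: element-wise average of per-token features
    # from a fine scale and an upsampled coarse scale.
    return [[(a + b) / 2 for a, b in zip(f, c)]
            for f, c in zip(fine, coarse_upsampled)]
```

In a real segmentation model the attention would use learned query/key/value projections and multiple heads, and the fusion would typically be a learned (e.g. convolutional or gated) combination rather than an average; the sketch only shows the data flow the abstract describes.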
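The evaluation metrics cited above, the Dice coefficient and Intersection over Union (IoU), have standard definitions for binary segmentation masks: Dice = 2|P∩G| / (|P|+|G|) and IoU = |P∩G| / |P∪G|. A minimal sketch over flattened 0/1 masks (function names are our own; the convention of returning 1.0 when both masks are empty is one common choice):

```python
def dice_coefficient(pred, gt):
    # pred, gt: flat binary masks (0/1 values) of equal length.
    inter = sum(p * g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2.0 * inter / total if total else 1.0

def iou(pred, gt):
    # Intersection over Union for the same flat binary masks.
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0
```

For example, with `pred = [1, 1, 0, 0]` and `gt = [1, 0, 1, 0]` the intersection is 1 pixel, giving Dice = 0.5 and IoU = 1/3.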