Abstract: Convolutional neural networks have advanced biomedical image segmentation significantly, yet precision remains a challenge. The inconsistent sizes and shapes of lesion regions make it difficult for existing deep learning methods to extract discriminative features. Additionally, spatial and semantic information is not effectively merged during decoding, resulting in redundant information and semantic gaps. To address these challenges, we propose the Dense Channel Spatial Semantic Guidance Attention UNet (DCSSGA-UNet) architecture, which integrates DenseNet201 as the base encoder together with attention mechanisms to enhance segmentation performance. The decoder follows the standard U-Net pipeline, while the encoder captures global multi-scale features through dense convolutional and transition blocks, enhancing the model’s ability to distinguish intricate details. The channel spatial attention (CSA) and semantic guidance attention (SGA) modules selectively focus on important features and reduce redundancy, effectively bridging semantic gaps. Experiments on three medical image datasets (CVC-ClinicDB, CVC-ColonDB, and Kvasir-SEG) show that the proposed DCSSGA-UNet handles object variability well and outperforms comparable methods, achieving mean intersection-over-union (mIoU) scores of 95.67%, 92.39%, and 93.97%, and mean Dice coefficients (mDice) of 98.85%, 95.71%, and 96.10%, respectively. These results highlight the model’s precision and versatility, making it a valuable tool for clinical applications, particularly accurate lesion segmentation and assistance in diagnosing and treating diseases such as colorectal cancer.
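The abstract does not specify how the CSA module is built internally. As a minimal NumPy sketch of what a generic channel-then-spatial attention gate typically computes (the `channel_spatial_attention` function, its gating choices, and the pooling operations below are illustrative assumptions, not the paper's actual CSA design):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(feat):
    """Generic channel-then-spatial attention gate (illustrative, not
    the paper's exact CSA module).

    feat: feature map of shape (C, H, W).
    Channel gate: global average pooling over (H, W), squashed to (0, 1).
    Spatial gate: mean over channels, squashed to (0, 1).
    Both gates rescale the input multiplicatively, so salient channels
    and pixels are kept while redundant responses are suppressed.
    """
    # Channel attention: one weight per channel from its global average.
    channel_gate = sigmoid(feat.mean(axis=(1, 2)))            # shape (C,)
    feat = feat * channel_gate[:, None, None]

    # Spatial attention: one weight per pixel from the channel-wise mean.
    spatial_gate = sigmoid(feat.mean(axis=0, keepdims=True))  # shape (1, H, W)
    return feat * spatial_gate

x = np.random.randn(8, 16, 16)   # toy (C, H, W) feature map
y = channel_spatial_attention(x)
print(y.shape)  # same shape as the input: (8, 16, 16)
```

In practice such gates are learned (e.g. small convolutions or MLPs produce the gate logits); the parameter-free pooling version above only illustrates the data flow of gating a feature map first per channel, then per spatial location.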