UGS-M3F: unified gated swin transformer with multi-feature fully fusion for retinal blood vessel segmentation
BMC Medical Imaging volume 25, Article number: 77 (2025)
Abstract
Automated segmentation of retinal blood vessels in fundus images plays a key role in providing ophthalmologists with critical insights for the non-invasive diagnosis of common eye diseases. Early and precise detection of these conditions is essential for preserving vision, making vessel segmentation crucial for identifying vision-threatening vascular diseases. However, accurately segmenting blood vessels in fundus images is challenging due to factors such as significant variability in vessel scale and appearance, occlusions, complex backgrounds, variations in image quality, and the intricate branching patterns of retinal vessels. To overcome these challenges, the Unified Gated Swin Transformer with Multi-Feature Fully Fusion (UGS-M3F) model has been developed as a powerful deep learning framework tailored for retinal vessel segmentation. UGS-M3F leverages its Unified Multi-Context Feature Fusion (UM2F) and Gated Boundary-Aware Swin Transformer (GBS-T) modules to capture contextual information across different levels. The UM2F module enhances the extraction of detailed vessel features, while the GBS-T module emphasizes small vessel detection and ensures extensive coverage of large vessels. Extensive experimental results on publicly available datasets, including FIVES, DRIVE, STARE, and CHAS_DB1, show that UGS-M3F significantly outperforms existing state-of-the-art methods. Specifically, UGS-M3F achieves a Dice Coefficient (DC) improvement of 2.12% on FIVES, 1.94% on DRIVE, 2.52% on STARE, and 2.14% on CHAS_DB1 compared to the best-performing baseline. This improvement in segmentation accuracy has the potential to revolutionize diagnostic techniques, allowing for more precise disease identification and management across a range of ocular conditions.
Introduction
Retinal blood vessel segmentation is a critical focus in ophthalmology, playing a vital role in diagnosing and managing ocular, cardiovascular, and systemic diseases such as diabetic and hypertensive retinopathy [1, 2]. Changes in retinal vessel morphology can indicate early signs of these conditions, while abnormalities can guide the diagnosis and treatment of ophthalmic diseases like glaucoma and retinal vascular occlusion [3]. Accurately segmenting retinal vessels from fundus images is essential for understanding disease progression and treatment effectiveness. However, the complexity of retinal vessel patterns, due to their varying sizes, shapes, and intricate structures, poses significant challenges, making traditional segmentation methods time-consuming and prone to errors [4].
Recent medical research underscores the growing need for advanced diagnostic and treatment strategies, particularly through the use of computer-aided diagnosis (CAD) systems [5]. CAD systems have become increasingly important in the early detection and management of various diseases, where their ability to process and analyze vast amounts of medical image data can significantly improve diagnostic accuracy and efficiency [6,7,8]. One of the core components of these systems is feature extraction [9, 10], a process that involves identifying and extracting critical information from medical images [11]. In the context of retinal vessel segmentation, feature extraction is crucial for distinguishing between healthy and abnormal vascular structures. Accurate identification of features such as vessel thickness, curvature, and branching patterns allows clinicians to make more informed decisions regarding the presence and severity of diseases [12]. Nevertheless, traditional feature extraction methods often struggle with the complexity of retinal vessel patterns, resulting in variability in results and potential diagnostic errors.
To address these challenges, researchers have increasingly turned to state-of-the-art Convolutional Neural Networks (CNNs), which have emerged as a powerful tool in medical image analysis. CNNs are particularly well-suited for segmentation tasks due to their ability to learn complex, hierarchical features from raw image data [13,14,15]. The application of CNNs to retinal vessel segmentation, while promising, is still in its early stages. Despite the potential of CNNs to revolutionize this field, several challenges remain, including the issue of false positive detections, where non-vessel structures are incorrectly identified as vessels. This challenge is particularly pronounced in retinal images, which are often affected by various artifacts such as reflections, shadows, and noise, all of which can mislead CNN models and result in incorrect segmentation [16].
Motivated by these challenges, this paper introduces a novel CNN-based algorithm specifically designed for accurate retinal blood vessel segmentation in fundus images. Our approach prioritizes enhancing the precision and reliability of segmentation, aiming to minimize false positives and improve the overall robustness of the model. Comparative experiments suggest that the proposed framework demonstrates competitive performance against existing state-of-the-art methods, showing promise in improving segmentation accuracy. In summary, the principal contributions of this paper are as follows:
(1) It presents a new framework, UGS-M3F, for effective retinal blood vessel segmentation, with a particular focus on small and critical vessel boundaries.
(2) It proposes an efficient UM2F paradigm to capture more specific and effective contextual information.
(3) It designs a novel GBS-T module that enhances the performance of the segmentation system by focusing the UGS-M3F model on retinal vessels in fundus images.
(4) It presents a series of comparative experiments to validate the effectiveness of UGS-M3F.
The proposed framework has been validated on publicly available retinal image datasets [17,18,19,20] and compared with state-of-the-art CNNs previously evaluated in the literature. Unlike existing research, which often relies on cross-level feature aggregation approaches [21, 22], this study introduces a novel deep learning architecture for retinal blood vessel segmentation. This architecture leverages a multi-feature fusion strategy, incorporating multi-context feature fusion and a gated Swin Transformer for improved segmentation accuracy. The performance evaluation analysis demonstrates the potential clinical value of the proposed framework.
The structure of this paper is as follows: In “Related works” section, we elaborate on the novel methods employed for retinal blood vessel segmentation. “Proposed approach” section provides a detailed explanation of the proposed approach. Subsequently, the outcomes of experiments conducted on retinal image datasets are presented in “Experimental results” section. Finally, we conclude this paper in “Conclusion” section.
Related works
In recent years, several studies have advanced automatic algorithms for segmenting retinal blood vessels from fundus images [23]. CNNs have emerged as the dominant approach, with pixel-wise classification methods demonstrating exceptional performance [24]. The U-Net architecture [25], renowned for its ability to capture both local and global image features, has been extensively adopted and refined. Variants such as Genetic U-Net (GU-Net) [26] and Context Spatial U-Net (CSU-Net) [27] have incorporated genetic algorithms and spatial context information, respectively, to enhance segmentation accuracy and robustness. Furthermore, researchers have proposed novel U-Net modifications to improve performance. For instance, an Improved U-Net (IU-Net) with residual modules achieved high accuracy on retinal vessel segmentation tasks [28]. Bridge-Net, a context-involved U-Net with patch-based loss weight mapping, demonstrated its effectiveness in this area [29]. Additionally, Dual Encoding U-Net (DEU-Net), which utilizes a dual encoding strategy, has been explored for retinal vessel segmentation [30]. Another notable approach is the Pyramid U-Net (PU-Net), which employs a pyramid-shaped architecture to capture multi-scale features effectively, leading to improved segmentation results [31].
Inspired by the success of attention mechanisms in enhancing feature extraction and discrimination in computer vision [32], several studies have focused on applying these techniques to retinal blood vessel segmentation. The Attention Guided Network (AG-Net) [33] introduced attention blocks to refine feature maps and suppress irrelevant background information, improving segmentation accuracy. To further explore the potential of attention mechanisms, the Width Attention-Based Convolutional Neural Network (WA-Net) [34] proposed a novel attention module to capture spatial dependencies and enhance feature representation. Building upon the U-Net architecture, the Attention-Inception-based U-Net (AIU-Net) with advanced residual connections [35] integrated attention mechanisms into the inception modules to effectively learn discriminative features and improve segmentation performance. Recent advancements have delved deeper into various attention mechanism applications. SDAU-Net, an improved U-Net that incorporates series deformable convolutions and an attention mechanism into the U-Net structure, was introduced to enhance feature extraction capabilities [36]. URA-Net combined multi-layer preprocessing techniques with a residual attention block within the U-Net framework for improved vessel segmentation [37]. The Rough Channel Attention Residual U-Net (RCAR-UNet) model leverages a novel rough attention mechanism to effectively capture long-range dependencies in retinal images [38]. Additionally, a Soft Attention Mechanism-Based Network (SAM-Net) demonstrates promising results by employing a soft attention mechanism to extract blood vessel features [39].
Recently, generative-based CNN models have garnered significant attention in the field of medical imaging, particularly for retinal blood vessel segmentation. Among these, the Attention Augmented Wasserstein Generative Adversarial Network (AA-WGAN) [40] has emerged as a promising approach for segmenting retinal vessels in fundus images. By incorporating attention mechanisms into the Generative Adversarial Network (GAN) framework, this model enhances its ability to focus on critical features, allowing it to generate more accurate and detailed vessel maps. Another notable approach in this domain is the Refined Equilibrium Generative Adversarial Network (RE-GAN) [41] for retinal vessel segmentation. This model introduces equilibrium constraints into the GAN architecture, ensuring that the generator and discriminator reach a stable state, which is crucial for producing high-quality segmentation outputs. This refined equilibrium mechanism helps mitigate common issues in GAN training, such as mode collapse, and contributes to more consistent and reliable vessel segmentation. Additionally, recent research has explored combining GANs with other techniques to further improve segmentation results. A Residual attention UNet GAN Model (RAU-GAN) utilizes a combination of residual attention blocks within a U-Net architecture and a GAN framework to enhance feature extraction and vessel segmentation accuracy [42]. Furthermore, a method employing Retinal Vessel Segmentation via Adversarial Learning and Iterative Refinement (RL-IR) demonstrates the effectiveness of incorporating iterative refinement with adversarial learning for improved vessel segmentation [43]. Along the same lines, a recent work proposes a Physics-informed Generative Adversarial Network (PI-GAN) [44] for quantitative assessment of the retina. This approach leverages established biophysical principles to generate highly realistic digital models of retinal blood vessels, enabling automatic segmentation and reconstruction without human input and achieving superior performance compared to human labeling.
Lately, fusion methods that leverage multi-feature CNNs have been increasingly employed for retinal blood vessel segmentation. By combining information from multiple scales and feature levels, these methods aim to capture the complex and hierarchical structure of retinal vessels. The Multi-Scale Attention-Guided Fusion Network (MAGF-Net) [21] incorporates attention mechanisms to effectively fuse multi-scale features and enhance segmentation accuracy. Similarly, the Integrated Multi-Scale Feature Fusion Network (IMFF-Net) [22] integrates multi-scale features at different levels to preserve spatial information and improve segmentation performance. Additionally, the Multi-Scale Attention Fusion Network (MSAF-Net) leverages attention mechanisms to selectively combine features from various scales, leading to improved vessel segmentation results [45]. These fusion-based approaches have demonstrated promising results in accurately segmenting retinal vessels from fundus images. An overview of related works on retinal blood vessel segmentation techniques is presented in Table 1.
Multi-context feature fusion and gated boundary-aware transformers are critical for effectively segmenting retinal blood vessels, as they enable precise boundary recovery, extraction of intricate vessel structures, and improved overall segmentation accuracy. Despite their significance, the existing literature on retinal blood vessel segmentation has only a few studies leveraging these advanced techniques. Given the limited research in this area, it is essential to develop a comprehensive approach that captures fine vascular details while preserving spatial information throughout the segmentation process.
Proposed approach
The network architecture is presented in Fig. 1, with an overview provided first, followed by a detailed explanation of each component. The UGS-M3F block, which is composed of n UM2F and GBS-T modules, is designed to enhance the extraction of vessel skeletons from retinal images. The framework’s structure and functionality are outlined in the subsequent sections.
Data preprocessing
The preprocessing of digital fundus images is conducted through three key stages: data size normalization, intensity adjustment, and data augmentation, all aimed at enhancing the efficacy of UGS-M3F in retinal blood vessel segmentation. The resolutions of fundus images across the FIVES, DRIVE, STARE, and CHAS_DB1 datasets vary, measuring \(2048\times 2048\), \(565\times 584\), \(605\times 700\), and \(999\times 960\), respectively. Given these varying resolutions and the associated computational demands, it is essential to rescale the fundus images. To achieve this, we first crop the central section of the retinal vessel region and subsequently apply a pyramidal down-sampling method, as described in [46], to standardize all images to a resolution of \(512\times 512\) pixels.
The second preprocessing step incorporates an adaptive bi-histogram equalization algorithm [47] to improve image brightness. Due to the limited quantity of original fundus images, training the UGS-M3F model poses challenges. To overcome this obstacle, we enhance the dataset through geometric transformation techniques, as mentioned in [46]. Following the data preparation, the total count of fundus images is increased by a factor of 91.
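For illustration only, the sketch below outlines the three preprocessing stages under stated assumptions: the central square crop, OpenCV's pyramidal down-sampling, CLAHE as a stand-in for the adaptive bi-histogram equalization of [47], and rotation/flip transforms as the geometric augmentation. None of these implementation choices (crop size, CLAHE parameters, transform set) are prescribed by the paper.

```python
# Sketch of the preprocessing pipeline: central crop, pyramidal down-sampling
# to 512x512, contrast enhancement (CLAHE stand-in), and joint geometric
# augmentation of image and vessel mask. All parameter values are illustrative.
import cv2
import numpy as np

def preprocess_fundus(img_bgr: np.ndarray, target: int = 512) -> np.ndarray:
    # 1) crop the central square containing the retinal vessel region
    h, w = img_bgr.shape[:2]
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    img = img_bgr[y0:y0 + side, x0:x0 + side]

    # 2) pyramidal down-sampling: halve the resolution while it stays well above
    #    the target, then resize to exactly target x target
    while min(img.shape[:2]) > 2 * target:
        img = cv2.pyrDown(img)
    img = cv2.resize(img, (target, target), interpolation=cv2.INTER_AREA)

    # 3) contrast/brightness adjustment on the luminance channel (CLAHE stand-in)
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[..., 0] = clahe.apply(lab[..., 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def augment(img: np.ndarray, mask: np.ndarray):
    # geometric transformations applied jointly to the image and its vessel mask
    for k in range(4):                                               # 0/90/180/270 degree rotations
        yield np.rot90(img, k), np.rot90(mask, k)
        yield np.rot90(img[:, ::-1], k), np.rot90(mask[:, ::-1], k)  # plus horizontal flip
```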
Unified Gated Swin Transformer with Multi-Feature Fully Fusion (UGS-M3F)
Accurately capturing salient features is essential for improving segmentation performance due to the complex topological structure of retinal blood vessels and the significant variation in vessel thickness. This paper introduces the UGS-M3F architecture, which excels at localizing blood vessels in retinal images. The framework offers several key advantages: it directs its learning toward blood vessel regions, minimizes ambiguity between objects, and captures critical contextual information. As depicted in Fig. 1, the architecture consists of n encoder (\(E_i\)) and decoder (\(D_i\)) layers tailored for semantic segmentation tasks.
The UGS-M3F block incorporates a novel UM2F module to capture detailed blood vessel features and contextual information, along with a GBS-T module that enhances the focus on salient features. Specifically, the UGS-M3F block consists of n UM2F and GBS-T modules. Additionally, the final UM2F module functions as a Transition Block (T-Block), connecting the encoder and decoder stages. A max pooling layer with a \(2\times 2\) kernel size, placed after the UM2F blocks, progressively reduces the feature map resolution. While the UM2F module extracts latent feature representations from the input images, the GBS-T path upsamples the feature maps and uses cross-connections to direct the model’s focus toward the blood vessel areas in the retinal images.
Moreover, incorporating the gated boundary Swin transformer is crucial for extracting vascular skeletons. This module selectively focuses on blood vessel regions within the retinal image and captures long-range dependencies, enabling the model to analyze relationships between distant features.
The decision to use a convolution-based encoder and a transformer-based decoder was made to leverage the strengths of both architectures. The convolution-based encoder excels at extracting low-level spatial features and capturing local patterns essential for retinal vessel segmentation. In contrast, the transformer-based decoder captures global contextual relationships and long-range dependencies, which are critical for accurately reconstructing complex vascular structures. This complementary combination enhances both the precision and robustness of the segmentation process.
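To make this encoder-decoder layout concrete, the following PyTorch sketch wires together placeholder UM2F encoder blocks, \(2\times 2\) max pooling, a transition block, and gated decoder blocks fed by upsampled features and skip connections, as described above. The internals of UM2F and the GBS-T decoder are simplified stand-ins, and the depth and channel widths are illustrative assumptions rather than the published configuration.

```python
# Structural sketch of the n-stage UGS-M3F layout (not the authors' code):
# UM2F encoder blocks separated by 2x2 max pooling, a UM2F transition block
# (T-Block), and gated decoder blocks that fuse upsampled decoder features
# with the corresponding encoder skip connection by element-wise addition.
import torch
import torch.nn as nn

class UM2F(nn.Module):                       # placeholder for the UM2F block
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.GELU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.GELU())
    def forward(self, x):
        return self.body(x)

class GBSTBlock(nn.Module):                  # placeholder for the gated decoder block
    def __init__(self, c_enc, c_dec):
        super().__init__()
        self.align = nn.Conv2d(c_dec, c_enc, 1)   # 1x1 conv keeps depth consistent with E_i
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.fuse = UM2F(c_enc, c_enc)
    def forward(self, enc_i, dec_next):
        return self.fuse(enc_i + self.up(self.align(dec_next)))   # element-wise addition

class UGSM3FSkeleton(nn.Module):
    def __init__(self, n=6, base=32):
        super().__init__()
        chans = [base * 2 ** i for i in range(n)]
        self.encoders = nn.ModuleList(
            UM2F(3 if i == 0 else chans[i - 1], chans[i]) for i in range(n))
        self.pool = nn.MaxPool2d(2)                        # 2x2 max pooling between stages
        self.transition = UM2F(chans[-1], chans[-1] * 2)   # T-Block between encoder and decoder
        self.decoders = nn.ModuleList(
            GBSTBlock(chans[i], chans[i] * 2 if i == n - 1 else chans[i + 1])
            for i in reversed(range(n)))
        self.head = nn.Conv2d(chans[0], 1, 1)              # vessel probability map
    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.transition(x)
        for dec, skip in zip(self.decoders, reversed(skips)):
            x = dec(skip, x)
        return torch.sigmoid(self.head(x))

# e.g. UGSM3FSkeleton()(torch.randn(1, 3, 512, 512)).shape -> torch.Size([1, 1, 512, 512])
```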
Overall, the framework introduces three main innovations. First, the UM2F module is designed to capture both intricate details and contextual information. Second, the GBS-T module highlights key vessel features for segmentation. Finally, the use of multiple parallel UM2F-GBS-T structures significantly enhances segmentation accuracy. The UGS-M3F framework, as outlined in Algorithm 1, comprises a series of encoder-decoder blocks, with their respective output shapes summarized in Table 2.

Algorithm 1 Unified Gated Swin Transformer with Multi-Feature Fully Fusion (UGS-M3F)
Unified Multi-context Feature Fusion (UM2F)
As shown in Fig. 2, the UM2F module leverages a synergistic combination of Dual Path Networks (D-Net), Residual Networks with Aggregated Residual Transformations (R-Next), EfficientNet (E-Net), and Early Fusion with Concatenation (EF-C). Each of these components plays a critical role in enhancing the overall performance of the model for retinal blood vessel segmentation.
The D-Net module builds on the original dual-path architecture [48], combining the benefits of both dense and residual connections. This hybrid approach addresses challenges such as the vanishing gradient problem, improving the flow of information across the network. As a result, D-Net ensures better feature propagation, which is particularly important when dealing with the complex and fine-grained structures of retinal blood vessels. By integrating both local and global feature capture, D-Net excels at identifying blood vessels with high precision, making it highly effective for segmenting thin vessels and distinguishing them from the surrounding tissue.
The inclusion of R-Next is crucial for handling the various scales, thicknesses, and complexities of blood vessels within retinal images [49]. The R-Next module, with its ability to extract features across multiple scales, contributes to a rich and detailed representation of the vessel structure. This is especially important for accurately delineating the central axis of the vessels and improving the skeletonization process, which is a critical step in defining vessel boundaries and overall structure. The multi-scale feature extraction offered by R-Next enhances the model’s capability to detect both large and small vessels, ensuring comprehensive vessel identification.
In addition to D-Net and R-Next, the Early Fusion with Concatenation (EF-C) module plays a key role in balancing computational efficiency and model accuracy. After the D-Net processes the input, the resulting features are passed through a \(1\times 1\) convolutional layer to reduce dimensionality before fusing with the R-Next output via element-wise addition. This fusion not only combines the strengths of the two networks but also employs the Gaussian Error Linear Unit (GELU) activation function, which enables the model to capture complex relationships between features and enhance the training process [50]. The subsequent \(1\times 1\) convolutional layers are specifically designed to assess the importance of different features, ensuring that the model focuses on the most relevant aspects for accurate blood vessel segmentation.
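A minimal sketch of this fusion step is shown below, assuming illustrative channel sizes: the \(1\times 1\) reduction of the D-Net features, element-wise addition with the R-Next features, GELU activation, and the follow-up \(1\times 1\) convolutions mirror the description above, while the exact layer counts are not taken from the paper.

```python
# Sketch of the EF-C fusion step: reduce D-Net features with a 1x1 convolution,
# fuse with R-Next features by element-wise addition, apply GELU, then reweight
# the fused features with further 1x1 convolutions. Channel sizes are assumptions.
import torch
import torch.nn as nn

class EFC(nn.Module):
    def __init__(self, c_dnet: int, c_rnext: int):
        super().__init__()
        self.reduce = nn.Conv2d(c_dnet, c_rnext, kernel_size=1)   # dimensionality reduction
        self.act = nn.GELU()
        self.reweight = nn.Sequential(                            # feature-importance 1x1 convs
            nn.Conv2d(c_rnext, c_rnext, kernel_size=1), nn.GELU(),
            nn.Conv2d(c_rnext, c_rnext, kernel_size=1))
    def forward(self, f_dnet: torch.Tensor, f_rnext: torch.Tensor) -> torch.Tensor:
        fused = self.act(self.reduce(f_dnet) + f_rnext)           # element-wise addition fusion
        return self.reweight(fused)

# e.g. EFC(256, 128)(torch.randn(1, 256, 64, 64), torch.randn(1, 128, 64, 64)).shape
# -> torch.Size([1, 128, 64, 64])
```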
The outputs of the EF-C module are then fed into E-Net [51], a highly efficient architecture that extracts multi-scale features essential for accommodating the diverse shapes and sizes of blood vessels in retinal images. E-Net is particularly effective at balancing accuracy with computational demands, making it well-suited for real-time applications in medical image analysis. By leveraging its efficient design, E-Net contributes to the overall scalability of the UM2F framework, ensuring it can be applied to fundus datasets and in environments where computational resources may be limited.
Ultimately, the combination of D-Net, R-Next, E-Net, and EF-C modules results in a powerful multi-scale approach that integrates features from multiple levels of the network. This fusion enables UM2F to excel in the challenging task of retinal blood vessel segmentation, achieving state-of-the-art performance while maintaining operational efficiency. This makes the UM2F architecture highly applicable to real-world scenarios, offering a robust solution for accurate and efficient retinal image analysis.
Gated Boundary-aware Swin Transformer (GBS-T)
GBS-T is one of the most advanced modules within the UGS-M3F framework, drawing inspiration from the CGA-T module found in the 2MGAS-Net framework [52]. By integrating GBS-T, the UGS-M3F model becomes more proficient at identifying discriminative features crucial for distinguishing blood vessel pixels from the background, while also accurately estimating the positions of complex regions within retinal images. The process, as depicted in Fig. 3, unfolds as follows: first, the output from \(E_i\) is directed to an element-wise addition layer. To maintain depth consistency with \(E_i\), \(D_{i+1}\) undergoes processing through a \(1\times 1\) convolutional layer. An upsampling layer then adjusts the resolution of \(D_{i+1}\) to match that of \(E_i\). The combined features from \(E_i\) and the upsampled \(D_{i+1}\) are subsequently passed into the GBS-T branches for further refinement.
The GBS-T module consists of three deep branches, all sharing a similar architecture but trained on different inputs. Each branch integrates Swin Transformer (Swin-T) modules [53], \(1\times 1\) convolutional layers, gated layers with update and reset gates, and three fusion layers: concatenation, element-wise averaging, and element-wise addition. Following each convolutional layer and Swin-T, a GELU activation function is applied to promote non-linearity, ensuring efficient feature extraction.
To promote training stability and improve generalization, batch normalization is applied to the concatenated outputs of the tanh and sigmoid layers. The reset and update gates, located after batch normalization and the final convolutional layers, respectively, regulate the flow of information through the network, helping to filter out irrelevant features. The tanh and sigmoid layers further focus on key feature representations, refining the extraction process while minimizing noise.
The GBS-T module employs tanh and sigmoid activation functions due to their complementary roles in enhancing feature representation and information flow. The tanh function maps inputs to a symmetric range of [−1,1], aiding in the normalization of features and promoting balanced gradient propagation, which is crucial for stable training in deep networks. Conversely, the sigmoid function, with its range of [0,1], is particularly effective for refining attention weights and emphasizing key features. This combination ensures that the module benefits from tanh’s ability to enhance gradient flow and symmetry while leveraging sigmoid’s capability to focus on salient regions, especially in complex vascular structures. Together, these activation functions contribute to the improved accuracy and robustness of the UGS-M3F framework.
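The snippet below illustrates the gating idea in a single branch under simplifying assumptions: a tanh path and a sigmoid path are concatenated and batch-normalized, a reset gate filters the fused features, and an update gate blends them with the branch input. The three-branch structure, the Swin-T windows, and the concatenation/averaging/addition fusion layers of the full module are omitted, so this is a conceptual sketch rather than the published design.

```python
# Simplified sketch of the reset/update gating used in one GBS-T branch.
# The wiring (channel sizes, number of 1x1 convolutions) is illustrative.
import torch
import torch.nn as nn

class GatedBoundaryUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.tanh_path = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Tanh())
        self.sig_path = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.bn = nn.BatchNorm2d(2 * channels)               # applied to the concatenation
        self.reset_gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.update_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.proj = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.GELU())
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mixed = self.bn(torch.cat([self.tanh_path(x), self.sig_path(x)], dim=1))
        r = self.reset_gate(mixed)                            # suppress irrelevant features
        candidate = self.proj(mixed) * r
        z = self.update_gate(candidate)                       # how much new information to keep
        return z * candidate + (1.0 - z) * x                  # element-wise blending with the input
```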
The implementation of gated boundary-aware Swin Transformers for blood vessel segmentation introduces several critical advantages. First, the gated attention mechanisms empower the model to focus more effectively on relevant features, while filtering out extraneous data. This selective attention improves the overall segmentation accuracy, ensuring that even small, thin vessels are accurately identified. Second, the Swin Transformer architecture excels at capturing long-range dependencies within retinal images. This capability is vital for identifying the intricate branching structures and complex networks of blood vessels, which often extend across wide spatial areas. The parallel processing enabled by the Swin-T modules enhances the model’s ability to handle the variability and complexity of retinal images, ensuring robust and precise vessel segmentation. As a result, the GBS-T module significantly improves the overall performance of the UGS-M3F framework, making it a powerful tool for retinal image analysis and disease diagnosis.
Experimental results
In this section, we present and thoroughly analyze the experimental results of the UGS-M3F framework, as previously introduced in “Proposed approach” section. Through iterative network training, we validated the proposed UGS-M3F system by systematically adjusting the number of UM2F and GBS-T blocks. The robustness of our methodology was rigorously tested on the FIVES, DRIVE, STARE, and CHAS_DB1 datasets.
To substantiate the effectiveness of our proposed approach, we conducted a comprehensive evaluation, including statistical measurements and quantitative analyses. Our strategy was directly compared with state-of-the-art methods for retinal blood vessel segmentation, all evaluated on the same datasets and using identical metrics.
Data acquisition
Extensive experiments were conducted using four retinal blood vessel segmentation datasets: FIVES [17], DRIVE [18], STARE [19], and CHAS_DB1 [20]. A summary of key information about these datasets is provided in Table 3. FIVES is a publicly accessible dataset comprising 800 fundus images, each paired with a corresponding vessel segmentation mask. These images are in PNG format with a resolution of \(2048\times 2048\) pixels. Another well-established dataset, DRIVE, consists of 40 fundus images collected from patients diagnosed with diabetic retinopathy. These images have a standard resolution of \(565\times 584\) pixels and are also annotated with vessel segmentation masks. STARE, a smaller dataset, features 20 fundus images with a resolution of \(605\times 700\) pixels. These images have been meticulously labeled for vessel segmentation by medical experts. Lastly, CHAS_DB1 includes 28 fundus images with a resolution of \(999\times 960\) pixels, also annotated for vessel segmentation tasks. To evaluate the performance of the proposed methods, the datasets were randomly partitioned into training (60%), validation (20%), and test sets (20%).
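A minimal sketch of the random 60/20/20 partition is given below; the random seed and the use of image identifiers are illustrative assumptions.

```python
# Random 60/20/20 split of image identifiers into training, validation, and test sets.
import random

def split_dataset(image_ids, seed: int = 42):
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# e.g. train, val, test = split_dataset(range(800))  # FIVES: 480/160/160 images
```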
Implementation details
In this study, we trained the UGS-M3F model using the Adam optimizer [54]. The training parameters were set as follows: learning rate = \(10^{-5}\), first moment-decay = 0.9, and second moment-decay = 0.999. We initialized the weights using the He_normal initialization method [55].
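The stated optimizer and initialization settings can be reproduced as follows; the small placeholder model stands in for the full UGS-M3F network.

```python
# Training-setup sketch: Adam with learning rate 1e-5 and moment decays (0.9, 0.999),
# plus He-normal (kaiming_normal_) weight initialization for convolutional layers.
import torch
import torch.nn as nn

def init_he_normal(module: nn.Module) -> None:
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")  # He_normal
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.GELU(), nn.Conv2d(16, 1, 1))
model.apply(init_he_normal)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, betas=(0.9, 0.999))
```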
To address the class imbalance in retinal blood vessel pixels, we employed a combination of weighted binary cross-entropy (wBCE) and Dice loss as the loss function [56]. The total loss is computed as:
$$L_{\text{total}} = \alpha L_{\text{wBCE}} + \beta L_{\text{Dice}},$$
where \(\alpha\) and \(\beta\) are weights used to balance the contributions of the two components. The weighted binary cross-entropy loss (\(L_{\text {wBCE}}\)) assigns higher importance to the minority class (vessel pixels) by using class weights determined as the inverse of the class frequencies in the training set. This approach helps to address the class imbalance by increasing the model’s sensitivity to vessel structures. The Dice loss (\(L_{\text {Dice}}\)) complements this by optimizing the overlap between the predicted segmentation and the ground truth, ensuring accurate segmentation of both vessel and background regions. Together, these loss components provide a robust optimization strategy for our task.
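A minimal implementation of this combined objective, assuming illustrative values for \(\alpha\) and \(\beta\) and inverse-frequency class weights computed per batch, could look as follows.

```python
# Combined weighted BCE + Dice loss. Class weights are the inverse class
# frequencies of the current batch; alpha/beta values are illustrative defaults.
import torch

def combined_loss(pred: torch.Tensor, target: torch.Tensor,
                  alpha: float = 0.5, beta: float = 0.5,
                  eps: float = 1e-6) -> torch.Tensor:
    # pred: predicted vessel probabilities in [0, 1]; target: binary ground truth
    pos_freq = target.mean().clamp(min=eps)
    w_pos = 1.0 / pos_freq
    w_neg = 1.0 / (1.0 - pos_freq).clamp(min=eps)
    weights = torch.where(target > 0.5, w_pos, w_neg)

    # weighted binary cross-entropy term
    bce = -(target * torch.log(pred.clamp(min=eps)) +
            (1 - target) * torch.log((1 - pred).clamp(min=eps)))
    l_wbce = (weights * bce).mean()

    # Dice term on the soft predictions
    intersection = (pred * target).sum()
    l_dice = 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return alpha * l_wbce + beta * l_dice
```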
The training process consisted of 200 epochs with a batch size of 64. All experiments were conducted on a workstation with an Intel Xeon E5-2630 v4 Octa-Core processor, 25 MB cache, 64 GB RAM, and a single NVIDIA Quadro RTX 5000 GPU with 16 GB of GDDR6 memory. The network was implemented in PyTorch using Python 3. The training phase took approximately 23 hours and 32 minutes, while testing took 7 hours and 26 minutes. Detailed training and inference times for each dataset, along with the peak GPU memory usage, are summarized in Table 4.
Evaluation metrics
The evaluation of the performance of our proposed system was conducted using six quantitative metrics: accuracy (Acc), sensitivity (Sen), specificity (Spe), Dice Coefficient (DC), Jaccard Score (JC), and Area Under the Receiver Operating Characteristic Curve (AUC).
Accuracy is the proportion of the total number of correct predictions, providing an overall measure of the classification performance. Sensitivity, also known as recall or true positive rate, indicates the proportion of actual positive cases that were correctly identified. Specificity reflects the proportion of actual negative cases that were correctly identified, thus evaluating the model’s ability to avoid false positives. The Dice Coefficient is a measure of overlap between the predicted and the ground truth segmentation, serving as a measure of similarity. The Jaccard Score, also known as the Intersection over Union (IoU), is another metric for similarity that compares the size of the intersection to the size of the union of the predicted and ground truth segmentations. The AUC metric evaluates the model’s ability to distinguish between classes by measuring the area under the Receiver Operating Characteristic (ROC) curve. A higher AUC value indicates better model performance in terms of classification. The True Positive Rate (TPR) represents the proportion of actual positive cases that are correctly identified, while the False Positive Rate (FPR) represents the proportion of actual negative cases that are incorrectly classified as positive.
The results are revealed through the confusion matrix, which is a statistical tool used to assess the effectiveness of the segmentation system. It provides four outcomes: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN), where TP represents correct positive predictions, FP represents incorrect positive predictions, TN represents correct negative predictions, and FN represents incorrect negative predictions. The six assessment metrics can be defined as follows:
$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Sen} = \frac{TP}{TP + FN}, \quad \text{Spe} = \frac{TN}{TN + FP},$$
$$\text{DC} = \frac{2\,TP}{2\,TP + FP + FN}, \quad \text{JC} = \frac{TP}{TP + FP + FN},$$
and AUC is the area under the ROC curve obtained by plotting the TPR against the FPR.
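A straightforward way to compute these metrics from a predicted probability map and its ground-truth mask is sketched below; scikit-learn's ROC implementation is used for AUC, which is an implementation choice rather than the paper's.

```python
# Compute Acc, Sen, Spe, DC, JC from the confusion matrix of a thresholded
# prediction, and AUC from the soft probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score

def segmentation_metrics(prob: np.ndarray, gt: np.ndarray, thr: float = 0.5) -> dict:
    pred = (prob >= thr).astype(np.uint8).ravel()
    gt = gt.astype(np.uint8).ravel()
    tp = int(np.sum((pred == 1) & (gt == 1)))
    tn = int(np.sum((pred == 0) & (gt == 0)))
    fp = int(np.sum((pred == 1) & (gt == 0)))
    fn = int(np.sum((pred == 0) & (gt == 1)))
    eps = 1e-12
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn + eps),
        "Sen": tp / (tp + fn + eps),
        "Spe": tn / (tn + fp + eps),
        "DC":  2 * tp / (2 * tp + fp + fn + eps),
        "JC":  tp / (tp + fp + fn + eps),
        "AUC": roc_auc_score(gt, prob.ravel()),
    }
```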
Segmentation results
This section delves into the optimal network architecture for the UGS-M3F framework, specifically tailored to the task of retinal blood vessel segmentation. To comprehensively assess the influence of different architectural configurations, an ablation study was conducted, employing five common metrics: Accuracy, Sensitivity, Specificity, Dice Coefficient, and Jaccard Index.
Table 5 summarizes the outcomes of varying the number of En-Dn blocks within UGS-M3F across four distinct fundus image datasets: FIVES, DRIVE, STARE, and CHAS_DB1. Representative segmentations generated by the model are visually illustrated in Figs. 4, 5, 6, and 7. A systematic comparison of seven UGS-M3F variations (En-Dn) with increasing block levels, ranging from 2 to 8, was undertaken.
The experimental results unequivocally demonstrate a general improvement in performance as the number of Encoder-Decoder blocks (E2-D2 to E6-D6) is progressively increased. This enhancement suggests that a greater number of blocks empowers the model to extract more intricate and informative features from the input images. However, a notable trade-off emerges when the block count is further elevated (E7-D7 and E8-D8), as overfitting becomes a prevalent issue, highlighting the delicate balance between model complexity and generalizability.
Our findings underscore the significance of the E6-D6 configuration in optimizing the effectiveness of the UGS-M3F framework. This particular configuration strikes a harmonious balance between accuracy, efficiency, and generalizability, making it well-suited for a broad range of retinal blood vessel segmentation tasks. The model’s exceptional capability to accurately delineate blood vessels, even in challenging scenarios, underscores its potential to substantially enhance the diagnosis and management of ocular diseases.
Additionally, to transparently evaluate the limitations of our approach, we have included Fig. 8, which presents failure cases of the UGS-M3F framework. This figure illustrates scenarios where the model faces challenges, such as poor segmentation in low-contrast regions or when noise significantly impacts the results, providing a clear view of areas where the model’s performance could be improved.
Comparison with the state-of-the-art
To demonstrate the robustness of our solution for retinal blood vessel segmentation, we conducted comprehensive experiments on the FIVES, DRIVE, STARE, and CHAS_DB1 datasets using several state-of-the-art segmentation methods. Specifically, we compared our method against nine cutting-edge approaches: U-Net [24], Genetic U-Net (GU-Net) [26], Width Attention Network (WA-Net) [34], Attention Augmented Wasserstein Generative Adversarial Network (AA-WGAN) [40], Multiscale Attention-Guided Fusion Network (MAGF-Net) [21], No New U-Net (NN-UNet) [57], Transformer-based U-Net (TRANS-UNet) [58], Swin Transformer-based U-Net (SWIN-UNet) [59], and Integrated Multi-Scale Feature Fusion Network (IMFF-Net) [22]. Each of these methods was selected due to their proven effectiveness in medical image segmentation, offering a rigorous benchmark for evaluating our model’s performance. To ensure a fair comparison, we accurately reproduced the experiments using the same training, validation, and test datasets for all methods, employing consistent data preprocessing techniques. This involved standardized steps such as normalization, contrast enhancement, and data augmentation, ensuring that all models were trained under identical conditions, including the image resizing procedure to \(512\times 512\) pixels. All experiments were executed on an NVIDIA Quadro RTX 5000 GPU, ensuring uniform computational resources across approaches, thereby eliminating hardware-induced performance variations. As shown in Tables 6, 7, 8, and 9, our method consistently outperforms these state-of-the-art techniques across all evaluation metrics, including accuracy, sensitivity, specificity, Dice coefficient, Jaccard score, and AUC. Qualitative comparisons in Figs. 9, 10, 11, and 12 visually confirm the improved accuracy and detailed boundary capture of our model compared to the baselines, highlighting its ability to detect finer vascular structures that are often missed by other methods. These findings underscore the potential of UGS-M3F to significantly enhance diagnostic capabilities and clinical decision-making in retinal image analysis, ultimately contributing to more accurate and early detection of retinal diseases.
Performance comparisons of UGS-M3F with other state-of-the-art methods on the FIVES, DRIVE, STARE, and CHAS_DB1 datasets are presented in Tables 10, 11, 12, and 13, respectively. These tables provide a detailed analysis of UGS-M3F’s performance in terms of model complexity (Number of Parameters), memory consumption (measured as GPU usage in gigabytes, GB), inference speed (measured in milliseconds, ms), and computational cost (GFLOPs). The results highlight the superior computational efficiency of UGS-M3F compared to the state-of-the-art methods. UGS-M3F consistently achieves the lowest number of parameters (NP), memory consumption (MC), and computational cost (GFLOPs), while demonstrating the fastest inference speed (IS) across all datasets. However, it is important to note that the number of parameters is derived from the entire dataset, not a single image. In particular, the FIVES dataset requires more parameters than the others due to its larger size, containing 800 images, which increases significantly after data augmentation by a factor of 91. This expansion leads to a higher number of parameters, as the model adapts to capture the increased variability and complexity of the dataset. In contrast, methods such as TRANS-UNet [58] and SWIN-UNet [59] exhibit significantly higher complexity due to their transformer-based architectures, which result in increased memory consumption and computational demands. These findings underscore UGS-M3F’s efficiency, making it more suitable for real-world applications where resource constraints and real-time processing are critical. Its lightweight design enables scalability to larger datasets and higher-resolution inputs without incurring prohibitive computational costs. Despite its simplicity, UGS-M3F maintains competitive or superior performance across all datasets, reaffirming its robustness and practicality in retinal image analysis tasks.
Discussion
This paper presents a novel framework, UGS-M3F, designed for precise segmentation of retinal blood vessels in fundus images. UGS-M3F builds upon successful multi-feature CNN fusion techniques to address challenges such as variations in vessel width and branching, low contrast, unclear boundaries, and the complex nature of retinal structures. By incorporating advanced methods, including the gated Swin transformer and feature fusion, UGS-M3F aims to enhance segmentation performance across various retinal imaging scenarios.
A study on the optimal number of En-Dn blocks in the UGS-M3F architecture reveals an interesting pattern in retinal vessel segmentation. As shown in Table 5, increasing the block count generally improves performance; however, there is a saturation point at E6-D6. Beyond this, performance begins to decline, as seen with configurations E7-D7 and E8-D8. This decline can be attributed to two main factors: first, adding too many blocks increases the risk of overfitting, where the model becomes overly specialized to the training data and struggles to generalize to new images. This is evident in the accuracy drop observed with E7-D7 and E8-D8. Second, after E6-D6, the additional feature extraction may reach diminishing returns, as the essential features for effective segmentation have likely been captured. Any further increase in complexity, as with E7-D7 and E8-D8, could impede the learning process. These findings highlight the importance of balancing the number of Encoder-Decoder blocks, with E6-D6 being optimal for effective segmentation, while exceeding this number introduces drawbacks. This balance is particularly critical in real-world retinal imaging, where efficiency and accuracy are key.
To comprehensively assess the UGS-M3F algorithm’s performance, the experimental results are compared with nine prior methods, categorized into four groups: pixel-wise classification CNNs [24, 26, 57], attention-based CNNs [34, 58, 59], generative-based CNN models [40], and multi-feature CNN fusion techniques [21, 22]. Tables 6, 7, 8, and 9 demonstrate that UGS-M3F consistently outperforms these methods across datasets such as FIVES, DRIVE, STARE, and CHAS_DB1. Additionally, Figs. 9, 10, 11, and 12 showcase the superior segmentation quality achieved by UGS-M3F on four datasets: FIVES, DRIVE, STARE, and CHAS_DB1. While conventional approaches like U-Net [24] and GU-Net [26] struggle with thin vessels and complex branching patterns, WA-Net [34] fails to detect smaller capillaries, and AA-WGAN [40] leads to incomplete vessel structures. Methods such as MAGF-Net [21], NN-UNet [57], and TRANS-UNet [58] also face challenges with fine vessel detection. Although SWIN-UNet [59] and IMFF-Net [22] detect most vessels, they often miss boundary details and fine capillary networks. In contrast, UGS-M3F effectively captures intricate structures, extracts complex vascular areas, and reliably detects complete vessel networks, demonstrating significant improvements in segmentation accuracy and reliability.
The comparison of UGS-M3F with existing methods across the FIVES, DRIVE, STARE, and CHAS_DB1 datasets, as shown in Tables 10, 11, 12, and 13, demonstrates its outstanding computational efficiency. UGS-M3F consistently outperforms other models in terms of the number of parameters, memory consumption (measured in GPU usage), and computational cost (GFLOPs), while offering the fastest inference speed. In contrast, methods like TRANS-UNet [58] and SWIN-UNet [59] exhibit higher complexity, leading to increased memory consumption and slower speeds due to their transformer-based architectures. This highlights UGS-M3F’s practicality for real-world applications, where efficiency, scalability, and speed are crucial for real-time retinal image analysis.
The UGS-M3F framework generally performs well but faces challenges, especially in handling images with extreme noise, low contrast, or significant occlusion. In these cases, the algorithm struggles to accurately segment vessel boundaries, leading to incomplete or fragmented results. Additionally, it sometimes misinterprets complex vessel patterns, causing false positives or incorrect connections. These failure cases suggest the need for further refinement, such as improved preprocessing or incorporating additional contextual information. These issues are shown in Fig. 8.
Despite its advantages, UGS-M3F has some limitations. Higher block configurations can lead to overfitting, reducing generalizability. Moreover, the model’s dependence on specific datasets might hinder its adaptability to different imaging modalities or patient populations. Computational demands may also pose challenges in resource-constrained environments. Furthermore, UGS-M3F might struggle with extremely low-contrast regions or images with severe artifacts. Future research should focus on addressing these issues to enhance the framework’s robustness and applicability in broader clinical settings.
Conclusion
This study presents UGS-M3F, a novel deep learning model for retinal blood vessel segmentation in fundus images, designed to address the challenges posed by the intricate and varied structures of retinal vasculature. UGS-M3F consists of two robust blocks: UM2F and GBS-T, each contributing to different aspects of vessel segmentation. The UM2F module is meticulously designed to capture detailed vessel features through dual path networks, residual networks with aggregated residual transformations, EfficientNet, and early fusion with concatenation mechanisms, ensuring that even the thinnest capillaries are accurately identified. On the other hand, the GBS-T block incorporates a gated swin transformer strategy, enabling UGS-M3F to effectively focus on both small and large retinal vessels in fundus images, which is crucial for comprehensive analysis. Furthermore, the repetitive UM2F and GBS-T blocks are strategically introduced to handle the complex structure of retinal vessels and significantly boost performance, allowing the network to progressively refine its predictions with each iteration. Extensive experiments conducted on benchmark datasets, such as FIVES, DRIVE, STARE, and CHAS_DB1, demonstrate the superior segmentation performance of UGS-M3F compared to state-of-the-art methods, consistently achieving higher accuracy and better vessel delineation. These results highlight the potential of UGS-M3F to significantly advance conventional retinal blood vessel segmentation techniques and indicate its promise for real-world clinical applications, particularly in aiding the early detection of retinal diseases. In the future, we will focus on improving UGS-M3F’s adaptability to diverse datasets by integrating domain adaptation techniques and validating its performance across varied imaging modalities and patient populations. To address computational efficiency, we plan to explore model compression and optimization strategies for deployment in resource-constrained environments. Additionally, we aim to enhance robustness against challenging images with extreme noise, low contrast, or severe artifacts by incorporating advanced preprocessing and contextual information mechanisms. This will facilitate smoother integration into clinical workflows for ocular disease diagnosis and treatment planning, ultimately contributing to improved patient outcomes.
Data availability
All relevant datasets used in this study will be made available upon request.
References
Galdran A, Anjos A, Dolz J, Chakor H, Lombaert H, Ayed IB. State-of-the-art retinal vessel segmentation with minimalistic models. Sci Rep. 2022;12(1). https://doi.org/10.1038/s41598-022-09675-y.
Ansari MY, Qaraqe M, Righetti R, Serpedin E, Qaraqe K. Unveiling the future of breast cancer assessment: a critical review on generative adversarial networks in elastography ultrasound. Front Oncol. 2023;13. https://doi.org/10.3389/fonc.2023.1282536.
Kanai M, Sakimoto S, Hara C, Fukushima Y, Sayanagi K, Nishida K, Sakaguchi H, Nishida K. The caliber of optociliary shunt vessels is associated with macular blood flow and visual acuity in central retinal vein occlusion. Ophthalmol Sci. 2022;2(1):100083. https://doi.org/10.1016/j.xops.2021.100083.
Gao Z, Zhou L, Ding W, Wang H. A retinal vessel segmentation network approach based on rough sets and attention fusion module. Inf Sci. 2024;678:121015. https://doi.org/10.1016/j.ins.2024.121015.
Prakash D, Gupta A. Introduction to computer-aided diagnosis (CAD) tools and applications. Adv Comput. 2024. https://doi.org/10.1016/bs.adcom.2024.07.001.
Ansari MY, Yang Y, Balakrishnan S, Abinahed J, Al-Ansari A, Warfa M, Almokdad O, Barah A, Omer A, Singh AV, Meher PK, Bhadra J, Halabi O, Azampour MF, Navab N, Wendler T, Dakua SP. A lightweight neural network with multiscale feature enhancement for liver CT segmentation. Sci Rep. 2022;12(1). https://doi.org/10.1038/s41598-022-16828-6.
Ansari MY, Yang Y, Meher PK, Dakua SP. Dense-PSP-UNet: A neural network for fast inference liver ultrasound segmentation. Comput Biol Med. 2022;153:106478. https://doi.org/10.1016/j.compbiomed.2022.106478.
Ansari MY, Mangalote IAC, Meher PK, Aboumarzouk O, Al-Ansari A, Halabi O, Dakua SP. Advancements in Deep Learning for B-Mode Ultrasound Segmentation: A Comprehensive Review. IEEE Trans Emerg Top Comput Intell. 2024;8(3):2126–49. https://doi.org/10.1109/tetci.2024.3377676.
Bakkouri S, Elyousfi A. An adaptive CU size decision algorithm based on gradient boosting machines for 3D-HEVC inter-coding. Multimed Tools Appl. 2023;82(21):32539–57. https://doi.org/10.1007/s11042-023-14540-9.
Bakkouri S, Elyousfi A. Early termination of CU partition based on boosting neural network for 3D-HEVC inter-coding. IEEE Access. 2022;10:13870–83. https://doi.org/10.1109/access.2022.3147502.
Bakkouri S, Elyousfi A. Machine learning-based fast CU size decision algorithm for 3D-HEVC inter-coding. J Real-Time Image Process. 2021;18(3):983–95. https://doi.org/10.1007/s11554-020-01059-7.
Toptaş B, Hanbay D. Retinal blood vessel segmentation using pixel-based feature vector. Biomed Signal Process Control. 2021;70:103053. https://doi.org/10.1016/j.bspc.2021.103053.
Rai P, Ansari MY, Warfa M, Al-Hamar H, Abinahed J, Barah A, Dakua SP, Balakrishnan S. Efficacy of fusion imaging for immediate post-ablation assessment of malignant liver neoplasms: A systematic review. Cancer Med. 2023;12(13):14225–51. https://doi.org/10.1002/cam4.6089.
Demir F, Taşcı B. An Effective and Robust Approach Based on R-CNN+LSTM Model and NCAR Feature Selection for Ophthalmological Disease Detection from Fundus Images. J Personalized Med. 2021;11(12):1276. https://doi.org/10.3390/jpm11121276.
John C, Sahoo J, Sajan IK, Madhavan M, Mathew OK. CNN-BLSTM based deep learning framework for eukaryotic kinome classification: An explainability based approach. Comput Biol Chem. 2024;112:108169. https://doi.org/10.1016/j.compbiolchem.2024.108169.
Das S, Chakraborty S, Mishra M, Majumder S. Assessment of retinal blood vessel segmentation using U-Net model: A deep learning approach. Frankl Open. 2024;8:100143. https://doi.org/10.1016/j.fraope.2024.100143.
Jin K, Huang X, Zhou J, Li Y, Yan Y, Sun Y, Zhang Q, Wang Y, Ye J. FIVES: A fundus image dataset for artificial intelligence based vessel segmentation. Sci Data. 2022;9(1). https://doi.org/10.1038/s41597-022-01564-3.
Staal J, Abramoff MD, Niemeijer M, Viergever MA, Van Ginneken B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans Med Imaging. 2004;23(4):501–9. https://doi.org/10.1109/tmi.2004.825627.
Hoover A, Kouznetsova V, Goldbaum M. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging. 2000;19(3):203–10. https://doi.org/10.1109/42.845178.
Fraz MM, Remagnino P, Hoppe A, Uyyanonvara B, Rudnicka AR, Owen CG, Barman SA. An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Trans Biomed Eng. 2012;59(9):2538–48. https://doi.org/10.1109/tbme.2012.2205687.
Li J, Gao G, Liu Y, Yang L. MAGF-Net: A multiscale attention-guided fusion network for retinal vessel segmentation. Meas. 2023;206:112316. https://doi.org/10.1016/j.measurement.2022.112316.
Liu M, Wang Y, Wang L, Hu S, Wang X, Ge Q. IMFF-Net: An integrated multi-scale feature fusion network for accurate retinal vessel segmentation from fundus images. Biomed Signal Process Control. 2024;91:105980. https://doi.org/10.1016/j.bspc.2024.105980.
Tchinda BS, Tchiotsop D, Noubom M, Louis-Dorr V, Wolf D. Retinal blood vessels segmentation using classical edge detection filters and the neural network. Inform Med Unlocked. 2021;23:100521. https://doi.org/10.1016/j.imu.2021.100521.
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Lecture notes in computer science. 2015. pp. 234-241. https://doi.org/10.1007/978-3-319-24574-4_28.
Sanjeewani N, Yadav AK, Akbar M, Kumar M, Yadav D. Retinal blood vessel segmentation using a deep learning method based on modified U-NET model. Multimed Tools Appl. 2024. https://doi.org/10.1007/s11042-024-18696-w.
Wei J, Zhu G, Fan Z, Liu J, Rong Y, Mo J, Li W, Chen X. Genetic U-Net: Automatically designed deep networks for retinal vessel segmentation using a genetic algorithm. IEEE Trans Med Imaging. 2022;41(2):292–307. https://doi.org/10.1109/tmi.2021.3111679.
Wang B, Wang S, Qiu S, Wei W, Wang H, He H. CSU-Net: A context spatial U-Net for accurate blood vessel segmentation in fundus images. IEEE J Biomed Health Inform. 2021;25(4):1128–38. https://doi.org/10.1109/jbhi.2020.3011178.
Ren K, Chang L, Wan M, Gu G, Chen Q. An improved U-net based retinal vessel image segmentation method. Heliyon. 2022;8:e11187. https://doi.org/10.1016/j.heliyon.2022.e11187.
Zhang Y, He M, Chen Z, Hu K, Li X, Gao X. Bridge-Net: Context-involved U-net with patch-based loss weight mapping for retinal blood vessel segmentation. Expert Syst Appl. 2022;195:116526. https://doi.org/10.1016/j.eswa.2022.116526.
Wang B, Qiu S, He H. Dual Encoding U-Net for Retinal Vessel Segmentation. In: Medical Image Computing and Computer Assisted Intervention - MICCAI 2019. https://doi.org/10.1007/978-3-030-32239-7_10.
Zhang J, Zhang Y, Xu X. Pyramid U-Net for Retinal Vessel Segmentation. In: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2021. https://doi.org/10.1109/icassp39728.2021.9414164.
Zhu Y, Cao J, Yin H, Zhao J, Gao K. Seismic Data Reconstruction based on Attention U-net and Transfer Learning. J Appl Geophys. 2023;219:105241. https://doi.org/10.1016/j.jappgeo.2023.105241.
Zhang S, Fu H, Yan Y, Zhang Y, Wu Q, Yang M, Tan M, Xu Y. Attention Guided Network for Retinal Image Segmentation. In: Lecture Notes in Computer Science. pp 797-805. https://doi.org/10.1007/978-3-030-32239-7_88.
Alvarado-Carrillo DE, Dalmau-Cedeno OS. Width Attention based Convolutional Neural Network for Retinal Vessel Segmentation. Expert Syst Appl. 2022;209:118313. https://doi.org/10.1016/j.eswa.2022.118313.
Wang H, Xu G, Pan X, Liu Z, Tang N, Lan R, Luo X. Attention-inception-based U-Net for retinal vessel segmentation with advanced residual. Comput Electr Eng. 2022;98:107670. https://doi.org/10.1016/j.compeleceng.2021.107670.
Sun K, Chen Y, Chao Y, Geng J, Chen Y. A retinal vessel segmentation method based on improved U-Net model. Biomed Signal Process Control. 2023;82:104574. https://doi.org/10.1016/j.bspc.2023.104574.
Alsayat A, Elmezain M, Alanazi S, Alruily M, Mostafa AM, Said W. Multi-Layer Preprocessing and U-Net with Residual Attention Block for Retinal Blood Vessel Segmentation. Diagnostics. 2023;13:3364. https://doi.org/10.3390/diagnostics13213364.
Ding W, Sun Y, Huang J, Ju H, Zhang C, Yang G, Lin C-T. RCAR-UNet: Retinal vessel segmentation network algorithm via novel rough attention mechanism. Inf Sci. 2024;657:120007. https://doi.org/10.1016/j.ins.2023.120007.
Preity N, Bhandari AK, Shahnawazuddin S. Soft Attention Mechanism Based Network to Extract Blood Vessels From Retinal Image Modality. IEEE Trans Artif Intell. 2024;5(7):3408–18. https://doi.org/10.1109/tai.2024.3351589.
Liu M, Wang Z, Li H, Wu P, Alsaadi FE, Zeng N. AA-WGAN: Attention augmented Wasserstein generative adversarial network with application to fundus retinal vessel segmentation. Comput Biol Med. 2023;158:106874. https://doi.org/10.1016/j.compbiomed.2023.106874.
Zhou Y, Chen Z, Shen H, Zheng X, Zhao R, Duan X. A refined equilibrium generative adversarial network for retinal vessel segmentation. Neurocomputing. 2021;437:118–30. https://doi.org/10.1016/j.neucom.2020.06.143.
Pandey AK, Singh SP, Chakraborty C. Residual attention UNet GAN Model for enhancing the intelligent agents in retinal image analysis. Service Oriented Comput Appl. 2024. https://doi.org/10.1007/s11761-024-00415-w.
Gu W, Xu Y. Retinal Vessel Segmentation via Adversarial Learning and Iterative Refinement. J Shanghai Jiaotong Univ Sci. 2022. https://doi.org/10.1007/s12204-022-2479-5.
Brown EE, Guy AA, Holroyd NA, Sweeney PW, Gourmet L, Coleman H, Walsh C, Markaki AE, Shipley R, Rajendram R, Walker-Samuel S. Physics-informed deep generative learning for quantitative assessment of the retina. Nat Commun. 2024;15. https://doi.org/10.1038/s41467-024-50911-y.
Wang S, Chen Y, Yi Z. A Multi-Scale Attention Fusion Network for Retinal Vessel Segmentation. Appl Sci. 2024;14:2955. https://doi.org/10.3390/app14072955.
Bakkouri I, Afdel K. Multi-scale CNN based on region proposals for efficient breast abnormality recognition. Multimed Tools Appl. 2018;78(10):12939–60. https://doi.org/10.1007/s11042-018-6267-z.
Paul A, Bhattacharya P, Maity SP. Histogram modification in adaptive bi-histogram equalization for contrast enhancement on digital images. Optik. 2022;259:168899. https://doi.org/10.1016/j.ijleo.2022.168899.
Chen Y, Li J, Xiao H, Jin X, Yan S, Feng J. Dual Path Networks. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). https://doi.org/10.48550/arXiv.1707.01629.
Xie S, Girshick R, Dollar P, Tu Z, He K. Aggregated Residual Transformations for Deep Neural Networks. In: The 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2017.634.
Wong K, Dornberger R, Hanne T. An analysis of weight initialization methods in connection with different activation functions for feedforward neural networks. Evol Intell. 2022. https://doi.org/10.1007/s12065-022-00795-y.
Tan M, Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: The 36th International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1905.11946.
Bakkouri I, Bakkouri S. 2MGAS-Net: Multi-level Multi-scale Gated Attentional Squeezed Network for Polyp Segmentation. SIViP. 2024;18(6–7):5377–86. https://doi.org/10.1007/s11760-024-03240-y.
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv48922.2021.00986.
Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980.
He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In: 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.123.
Wei J, Wang S, Huang Q. F3Net: Fusion, Feedback and Focus for Salient Object Detection. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2020;34(07):12321-12328. https://doi.org/10.1609/aaai.v34i07.6916.
Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2020;18(2):203–11. https://doi.org/10.1038/s41592-020-01008-z.
Chen J, Mei J, Li X, Lu Y, Yu Q, Wei Q, Luo X, Xie Y, Adeli E, Wang Y, Lungren MP, Zhang S, Xing L, Lu L, Yuille A, Zhou Y. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med Image Anal. 2024;97:103280. https://doi.org/10.1016/j.media.2024.103280.
Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, Wang M. Swin-unet: unet-like pure transformer for medical image segmentation. In: Lecture notes in computer science. 2023;205-18. https://doi.org/10.1007/978-3-031-25066-8_9.
Clinical trial number
Not applicable.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Author information
Contributions
I.B. wrote the manuscript, conducted the computational tests, and prepared the figures and tables, while S.B. helped edit the text. The authors have read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bakkouri, I., Bakkouri, S. UGS-M3F: unified gated swin transformer with multi-feature fully fusion for retinal blood vessel segmentation. BMC Med Imaging 25, 77 (2025). https://doi.org/10.1186/s12880-025-01616-1
DOI: https://doi.org/10.1186/s12880-025-01616-1