TransConv: a lightweight architecture based on transformers and convolutional neural networks for adenocarcinoma and Barrett's esophagus identification

Luis Antônio de Souza, André G. C. Pacheco, Alberto F. De Souza, Thiago Oliveira-Santos, Claudine Badue, Christoph Palm, João Paulo Papa

Published: 01 Jan 2025, Last Modified: 26 Oct 2025Neural Comput. Appl. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Barrett’s esophagus, also known as BE, is commonly associated with repeated exposure to stomach acid. If not treated properly, it may evolve into esophageal adenocarcinoma, aka esophageal cancer. This paper proposes TransConv, a hybrid architecture that benefits from features learned by pre-trained vision transformers (ViTs) and convolutional neural networks (CNNs), followed by a shallow neural network composed of three normalizations, ReLU activations, and fully connected layers, and a SoftMax head to distinguish between BE and esophageal cancer. TransConv is designed to be training-lightweight, and for the ViT and CNN backbone models, weights are kept frozen during training, i.e., the primary goal of TransConv is to learn the weights of the fully connected layer from both backbones only, avoiding the burden of updating their weights but still learning their final descriptions for the lightweight convolutional model. We report promising results with low computational training costs in two datasets, one public and another private. From our achievements, TransConv was able to deliver balanced accuracy results around 85% and 86% for each evaluated dataset, respectively, in a design that required only 50 epochs of model training, a very reduced number compared to state-of-the-art conducted studies in the same domain.

External IDs:dblp:journals/nca/SouzaPSOBPP25