AuscMLLM: Bridging Classification and Reasoning in Heart Sound Analysis with a Multimodal Large Language Model
Abstract: This study introduces a multimodal large language model that not only performs various heart sound tasks but also provides reasoning for its outputs, marking an advancement in the field of medical diagnostics. The model’s innovation stems from a collaboration with experts to collect a novel dataset designed specifically for reasoning tasks, addressing a limitation of existing datasets, which do not support such tasks. Our model integrates multiple novel methodologies to enhance diagnostic accuracy: the incorporation of knowledge from relevant textbooks through pre-training, an audio feature extractor optimized for heart sound-text alignment, and a logit adjustment loss tailored to large language models to mitigate the challenge of imbalanced data categories. This approach not only sets a new standard for heart sound analysis but also paves the way for more interpretable and comprehensive diagnostic models in healthcare.
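The abstract does not spell out how the logit adjustment loss is adapted to a large language model. As a rough illustration of the general idea only, the sketch below shows the generic logit-adjusted cross-entropy used in long-tail classification, where each class logit is shifted by the log of its training-set frequency before the softmax. The function names `class_priors` and `logit_adjusted_cross_entropy` and the parameter `tau` are hypothetical and are not taken from the paper.

```python
import math


def class_priors(labels, num_classes):
    """Estimate class frequencies from integer training labels (assumed setup)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    # A tiny floor keeps log(prior) finite for classes with no samples.
    return [max(c, 1e-12) / total for c in counts]


def logit_adjusted_cross_entropy(logits, label, priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) for one sample.

    Generic formulation, not the paper's LLM-specific variant.
    With tau = 0 this reduces to the ordinary cross-entropy.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, priors)]
    # Numerically stable log-sum-exp over the adjusted logits.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]


# Toy imbalanced label set: 9 samples of class 0, 1 sample of class 1.
priors = class_priors([0] * 9 + [1], num_classes=2)

# With identical raw logits, the rare class incurs the larger loss,
# so training pushes harder on under-represented categories.
loss_common = logit_adjusted_cross_entropy([0.0, 0.0], 0, priors)
loss_rare = logit_adjusted_cross_entropy([0.0, 0.0], 1, priors)
```

In this toy case the rare class must earn a larger raw logit margin to reach the same loss as the common class, which is the mechanism by which such a loss counteracts class imbalance.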
External IDs: dblp:conf/icassp/ZhaoWZZSSZW025