Abstract: Highlights
• We evaluate MLLMs’ performance in assessing weld quality.
• We introduce WeldPrompt, a strategy using Chain-of-Thought and in-context learning.
• MLLMs perform better on online than real-world weld images, showing limited generalization.
• WeldPrompt boosts recall in some contexts but trades off precision in different applications.
• MLLM limitations in welding offer insights for future XAI research in manufacturing.