Abstract: Highlights•We present the first lightweight Multimodal Large Language Model named FishDetectLLM for fish detection.•We develop a new instruction-based conversation dataset on fish detection for instruction tuning.•FishDetectLLM shows superior performance and robust generalization.