Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment

Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment

基于轻量级多模态大模型的输电设备缺陷分级研究

Abstract: Defect grading of power transmission equipment (DGPTE) is crucial to the stability of electric energy transmission. Although existing machine learning methods exhibit strong capabilities in defect detection, they are plagued by difficulties in integrating expert experience and facing class imbalance in more refined defect grading field.

摘要: 输电设备缺陷分级(DGPTE)对于电力传输的稳定性至关重要。尽管现有的机器学习方法在缺陷检测方面表现出强大的能力,但在更精细的缺陷分级领域,它们在整合专家经验和应对类别不平衡问题上仍面临诸多困难。

To address this issue, this paper introduces a novel defect grading framework based on multimodal large language model (MLLM). Specifically, this approach maximizes the commercial MLLMs’ potential of DGPTE through in-context learning and obtains the state-of-the-art (SOTA) model.

为了解决这一问题,本文引入了一种基于多模态大语言模型(MLLM)的新型缺陷分级框架。具体而言,该方法通过上下文学习(In-context learning)最大化了商业多模态大模型在输电设备缺陷分级中的潜力,并获得了当前最优(SOTA)模型。

By sending a secondary request to this model, a small number of chain of thought-based question-answer pairs (Q&As) are generated, which effectively reduces the cost of manual annotation. In this way, these high-quality interpretable Q&As are used to train Qwen3-VL-8B via Low-Rank Adaption-based supervised fine-tuning (SFT).

通过向该模型发送二次请求,系统生成了少量基于思维链(Chain of Thought)的问答对(Q&As),这有效地降低了人工标注的成本。随后,这些高质量且具有可解释性的问答对被用于通过基于低秩自适应(LoRA)的监督微调(SFT)来训练 Qwen3-VL-8B 模型。

Experimental results on three DGPTE tasks demonstrate that fine-tuning only the language model layer yields the SOTA performance. Furthermore, multi-task joint fine-tuning verifies the feasibility of handling multiple grading tasks within only a single lightweight MLLM.

在三项输电设备缺陷分级任务上的实验结果表明,仅微调语言模型层即可达到当前最优性能。此外,多任务联合微调验证了在单一轻量级多模态大模型中处理多个分级任务的可行性。