CFCamo: A Counterfactual Detect-or-Abstain Framework for Camouflaged Object Detection

Abstract: Vision-language reinforcement learning has recently shown strong target-present localization for camouflaged object detection (COD). Yet localization is only one side of the decision: when the agent faces an ordinary image with no camouflaged target, will it still claim that a camouflaged object exists? Standard COD training and evaluation data are positive-only, so agents optimized under this setting can acquire an over-detect bias, a task-specific form of object hallucination that standard COD evaluation leaves unmeasured.

To quantify this target-absent behavior, we construct Counterfactual COD (CF-COD), a paired benchmark that removes the camouflaged target from each held-out COD evaluation image while preserving a plausible background. CF-COD evaluates whether a model detects the target on the original image and abstains on the target-absent counterfactual, summarized by Pair Accuracy (PA).
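As a rough illustration of how PA could be computed, the sketch below counts a pair as correct only when the model both detects the target on the original image and abstains on its counterfactual. The abstract does not spell out the exact formula or the criterion for a successful detection, so the `PairResult` type, its field names, and the "both must hold" rule are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PairResult:
    """Outcome for one original/counterfactual image pair (hypothetical type)."""
    detected_on_original: bool          # model reported the target on the original image
    abstained_on_counterfactual: bool   # model reported no target on the target-absent image


def pair_accuracy(results: Sequence[PairResult]) -> float:
    """Fraction of pairs that are correct on BOTH sides (assumed definition of PA)."""
    if not results:
        raise ValueError("pair_accuracy needs at least one pair")
    correct = sum(
        1 for r in results
        if r.detected_on_original and r.abstained_on_counterfactual
    )
    return correct / len(results)


# An always-detect model is right on every original image but never abstains,
# so it scores zero under this pairing rule.
always_detect = [PairResult(True, False) for _ in range(4)]
mixed = [
    PairResult(True, True),
    PairResult(True, False),
    PairResult(False, True),
    PairResult(True, True),
]
```

Under this reading, a model with a strong over-detect bias can look good on target-present metrics and still score near zero PA.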

We further introduce CFCamo, a paired counterfactual framework for COD with abstention. For training, CFCamo optimizes a Qwen3-VL-4B-Instruct agent with Counterfactual Sequence Policy Optimization (CSPO), which samples paired original-counterfactual rollouts and uses a Counterfactual Paired Reward (CPR) to couple original-image detection with counterfactual abstention.
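The abstract does not give the CPR formula, so the following is only a minimal sketch of what "coupling" detection with abstention might look like, contrasted with a detection-only reward. The function names, the IoU-based detection test, and the 0.5 threshold are all hypothetical choices for illustration, not the authors' definition.

```python
def detection_only_reward(iou_original: float) -> float:
    """Uncoupled baseline (hypothetical): reward depends on the original image only."""
    return iou_original


def paired_reward(
    iou_original: float,
    abstained_on_counterfactual: bool,
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the source
) -> float:
    """Coupled reward (hypothetical): pays out only when the rollout pair is
    correct on both sides, i.e. detect on the original AND abstain on the
    counterfactual."""
    detected = iou_original >= iou_threshold
    return 1.0 if (detected and abstained_on_counterfactual) else 0.0


# A policy that always claims a target localizes well on the original image
# but fails to abstain on the counterfactual.
over_detect_uncoupled = detection_only_reward(0.8)
over_detect_coupled = paired_reward(0.8, abstained_on_counterfactual=False)
```

In this sketch the over-detecting policy keeps a high reward under the uncoupled objective but receives nothing under the coupled one, which is the kind of pressure a paired reward is meant to apply.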

On CAMO-test, CFCamo improves S_alpha by 3.7 pp over the prior RL-based COD baseline; across CF-COD, it reaches 80.0-90.8% PA. Ablations show that removing counterfactual coupling reduces PA to 1.4-5.2% despite strong target-present COD scores, which indicates that target-present evaluation alone does not characterize detect-or-abstain behavior. Overall, these results suggest that CFCamo improves COD agents by coupling target-present detection with target-absent abstention, rather than merely strengthening target-present localization. Code and data are available at this https URL.
