From Explicit Elements to Implicit Intent: A Predefined Library for Auditable Behavioral Inference

Abstract: We present SemantiClean, a modular framework that extracts structured semantic signals from e-commerce session data and uses a shared element library to drive pluggable inference targets, including purchase intent, customer segmentation, and product affinity.

Unlike conventional end-to-end predictors that optimise solely for accuracy, SemantiClean prioritises auditability, structural governance, and sigma=0 reproducibility, explicitly trading marginal predictive gains for element-level transparency and defensible decision trails.

Built upon the Online Shoppers Purchasing Intention (OSPI) dataset, the framework organises twenty-four behavioural elements into a four-layer architecture (Functional, Interaction, Systemic, Contextual) and enforces signal quality through three anti-inflation mechanisms: RedundancyGroup contribution caps, TieredPenaltyCalculator bias penalties, and AdaptiveConstraintMode cold-start handling.

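To make the idea of a contribution cap concrete, the following is a minimal sketch rather than the framework's actual implementation. It assumes that a redundancy group is a set of correlated elements sharing one contribution budget, and that an over-budget group is scaled down proportionally. The class fields, the `apply_caps` helper, the element names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyGroup:
    """Hypothetical group of correlated elements sharing one contribution budget."""
    name: str
    members: frozenset
    cap: float  # assumed maximum total contribution allowed for the group


def apply_caps(contributions, groups):
    """Return a copy of contributions with each over-budget group scaled to its cap.

    Assumes non-negative contributions; elements outside every group are untouched.
    """
    capped = dict(contributions)
    for group in groups:
        present = [e for e in sorted(group.members) if e in capped]
        total = sum(capped[e] for e in present)
        if total > group.cap and total > 0:
            scale = group.cap / total  # proportional scale-down (an assumption)
            for e in present:
                capped[e] = capped[e] * scale
    return capped


# Illustrative element names and values only; not taken from the framework.
raw = {"page_value": 0.5, "product_duration": 0.25, "is_weekend": 0.125}
groups = [
    RedundancyGroup(
        name="engagement",
        members=frozenset({"page_value", "product_duration"}),
        cap=0.375,
    )
]
capped = apply_caps(raw, groups)
```

In this toy case the two grouped elements together exceed the cap, so both are halved, while the ungrouped element keeps its original contribution. Proportional scaling preserves the relative ordering within the group; a hard clip per element would be an equally plausible alternative.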

This report introduces the LLM-Integrated Semantic Inference Engine, a fully implemented two-phase LLM-driven inference architecture that leverages complete element metadata at inference time. All quantitative results reported herein are produced by this engine.

Deterministic engine outputs remain fully reproducible (sigma=0). LLM-dependent results (E8, E10) can still vary between runs, with that variability controlled by fixing the provider, model, and temperature settings. The gender inference target remains non-functional in the current implementation and is excluded from all quantitative results.

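One way to check a sigma=0 claim is sketched below, under the assumption that sigma denotes the standard deviation of a score across repeated runs on identical input. The `run_sigma` helper and the toy scorer are hypothetical stand-ins and are not part of SemantiClean.

```python
import statistics


def run_sigma(score_fn, session, runs=5):
    """Score the same session repeatedly and return the population std. dev. of the outputs."""
    outputs = [score_fn(session) for _ in range(runs)]
    return statistics.pstdev(outputs)


def toy_deterministic_score(session):
    """Stand-in scorer: a fixed weighted sum with no randomness (weights are illustrative)."""
    return 0.5 * session["page_value"] + 0.25 * session["product_duration"]


# Illustrative session values only.
session = {"page_value": 0.5, "product_duration": 0.25}
sigma = run_sigma(toy_deterministic_score, session)
```

A deterministic scorer yields a sigma of exactly zero here. The same harness applied to an LLM-backed scorer would be expected to report a non-zero value unless outputs happen to coincide, which is why the two kinds of results are reported separately.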