TokenScope: Token-Level Explainability and Interpretability for Code-Oriented Tasks in Large Language Models

Abstract: Understanding how Large Language Models (LLMs) make token-level decisions during code generation remains a major challenge for both researchers and practitioners. While recent tools provide insights into model internals or generation outcomes, they often lack decoding-time signals, fine-grained uncertainty measures, and interactive mechanisms for exploring alternative generation paths.

We present TokenScope, an interactive interpretability and analysis tool for decoder-based LLMs that exposes token-level metrics, attention patterns, and structural information during generation. TokenScope supports interactive token replacement, counterfactual branching, and code-aware aggregation via abstract syntax trees. By unifying decoding-time signals with structural program analysis, TokenScope enables systematic investigation of LLM behaviour during code generation.
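To make the idea of combining decoding-time signals with program structure concrete, here is a minimal sketch, not TokenScope's actual implementation or API. It assumes a per-token uncertainty score is the Shannon entropy of the next-token distribution, and that code-aware aggregation means averaging token scores over the top-level AST statement that contains each token. The names `token_entropy` and `aggregate_by_statement`, and the `(line, column, score)` input format, are hypothetical.

```python
import ast
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def aggregate_by_statement(source, token_scores):
    """Average token-level scores over each top-level AST statement.

    token_scores is a list of (line, column, score) tuples with 1-based
    line numbers; this input format is an assumption made for illustration.
    Returns a list of (node_type, first_line, last_line, mean_score).
    """
    tree = ast.parse(source)
    summary = []
    for node in tree.body:
        # Collect the scores of tokens whose line falls inside this statement.
        scores = [
            score
            for line, _column, score in token_scores
            if node.lineno <= line <= node.end_lineno
        ]
        mean = sum(scores) / len(scores) if scores else 0.0
        summary.append((type(node).__name__, node.lineno, node.end_lineno, mean))
    return summary


if __name__ == "__main__":
    generated = "def add(a, b):\n    return a + b\n\nx = add(1, 2)\n"
    # Hypothetical per-token scores, e.g. entropies recorded while decoding.
    scores = [(1, 0, 0.5), (2, 4, 1.5), (4, 0, 2.0)]
    for row in aggregate_by_statement(generated, scores):
        print(row)
```

In this sketch the aggregation is line-based and limited to top-level statements; a finer-grained variant could match tokens to nested nodes by column offsets as well.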

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

Cite as: arXiv:2607.01235 [cs.CL]
