Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023

Abstract: Undergraduate computer science is governed by international curricular guidelines revised about once a decade, yet programs lack a reliable, reproducible way to measure how completely they cover the current guidelines and how that coverage shifts when the guidelines are restructured. We address this with a human-in-the-loop pipeline that measures a program’s coverage of an external body of knowledge, applied longitudinally to one accredited BSc in Computer Science against Computer Science Curricula 2013 (CS2013) and 2023 (CS2023).

The pipeline represents the program and each guideline as structured corpora, generates candidate course-to-knowledge-unit matches by semantic retrieval, and confirms them through human judgment under an explicit coverage definition. Of seven benchmarked retrievers, a reciprocal-rank-fusion ensemble performed best, and a well-regarded long-context model underperformed a small sentence model, so retriever choice must be measured empirically rather than assumed from reputation. Both maps were validated by an independent second rater (Cohen’s kappa 0.64 for CS2023, 0.69 for CS2013).
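
As an illustration of the ensemble step, reciprocal rank fusion can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name, the knowledge-unit IDs, and the three example rankings are hypothetical, and the smoothing constant k=60 is a commonly used default rather than a value reported here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of candidate IDs into one ranking.

    Each candidate receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); candidates are then sorted by the summed score.
    k=60 is an assumed default, not a value taken from the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical candidate knowledge units for one course, as ranked by
# three different retrievers (IDs are illustrative only).
retriever_a = ["KU-AL-1", "KU-SE-3", "KU-DS-2"]
retriever_b = ["KU-SE-3", "KU-AL-1", "KU-OS-4"]
retriever_c = ["KU-SE-3", "KU-DS-2", "KU-AL-1"]

fused = reciprocal_rank_fusion([retriever_a, retriever_b, retriever_c])
print(fused)
```

In a retrieve-then-confirm design, the top of the fused list would be handed to a human rater for confirmation rather than accepted automatically.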

The program covers 49.7% of CS2023 knowledge units and 50.9% of CS2013 knowledge units, a near-constant share across a decade. Extending the same retrieve-then-confirm design to competency articulation and cognitive depth shows that the program articulates the competency for about 88% of covered units under each guideline, yet delivers it at the recommended depth for 76% of present units under CS2023 against 95% under CS2013, a gap that reflects the newer guideline’s raised expectations rather than a change in the program.
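
The three percentages use nested denominators, which a small sketch can make concrete. It rests on one reading of the abstract, stated here as an assumption: coverage is taken over all knowledge units, competency articulation over covered units, and depth over units where the competency is present. The field names and the toy records are hypothetical and do not reproduce the study's data.

```python
def percent(numerator, denominator):
    """Return a percentage, or 0.0 when the denominator is empty."""
    return 100.0 * numerator / denominator if denominator else 0.0


def alignment_summary(units):
    """Summarize coverage, competency articulation, and depth.

    `units` is a list of dicts with boolean fields 'covered',
    'competency', and 'at_depth' (hypothetical field names).
    Denominators are nested under the assumed reading:
      coverage   -> all units
      competency -> covered units
      depth      -> covered units whose competency is articulated
    """
    covered = [u for u in units if u["covered"]]
    present = [u for u in covered if u["competency"]]
    at_depth = [u for u in present if u["at_depth"]]
    return {
        "coverage_pct": percent(len(covered), len(units)),
        "competency_pct": percent(len(present), len(covered)),
        "depth_pct": percent(len(at_depth), len(present)),
    }


# Toy records for four knowledge units (illustrative only).
toy_units = [
    {"covered": True, "competency": True, "at_depth": True},
    {"covered": True, "competency": True, "at_depth": False},
    {"covered": True, "competency": False, "at_depth": False},
    {"covered": False, "competency": False, "at_depth": False},
]

summary = alignment_summary(toy_units)
print(summary)
```

Because each measure is conditioned on the previous one, a depth figure can fall between guidelines even when the coverage figure barely moves.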

The longitudinal comparison separates persistent structural gaps (parallel and distributed computing, foundations of programming languages, systems fundamentals), which are left uncovered under both guidelines and ABET, from differences that reflect the standard’s evolution. The instrument is reusable and available from the authors on request.
