Measuring progress toward AGI: A cognitive framework
Artificial General Intelligence (AGI) has the potential to accelerate scientific discovery and help solve some of humanity’s most pressing problems. But it can be difficult to know how close we are to this key milestone, because there’s a lack of empirical tools for evaluating systems’ general intelligence. Tracking progress toward AGI will require a wide range of methods and approaches, and we believe cognitive science provides one important piece of the puzzle.
That’s why today, we’re releasing a new paper, “Measuring Progress Toward AGI: A Cognitive Taxonomy,” that presents a scientific foundation for understanding the cognitive capabilities of AI systems. Alongside the paper, we are partnering with Kaggle to launch a hackathon, inviting the research community to help build the evaluations needed to put this framework into practice.
Deconstructing general intelligence
Our framework draws on decades of research from psychology, neuroscience and cognitive science to develop a cognitive taxonomy. It identifies 10 key cognitive abilities that we hypothesize will be important for general intelligence in AI systems:
- Perception: extracting and processing sensory information from the environment
- Generation: producing outputs such as text, speech and actions
- Attention: focusing cognitive resources on what matters
- Learning: acquiring new knowledge through experience and instruction
- Memory: storing and retrieving information over time
- Reasoning: drawing valid conclusions through logical inference
- Metacognition: knowledge and monitoring of one’s own cognitive processes
- Executive functions: planning, inhibition and cognitive flexibility
- Problem solving: finding effective solutions to domain-specific problems
- Social cognition: processing and interpreting social information and responding appropriately in social situations
To understand AI capabilities across these cognitive abilities, we propose a three-stage evaluation protocol that benchmarks system performance in relation to human capabilities:
- Evaluate AI systems across a broad suite of cognitive tasks covering each ability, using held-out test sets to prevent data contamination.
- Collect human baselines for the same tasks from a demographically representative sample of adults.
- Map each AI system’s performance relative to the distribution of human performance in each ability.
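The third stage, placing an AI system's score within the distribution of human scores, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the task names, the scores, and the choice of a percentile rank averaged per ability are not taken from the paper, which may define the mapping differently. Stages one and two (held-out test sets and representative human sampling) are assumed to have already produced the inputs.

```python
from bisect import bisect_left, bisect_right
from statistics import mean


def percentile_rank(human_scores, ai_score):
    """Percent of human scores below ai_score, counting ties as half."""
    ordered = sorted(human_scores)
    below = bisect_left(ordered, ai_score)
    ties = bisect_right(ordered, ai_score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def cognitive_profile(ai_scores, human_baselines):
    """Average the AI system's percentile rank over the tasks of each ability.

    ai_scores:       {ability: {task: score}}
    human_baselines: {ability: {task: [score, ...]}}
    """
    profile = {}
    for ability, tasks in ai_scores.items():
        ranks = [
            percentile_rank(human_baselines[ability][task], score)
            for task, score in tasks.items()
        ]
        profile[ability] = mean(ranks)
    return profile


# Hypothetical task names and scores, for illustration only.
ai_scores = {
    "memory": {"digit_span": 0.9, "recall": 0.5},
    "attention": {"visual_search": 0.4},
}
human_baselines = {
    "memory": {
        "digit_span": [0.5, 0.6, 0.7, 0.8],
        "recall": [0.4, 0.5, 0.6, 0.7],
    },
    "attention": {"visual_search": [0.5, 0.6, 0.7, 0.8]},
}

profile = cognitive_profile(ai_scores, human_baselines)
print(profile)  # {'memory': 68.75, 'attention': 0.0}
```

Reporting one number per ability, rather than a single aggregate, keeps the profile uneven where the system is uneven: in this toy data the system sits well above the human median on memory tasks and below every human participant on the attention task.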
Going from theory to practice
Defining these cognitive abilities is a crucial first step, but we need more than a framework to measure progress. To put this theory into practice, we are launching a new Kaggle hackathon, “Measuring progress toward AGI: Cognitive abilities.” The hackathon encourages the community to design evaluations for the five cognitive abilities where the evaluation gap is largest: learning, metacognition, attention, executive functions and social cognition.
Participants can use Kaggle’s newly launched Community Benchmarks platform to build and test their evaluations against a lineup of frontier models.
We are offering a total prize pool of $200,000: $10,000 awards for the top two submissions in each of the five tracks, and $25,000 grand prizes for the four best overall submissions. Submissions are open March 17 through April 16, and we’ll announce the results on June 1. Head over to the Kaggle website to start building.
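For readers checking how the prize pool breaks down, the arithmetic can be written out as a short sketch. The variable names are illustrative; the figures are the ones stated in the announcement.

```python
# Figures from the announcement; variable names are illustrative.
TRACKS = 5
WINNERS_PER_TRACK = 2
TRACK_AWARD = 10_000
GRAND_PRIZES = 4
GRAND_AWARD = 25_000

track_total = TRACKS * WINNERS_PER_TRACK * TRACK_AWARD  # ten track awards
grand_total = GRAND_PRIZES * GRAND_AWARD                # four grand prizes
total = track_total + grand_total

print(track_total, grand_total, total)  # 100000 100000 200000
```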