It's Not Just X. It's Y
Against the Quantification of Integrity
When the measure of language becomes its target, it ceases to be good language. 💡Nerd Rating: 1/5. I discuss the origins of certain linguistic tics in LLMs and what they mean for writing, student assessment, and thinking.
“It’s not X, it’s Y.” Large language models gravitate toward this type of construction, called negative parallelism. It has its uses: it sets up a contrast. It’s especially useful for reframing assumptions: “You think it’s like that, but it’s really like this.” It’s all over social media, especially on LinkedIn, and the construction has sparked a backlash amid an ongoing war against automated language production.
If you use em-dashes – you might be a bot. If you reach for “delve,” “quietly,” or “genuinely” (or create lists of three, like that one), you might be a bot. Recent overuse by language models has led many to declare negative parallelism bad writing. I’m not so sure. Nobody called JFK a lazy writer when he said, “ask not what your country can do for you – ask what you can do for your country.” Negative parallelism is a rhetorical device, and any rhetorical device is only as lazy or inspired as what it contains.
Automated Language Production
Now, we have AI detectors that claim to protect you from the witch hunt by looking for these patterns. You take your own writing and run it through Grammarly, which analyzes word patterns that AI detectors might flag. Then it offers ideas for how to change them, which a) gives Grammarly the power to write for you and b) makes your writing lose any sense of rhythm or intent.
Grammarly’s review of this section has flagged 27 examples of text I should change to avoid the accusation that I am a machine. For example, Grammarly identified the above phrase – “automated language production” – as 11 times more likely to be AI. It suggests that a human would be “against mechanized language synthesis” instead. The simple two-word combo “align with” was flagged as 43 times more likely to be AI-generated. Real humans say “corresponds.”
These are small suggestions that add up until the result resembles nothing I chose. The human voice is replaced by a machine trying to sound human. As a result, I just paid Pangram – another AI-detection company – $20 to verify, before I submitted it, that a journal article wasn’t AI-generated. It wasn’t, and I knew it wasn’t. Pangram agreed. That’s what I paid for: not to learn whether I wrote it, but to be told it wouldn’t flag me. Because if Pangram’s AI system found me guilty, that’s the end of my career. That’s literally extortion.
And if Pangram had flagged it, then what? It would give me a score (four valuations: high, very likely, somewhat likely, human) to assign my integrity a category. In the ecosystem we’re all building, I’d have to use Grammarly to rephrase everything: using a machine to write for me to prove that I didn’t use a different machine to write for me.
A Culture Hostile to Reason
Our instinct in making sense of these machines is to examine the training data. That training data is no longer “just the web.” The web is the raw meat, but this sausage is heavily pre- and post-processed. Post-training optimizes the model for whatever it’s designed to do. This includes techniques such as RLHF (reinforcement learning from human feedback) and RLVR (reinforcement learning with verifiable rewards).
RLHF has humans rank replies, and the system then emphasizes the kinds of replies they rank highly. RLVR is weirder, and I suspect it’s why we see “It’s not X, it’s Y” so often. Dismissing negative parallelism as lazy gets in the way of understanding why it’s showing up everywhere. This type of language is such a powerful framework for thinking that we mistake it for a model’s capacity for thought. We credit computation for the work that’s done by language.
Weird Dogs
RLVR isn’t a structure that watches for words and triggers some sub-process. Instead, you train a model, like you would any model. When that model is done, it predicts tokens. Lots of people are still in denial about this. Token prediction involves producing a list of candidate tokens based on their statistical distribution in the training data and ranking them by their likelihood given the previous words in the prompt or sequence.
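To make that ranking step concrete, here is a minimal Python sketch. It assumes a toy bigram counter over a made-up corpus. The names `CORPUS`, `train_bigram`, and `rank_candidates` are hypothetical, and a real LLM conditions on the whole preceding sequence with a neural network rather than a lookup table.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for training data (illustration only).
CORPUS = "it was thursday . it was not thursday . it was friday . it was friday ."


def train_bigram(text):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def rank_candidates(counts, prev_word):
    """Return (word, probability) pairs for the next word, most likely first."""
    followers = counts[prev_word]
    total = sum(followers.values())
    return [(word, n / total) for word, n in followers.most_common()]


model = train_bigram(CORPUS)
ranked = rank_candidates(model, "was")
for word, prob in ranked:
    print(f"{word}: {prob:.2f}")
```

In this corpus “friday” follows “was” twice, while “thursday” and “not” each follow it once, so “friday” tops the list. The model has no idea what day anything happened. It only knows which word tended to come next.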
RLVR intervenes by having the model solve math problems by writing its way to a solution, reproducing the language we would use when thinking out loud about how to solve a problem. When the model arrives at the correct answer, the language it used most often to get there is then emphasized in the finished model. This is (partly) what the industry calls reasoning.
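The loop described above can be caricatured in a few lines of Python. This is a sketch under loud assumptions: the attempts are hand-written rather than sampled from a model, the “weights” are a plain tally per phrase, and `verify`, `reinforce`, and `attempts` are hypothetical names. Actual RLVR adjusts a network’s parameters with a reinforcement learning algorithm, not a dictionary of counts.

```python
def verify(answer, truth):
    """The 'verifiable reward': the final answer either matches or it doesn't."""
    return answer == truth


def reinforce(weights, attempts, truth, step=1.0):
    """Upweight every phrase used in an attempt that reached the verified answer."""
    updated = dict(weights)
    for phrases, answer in attempts:
        if verify(answer, truth):
            for phrase in phrases:
                updated[phrase] = updated.get(phrase, 0.0) + step
    return updated


# Hand-written stand-ins for sampled reasoning traces: (phrases used, final answer).
attempts = [
    (["it was"], "Thursday"),
    (["it was not", "so it must be"], "Friday"),
    (["it was not", "because"], "Friday"),
    (["it was"], "Wednesday"),
]

weights = reinforce({}, attempts, truth="Friday")
print(weights)
```

Only the traces that end on the checkable answer earn any credit, so the phrasing those traces share (“it was not”) accumulates the most weight. The verifier never judges the wording itself. It only checks the final answer, and the wording rides along.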
What day was it that we saw that weird dog?
So, think of it like this: You are sitting with a friend. Your phones are dead. Your friend asks: “What day was it that we saw that weird dog?” You start by saying, “It was Thursday.” Your friend says: “No, it wasn’t Thursday, because Thursday I was out of town.” So you say that’s right, so it must have been Wednesday, because Wednesday was your mutual friend’s birthday, and you both went to the party, and you saw the dog on the way to the party.
Your friend says: “That’s right, except, Wednesday was our friend’s birthday but the party was on Friday. So we must have seen the dog on Friday.” The two of you have articulated your way to the answer, a verifiable one: you could pop on your phones and check your photos and see that yes, the weird dog picture was taken on Friday.
In dehumanizing terms, your gut instinct (“it’s Thursday”) is what a model might spit out at first guess, and that’s where models used to stop. But you didn’t. Your friend countered, and the exchange took the shape: “It wasn’t [Thursday], it was [Wednesday].” There are more words, which narrow the window of possible answers, and then you arrive, through “it’s-not-X-it’s-Y-ing,” at the correct date. The two of you had actual memories and visceral experiences to work with. Language was the vessel through which these experiences were communicated and conflicts were resolved.