GPT-2: Too Dangerous To Release (2019)

GPT-2 is a direct scale-up of GPT-1, with more parameters and more training data. However, OpenAI initially deemed it too dangerous to release:

"Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper."
OpenAI Blog – Better Language Models and Their Implications

GPT-1 was released to the public without such serious concerns. Naturally, the announcement made people wonder how good GPT-2 must be at generating text that reads as if a human wrote it. And what exactly is the difference between GPT-1 and GPT-2?

1 The Difference: GPT-1 vs. GPT-2

In the GPT-1 paper, the authors experimented with zero-shot task transfer: they used the pre-trained model, without fine-tuning, together with heuristic solutions to perform specific tasks. The experiment's success suggests that, even without supervised fine-tuning, the language model already contains the information required to perform those tasks. All that knowledge is stored in the network parameters (weights and biases).

In other words, more parameters should increase the capacity of the language model and make it more robust on those specific tasks. In this sense, fine-tuning simply adds the final touch for a specific task, and the main thing that makes GPT-1 great is the pre-training. So, pre-training such a model with more parameters should improve its performance further.

Hence, GPT-2 is a direct scale-up of GPT-1, with more parameters and more training data. GPT-1 and GPT-2 do not differ in architecture: both are based on the transformer's decoder. Their main difference is the number of parameters and the amount and variety of training text, which allow the neural network to acquire more language knowledge and absorb it into its parameters.

The largest GPT-2 model (the one that was not released in February 2019) has 1.5 billion parameters, more than 10 times as many as GPT-1. OpenAI trained it on 40GB of web text. Without any task-specific fine-tuning, it achieved state-of-the-art results on several language modeling benchmarks and showed promising zero-shot results on reading comprehension, question answering, and summarization.

2 GPT-2: 1.5B Release

The GPT-2 paper describes four model configurations (Table 2 of the paper). The largest uses 1.5B parameters across 48 decoder blocks with d_model = 1600. Considering that the original transformer used six decoder blocks with an embedding dimension (d_model) of 512, the largest GPT-2 is humongous. Successfully training such a huge model is itself a big achievement.
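To get a feel for these numbers, here is a rough back-of-the-envelope sketch of where the 1.5B figure comes from. It assumes a standard decoder block with roughly 12 × d_model² weights (four attention projections plus a feed-forward layer with hidden size 4 × d_model) and ignores biases and layer normalization. The vocabulary size (50,257) and context length (1,024) are taken from the GPT-2 paper, not from the text above, and the function names are made up for illustration.

```python
def block_weights(n_layer, d_model):
    """Approximate weight count of a stack of decoder blocks (assumption:
    no cross-attention, no biases, no layer-norm parameters)."""
    attention = 4 * d_model * d_model     # query, key, value, output projections
    feed_forward = 8 * d_model * d_model  # two linear layers, hidden size 4 * d_model
    return n_layer * (attention + feed_forward)


def approx_gpt2_params(n_layer, d_model, vocab_size=50257, n_ctx=1024):
    """Rough total: decoder blocks plus token and position embeddings."""
    token_embedding = vocab_size * d_model
    position_embedding = n_ctx * d_model
    return block_weights(n_layer, d_model) + token_embedding + position_embedding


# Largest GPT-2 configuration: 48 blocks, d_model = 1600.
largest_gpt2 = approx_gpt2_params(n_layer=48, d_model=1600)

# Very rough comparison with a 6-block, d_model = 512 stack, counted the same
# way (the original transformer decoder also has cross-attention, ignored here).
scale_vs_small_stack = block_weights(48, 1600) / block_weights(6, 512)

print(f"approx. parameters: {largest_gpt2 / 1e9:.2f}B")
print(f"block weights vs. 6 x 512 stack: {scale_vs_small_stack:.1f}x")
```

Under these assumptions the estimate lands at about 1.56 billion, in line with the 1.5B figure, and almost all of it sits in the decoder blocks rather than in the embeddings.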

Nine months after the initial announcement of GPT-2, OpenAI decided to release the largest GPT-2 with 1.5B parameters, along with code and model weights:

"We hope that this test case will be useful to developers of future powerful models, and we’re actively continuing the conversation with the AI community on responsible publication. … Our experience with GPT-2 over the past nine months has given us valuable insight into the challenges and opportunities for creating responsible publication norms in AI."
OpenAI Blog – GPT-2: 1.5B Release – November 5, 2019

They summarized their findings from the nine months:

Humans find GPT-2 outputs convincing.
GPT-2 can be fine-tuned for misuse.
Detection is challenging (detection rates of ~95% for detecting 1.5B GPT-2-generated text by RoBERTa).
We’ve seen no strong evidence of misuse so far.
We need standards for studying bias.

All these points are valid, and OpenAI did a great job identifying potential risks, especially misuse and biases, at an early stage.
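A detection rate of ~95% may sound high, so it is worth spelling out why detection still counts as challenging. The small sketch below treats the 95% figure as the per-text probability that a generated text is caught, which is a simplifying assumption and not how the blog post defines it. The function name and the volume of 10,000 texts are hypothetical.

```python
def expected_missed(n_generated, detection_rate):
    """Expected number of generated texts that slip past the detector,
    assuming each text is caught independently with the given probability."""
    return n_generated * (1.0 - detection_rate)


# Hypothetical volume: 10,000 generated texts, 95% caught on average.
missed = expected_missed(10_000, 0.95)
print(f"expected undetected texts: {missed:.0f}")
```

With these made-up numbers, roughly 500 generated texts would go unnoticed, and the count grows linearly with the volume of generated text.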

3 GPT-2 vs. ChatGPT

Today (December 2022), we’ve already seen how well ChatGPT performs, and next to it GPT-2 no longer seems so harmful. I can see that OpenAI applied what they learned to ChatGPT to prevent misuse, for example, by having it decline to impersonate people. However, many other kinds of misuse, like students making ChatGPT do their homework, are harder to prevent. These problems will likely persist and become more widespread as researchers improve AI capabilities. Could teachers use a detection model to find out whether students have cheated? It’s getting harder.

4 References

GPT-1: Generative Pre-Trained Transformer (2018)
GPT-2: Better Language Models and Their Implications (paper, code)
ChatGPT: Optimizing Language Models for Dialogue (OpenAI)