Making your own programming language is easier than you think (but also harder)

2026 May 6

In mid-December last year I started making my own programming language. It’s waaay far from any production quality yet (though I did manage to write a working 1k LOC Monte-Carlo path tracer in it), but the project is on pause right now, so I figured it’s a good time to write something about it.

Disclaimer #1: I’m not a professional PL designer or compiler implementor. Even though I do feel like I know what I’m talking about for most of this post, I might still end up talking some nonsense.

Disclaimer #2: it’s not another C/C++/Rust/etc. killer, and I doubt it’ll ever actually be used to any noticeable extent. I’m just having fun and talking about me having fun.

Disclaimer #3: if you have some strong opinions about programming languages, please, keep in mind that I’m not forcing you to use this language, and that it’s a bit rude to be telling random people on the internet what they should do. If, on the other hand, you have constructive feedback and suggestions, I’m all ears!

Introduction

Why now? I mean, most programmers dream of their own perfect programming language. I’ve been programming for about 17 years, so why did I decide to make a language at this specific point in time? It just so happens that 3 different things converged in my mind.

Of course, I always wanted to make my own programming language as well. I made a bunch of silly interpreters for some esoteric languages in the past (FALSE is probably my favourite), as well as interpreters for various flavours of lambda calculus, but that doesn’t scratch the itch of making a real language, one that is at least somewhat production-oriented and doesn’t feel like a toy.

As you might know, I’m working on a big game that lends itself well to modding, and I’ve been thinking about how to approach modding since the start of this project. I’ve analyzed a ton of options, and it just so happens that making a custom programming language is actually one of the simplest solutions.

In December 2025, the amazing Matt Godbolt introduced the Advent of Compiler Optimisations, where he’d post some fun examples of what C++ compilers are capable of, walking through the generated assembly. Apart from this being an excellent series, it really made me want to mess with some assembly once again. Of course, making a non-toy programming language is a gargantuan endeavour, but somehow after looking at assembly for a few weeks, I felt like it shouldn’t be that bad.

Modding

I want to elaborate on the modding thing. Essentially, I have 3 main concerns with respect to modding:

  1. My game is highly simulation-heavy. There are hundreds of thousands of entities simulated via a custom ECS engine. Ideally, I’d want the modding language to be able to just take a bunch of component pointers and iterate over them like you would in a C for loop.

  2. It’s hard to control what’s going on in mods, so some level of protection for the player would be nice to have. Ideally, I’d want the modding language to be easily sandboxable – i.e. I want to be able to disable all IO and similar stuff with a single switch.

  3. I want modding to be as easy as it can be. Ideally you’d throw a script in a certain folder and that’s it, the mod is ready to use.
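
To make the first concern concrete, here is a minimal C sketch of the kind of loop a mod would ideally be able to run. The `Position` and `Velocity` types and the `integrate` function are hypothetical stand-ins; the post does not show the game's actual ECS layout, so treat this as an illustration under that assumption rather than the engine's real interface.

```c
#include <stddef.h>

/* Hypothetical component types (assumed for illustration only). */
typedef struct { float x, y; } Position;
typedef struct { float dx, dy; } Velocity;

/* The mod receives raw component arrays plus a count and walks them
   with a plain for loop: no per-entity call back into the engine and
   no copying into a script-side container. */
void integrate(Position *pos, const Velocity *vel, size_t count, float dt)
{
    for (size_t i = 0; i < count; ++i) {
        pos[i].x += vel[i].dx * dt;
        pos[i].y += vel[i].dy * dt;
    }
}
```

With hundreds of thousands of entities, the appeal of this shape is that the whole batch is processed inside one call, so any fixed cost of crossing between game and mod code is paid once per system rather than once per entity.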

It was somewhat of a surprise to me that there doesn’t seem to exist a solution satisfying all three requirements. Let’s go over the common possibilities.

Lua (or any other JIT-compiled scripting language for that matter). That’s a standard choice, but it turns out that it’s really hard to sandbox it. Apparently you need to prepend any untrusted Lua code with some kind of prelude that explicitly deletes all known standard library functions that can be used for IO and such. There are even lists of these functions online in the form of GitHub gists. Even if this probably does work, it doesn’t sound like a reliable solution to me. Furthermore, Lua is a high-level dynamically-typed language that doesn’t know anything about C pointers. Bridging ECS entity iteration into it will either force per-entity native $\leftrightarrow$ Lua $\leftrightarrow$ native jumps with nonzero overhead, or require constructing a Lua array from the native entities and then deconstructing it back. Either way, this doesn’t sound good. Not to mention that standard Lua and LuaJIT diverged some versions ago, which might make it extremely confusing both for modders and myself.

C++. There’s always the option to make mods “natively”. All the iteration problems are gone, but distributing mods becomes a nightmare. If they were distributed as binaries, I’d have to provide some sort of dev environment for all platforms, and centralized storage for binary artifacts. If they were instead distributed as source code, I’d have to bundle a C++ compiler with the game, and C++ compilers are known to be heavy and slow (a basic LLVM installation takes about 10-20 times more disk space than my current version of the game). Oh, and sandboxing becomes impossible. If you’re loading a native DLL which declares and uses int open();, you’re doomed – there’s basically no way to prevent it from accessing the filesystem, network, etc. And, it goes without saying, even though I personally do enjoy writing C++, I’d rather not force the modders to do that. All this applies to a bunch of other languages like Rust, by the way.
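
The sandboxing problem with native mods can be shown in a few lines. This is a sketch assuming a POSIX system; `mod_on_load` is a hypothetical entry point invented for the example, not part of any real game API. It is written in C, but the same applies to a C++ mod.

```c
#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* close */

/* Hypothetical mod entry point (assumed name, for illustration only). */
int mod_on_load(void)
{
    /* Native code talks to the OS directly; the host game never sees
       this call and has no hook to veto it. */
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return 0;  /* the file could not be opened */
    close(fd);
    return 1;      /* the "mod" reached the filesystem unhindered */
}
```

Swapping the path for something sensitive would work just as well. Stopping this would take OS-level isolation around the whole process rather than anything the game can do at the language level.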

Please note that while I do put modding as one of the goals for the language, I’m still very much unsure whether I’m going to actually use it this way, and I don’t want to over-specialize the language to this use case. As I’ve said, I’m mostly messing around and having fun.

Design goals

Ok, so what do I want from my programming language? Quite a lot, actually:

  • Seamless C interop – so that bridging between native game code and modding code would be as simple as a function call

  • Low level – which is mostly a consequence of having to handle raw arrays of entities

  • Practical and…