What is random generation?

What is random generation?

What is random generation? Posted on 2026-05-10 :: 22 min read :: Tags: 🏷testing

什么是随机生成? 发布于 2026-05-10 :: 22 分钟阅读 :: 标签:🏷测试

I wanted to write this as a sequel to “What is a property?” where I would talk about how Property-Based Testing libraries generate random structures, don’t worry, I’ll still do that! But I realized that without talking about randomness in computers, the writing would be incomplete. So the first part of this article will go into a core problem that we consider “solved” in PBT, which is designing good random number generators, with the latter part talking about implementing complex random data generators on top of such RNGs.

我原本想把这篇文章作为《什么是属性?》(What is a property?) 的续篇,探讨基于属性的测试 (PBT) 库是如何生成随机结构的。别担心,我依然会写那部分!但我意识到,如果不先谈谈计算机中的随机性,文章将是不完整的。因此,本文的第一部分将深入探讨 PBT 中被视为“已解决”的核心问题,即如何设计优秀的随机数生成器 (RNG);后半部分则会讨论如何在此类 RNG 的基础上实现复杂的随机数据生成器。

Random Number Generation in Computers

计算机中的随机数生成

Randomness is a noisy topic. At a first approximation, it signifies uncertainty, or orderlessness, or more formally, high entropy. Of course, in computer programs, random generation is fake; we use PRNGs (pseudo-random number generators), high-entropy functions of which the relationship between the seed and the generated random number is as arbitrary as possible.

随机性是一个嘈杂的话题。初步来看,它意味着不确定性、无序性,或者更正式地说,高熵。当然,在计算机程序中,随机生成是伪造的;我们使用的是 PRNG(伪随机数生成器),这是一种高熵函数,其种子与生成的随机数之间的关系尽可能地随机。

A very simple example of a PRNG is a linear congruential generator (LCG): 一个非常简单的 PRNG 示例是线性同余生成器 (LCG):

function lcg(seed: number): () => number {
  const a = 1664525;
  const c = 1013904223;
  const m = 2 ** 32;
  let state = seed;
  return () => {
    state = (a * state + c) % m;
    return state;
  };
}

const rand = lcg(42);
rand(); // 1083814273
rand(); // 380859148
rand(); // 2338846736

The randomness of the output of an LCG depends on its parameter choices, for instance a = 1 and c = 1 results in a counter module m. Precursor of LCG (according to Wikipedia) is apparently a Lehmer random number generator where c = 0, m is either a prime, or a power of a prime number; the initial seed must be a coprime of m, and a must be “an element of high multiplicative order modulo m”.

LCG 输出的随机性取决于其参数的选择,例如 a = 1 和 c = 1 会导致模 m 的计数器。根据维基百科,LCG 的前身显然是 Lehmer 随机数生成器,其中 c = 0,m 是素数或素数的幂;初始种子必须与 m 互质,且 a 必须是“模 m 的高乘法阶元素”。

As I’ve said, I’m also learning about random number generation as I write this post, so I created a visualization. There are some presets, and you can also try other numbers if you would like.

正如我所说,我在撰写本文时也在学习随机数生成,所以我创建了一个可视化工具。里面有一些预设,如果你愿意,也可以尝试其他数字。

Visualizing LCG quality

可视化 LCG 质量

Pick a preset or tweak the parameters and watch what a, c, and m do to the output. The left panel plots consecutive triples (xₙ, xₙ₊₁, xₙ₊₂) (or pairs (xₙ, xₙ₊₁) if you switch the view) — if visible structure shows up there, the generator is leaking patterns the spectral test would catch. The right panel is a histogram over [0, 1], which tells you whether the output is at least uniform on the surface.

选择一个预设或调整参数,观察 a、c 和 m 对输出的影响。左侧面板绘制了连续的三元组 (xₙ, xₙ₊₁, xₙ₊₂)(如果切换视图则为二元组 (xₙ, xₙ₊₁))——如果那里出现了可见的结构,说明生成器正在泄露频谱测试可以捕捉到的模式。右侧面板是 [0, 1] 区间的直方图,它告诉你输出在表面上是否至少是均匀的。

There are a whole sequence of random number generators, there are new ones right up to 2023, it’s an active area of research of which we frequently use its results, many of the algorithms show up on the random generation libraries and standard libraries of many PLs. Of course, the discussion is incomplete with the famous lava lambs!

随机数生成器有一整个系列,直到 2023 年仍有新的生成器出现。这是一个活跃的研究领域,我们经常使用其研究成果,许多算法出现在各种编程语言的随机生成库和标准库中。当然,如果不提著名的“熔岩灯”(lava lambs),讨论就是不完整的!

Once we have an algorithm that can take an arbitrary seed and produce a high-entropy random number out of it, we want a high-entropy source of those seeds too. For Cloudflare, it’s apparently their pattern-absent lava lambs; for many of us, it’s the OS source of randomness /dev/urandom that uses the only real source of randomness for a computer, the physical world, to produce seeds.

一旦我们拥有了一个能够接收任意种子并产生高熵随机数的算法,我们也需要这些种子的高熵来源。对于 Cloudflare 来说,显然是他们那无规律的熔岩灯;对于我们许多人来说,则是操作系统的随机性来源 /dev/urandom,它利用计算机唯一真正的随机性来源——物理世界——来生成种子。

Armed with true randomness and well-designed algorithms, we can generate uniform random 64 bit numbers, so are we done? Well, the issue is, we really don’t wanna generate 64 bit numbers, do we? We want to generate booleans, floats, bounded numbers, collections, complex structures… We also don’t want uniform randomness per se, we wanna be able to bias the random generator if needed, perhaps to fit into certain other distributions; luckily, we can build all of this from our simple primitive 64 random bits.

有了真正的随机性和精心设计的算法,我们可以生成均匀的 64 位随机数,这样就大功告成了吗?嗯,问题在于,我们其实并不想只生成 64 位数字,对吧?我们想要生成布尔值、浮点数、有界数字、集合、复杂结构……我们也不一定非要均匀的随机性,如果需要,我们希望能够对随机生成器进行偏差处理,以适应某些特定的分布;幸运的是,我们可以利用简单的 64 位随机数原语构建所有这些功能。

Generating Floats

生成浮点数

Random floating-point numbers typically generated via generating a random float in the interval [0.0, 1.0), and scaling the number based on the range. Nima Badizadegan has a much better article on random fp generation, so I’ll just give the gist here, the following is from the Haskell random library:

随机浮点数通常是通过在 [0.0, 1.0) 区间内生成一个随机浮点数,并根据范围进行缩放来生成的。Nima Badizadegan 有一篇关于随机浮点数生成的优秀文章,所以我在这里只讲要点。以下代码来自 Haskell 的 random 库:

-- | Generates uniformly distributed 'Double' in the range [0, 1].
-- Numbers are generated by generating uniform 'Word64' and dividing
-- it by 2^64. It's used to implement 'UniformRange' instance for
-- 'Double'.
--
-- @since 1.2.0
uniformDouble01M :: forall g m. StatefulGen g m => g -> m Double
uniformDouble01M g = do
  w64 <- uniformWord64 g
  return $ fromIntegral w64 / m
  where m = fromIntegral (maxBound :: Word64) :: Double

Essentially, we generate a random 64 bit integer, divide it by the largest 64 bit integer possible in the domain of the floating point numbers: 本质上,我们生成一个 64 位随机整数,然后将其除以浮点数域中可能的最大 64 位整数:

function randomFloat(): number {
  return rand() / 2 ** 64;
}

If we have a different range, we can just scale up [0, 1) to [a, b): 如果我们有不同的范围,只需将 [0, 1) 缩放到 [a, b) 即可:

function randomFloatInRange(a: number, b: number): number {
  return a + (b - a) * randomFloat();
}

Generating Booleans

生成布尔值

Generating a uniform random boolean is easy, rand gives us 64 random bits, we can just check any one of them; 生成均匀的随机布尔值很简单,rand 给我们 64 个随机位,我们只需检查其中任意一位即可:

function randomBoolean(): boolean {
  return (rand() & 1) === 0;
}

but do you have the courage to generate biased booleans? That is where we can rely on the random float generation: 但你有勇气生成有偏差的布尔值吗?这时我们可以依赖随机浮点数生成:

function randomBiasedBoolean(p: number): boolean {
  return randomFloat() < p;
}

Generating Bounded Integers

生成有界整数

We can easily model randomInt(a, b) as a + randomInt(0, b - a), so we can focus on solving random generation for [0, a). A simple approximation is rand() % a, with a classic problem, bias. When a is not a power of 2, then there is a part of the interval that’ll be hit a little bit more because the bins (k % a) will not have the same number of elements, let’s assume we have 4 bit integers, and our bound is 3 as you can see in the widget. Here’s how the distribution follows:

我们可以轻松地将 randomInt(a, b) 建模为 a + randomInt(0, b - a),因此我们可以专注于解决 [0, a) 区间的随机生成问题。一个简单的近似方法是 rand() % a,但这有一个经典问题:偏差。当 a 不是 2 的幂时,区间的一部分会被更频繁地命中,因为桶 (k % a) 不会包含相同数量的元素。假设我们有 4 位整数,且边界为 3(如小部件中所示),分布情况如下:

0: [0, 3, 6, 9, 12, 15] 1: [1, 4, 7, 10, 13] 2: [2, 5, 8, 11, 14]

As you can see, we have 6/16… 如你所见,我们有 6/16……