Solving the 3Blue1Brown String Probability Problem (Without AI)
Solving the 3Blue1Brown String Probability Problem (Without AI)
解决 3Blue1Brown 的绳子概率问题(不使用 AI)
Data Science: Solving the 3Blue1Brown String Probability Problem (Without AI) 数据科学:解决 3Blue1Brown 的绳子概率问题(不使用 AI)
Let’s practice data science thinking through a probability problem. 让我们通过一个概率问题来练习数据科学思维。
Why did I go through the trouble of solving a silly probability problem in my free time when I could’ve been doom scrolling? Because I’m trying to stay sharp in this unique time period where we can outsource most of our critical thinking to generative AI. If you are reading through articles on TDS, you and I probably share that goal. 为什么我要在空闲时间费力去解决一个无聊的概率问题,而不是刷手机虚度光阴?因为在这个特殊的时代,我们几乎可以将所有的批判性思维外包给生成式 AI,而我正努力保持敏锐。如果你正在阅读 TDS(Towards Data Science)上的文章,那么你我可能有着共同的目标。
This article will go through solving a fun probability puzzle that one of my favorite YouTubers (3Blue1Brown) put out recently. By the way, if you aren’t familiar with his channel, you need to check him out. He focuses on intuitive visuals and explanations that will make you wonder why math is taught any other way. 本文将带你解决我最喜欢的 YouTuber(3Blue1Brown)最近发布的一个有趣的概率谜题。顺便说一句,如果你还不熟悉他的频道,一定要去看看。他专注于直观的视觉效果和解释,会让你感叹为什么数学教学不能都采用这种方式。
The problem setup
问题设定
The short I linked below will give you the best introduction, but I’ll go over the set up briefly here as a supplement. Imagine we have a box with multiple strings in it. We randomly select the end of one string and then randomly select the end of another string. We then tie the ends together. 我下面链接的短视频会给你最好的介绍,但我会在这里简要回顾一下设定作为补充。想象一下,盒子里有几根绳子。我们随机选择一根绳子的一端,然后再随机选择另一根绳子的一端。接着,我们将这两端系在一起。
One of two things can happen: (1) the ends come from different strings and we now have one, longer string or (2) the second end is from the same string we selected initially and tying them together makes a loop. If we tie two separate strings together, we place the longer string back in the box. If we make a loop, we remove it from the box. This process of randomly selecting strings continues until there are no strings left in the box. 可能会发生两种情况之一:(1) 这两端来自不同的绳子,我们现在得到了一根更长的绳子;或者 (2) 第二端与我们最初选择的是同一根绳子,将它们系在一起就形成了一个环。如果我们系的是两根不同的绳子,就把这根更长的绳子放回盒子里。如果我们形成了一个环,就把它从盒子里拿出来。这个随机选择绳子的过程会一直持续,直到盒子里没有绳子为止。
The problem question is, how many loops do we expect this process to create? Or, in less precise, but more practical words – If we repeated this process many times, what is the average number of loops that would be created? 问题的核心是:我们预期这个过程会产生多少个环?或者用更通俗、更实际的话来说——如果我们重复这个过程很多次,平均会产生多少个环?
Key observations about the problem
关于问题的关键观察
Having a good grasp of a problem is always key to finding a good solution. Outside of simply understanding the mechanics covered in the last section, there are some key observations that we will need to understand. 对问题有良好的把握始终是找到优秀解决方案的关键。除了理解上一节提到的机制外,我们还需要理解一些关键观察。
Observation #1 Each round of random draws has two random components. The first random draw and the second. The first draw isn’t very important. The second draw is, because it determines if we make a loop or a longer string. 观察 #1 每一轮随机抽取都有两个随机成分:第一次抽取和第二次抽取。第一次抽取并不重要,第二次抽取才重要,因为它决定了我们是形成一个环还是形成一根更长的绳子。
Observation #2 Each round results in one fewer string in the box no matter what happens. If a loop is made, the string that made the loop is removed. If a longer string is made, two strings became one, reducing the count of strings in the box by 1. 观察 #2 无论发生什么,每一轮都会导致盒子里的绳子减少一根。如果形成了一个环,形成环的那根绳子就被移除了。如果形成了一根更长的绳子,两根绳子变成了一根,盒子里的绳子总数也减少了 1。
Observation #3 The number of draws is not a random variable. Each round removes a string from the box regardless of the outcome (observation #2). Each round has two draws, so, the number of rounds will be equal to the number of strings. For example, if we have 10 strings, we have 20 random draws in 10 rounds. Note that the last ’round’ has just one remaining string and always results in a loop. 观察 #3 抽取的次数不是随机变量。无论结果如何,每一轮都会从盒子里移除一根绳子(观察 #2)。每一轮包含两次抽取,因此,轮数将等于绳子的数量。例如,如果我们有 10 根绳子,那么在 10 轮中会有 20 次随机抽取。注意,最后一轮只剩下一根绳子,并且总是会形成一个环。
Observation #4 This observation builds on the other three and is the most important. From the perspective of counting loops, each round of random draws is independent of the prior rounds. This means that we can break the problem down into individual rounds rather than having to consider the entire sequence of rounds together. Note that if we were interested in metrics like the expected circumference of a circle, this observation would not be true. The length of strings (and therefore the circumference of loops) are dependent on the prior rounds. 观察 #4 这一观察建立在前三点之上,也是最重要的一点。从计算环的数量的角度来看,每一轮随机抽取都独立于之前的轮次。这意味着我们可以将问题分解为单独的轮次,而不必考虑整个轮次序列。注意,如果我们关心的是诸如圆的预期周长之类的指标,这一观察就不成立了。绳子的长度(进而环的周长)确实依赖于之前的轮次。
Okay, with these observations down, let’s get into how we can solve the problem! 好了,掌握了这些观察结果,让我们来看看如何解决这个问题!
The “brute force” solution
“暴力破解”方案
Almost all problems like these (and real-life problems) have a brute force-style solution. An approach that is like digging a hole for a swimming pool by hand. For this problem, we can make a probability tree and manually calculate the expected number of loops. 几乎所有这类问题(以及现实生活中的问题)都有暴力破解式的解决方案。这种方法就像徒手挖游泳池一样。对于这个问题,我们可以建立一个概率树,并手动计算环的预期数量。
This is a clunky, but fine solution for small numbers of strings. In the video, Grant specifically calls out 50 strings as a number to solve for (that would require a tree that has 250 leaves!). He did that to push his audience out of the brute force method into smarter solutions. Let’s see if we can come up with a smarter approach. 对于少量的绳子来说,这是一个笨拙但可行的方案。在视频中,Grant 特别提到了 50 根绳子的情况(那需要一棵拥有 2^50 个叶子节点的树!)。他这样做是为了引导观众跳出暴力破解法,去寻找更聪明的解决方案。让我们看看能否想出一个更聪明的方法。
Divide and conquer solution
分治法解决方案
By really thinking through the characteristics of the random process, we realized that each round of random draws is independent of the others (observation #4). Because of this property, we can calculate the expected value of single draws and then see if we can find a way to combine multiple single draws to solve the problem. 通过深入思考随机过程的特性,我们意识到每一轮随机抽取都是相互独立的(观察 #4)。由于这一特性,我们可以计算单次抽取的期望值,然后看看能否找到一种方法将多次单次抽取结合起来解决问题。
Expected number of loops from a single draw 单次抽取的预期环数
We have already realized that the first random draw isn’t very important (observation #1) – it is really all about the second draw. Let’s walk through a simple problem with 4 strings. We do our first random draw to get the first end (it doesn’t really matter which one we choose). Our second random draw can be any of the ends in the box, except the end that we selected for the first draw. 我们已经意识到第一次随机抽取并不重要(观察 #1)——关键在于第二次抽取。让我们通过一个 4 根绳子的简单问题来演示。我们进行第一次随机抽取以获得第一个端点(选择哪一个并不重要)。我们的第二次随机抽取可以是盒子里除我们第一次选中的那个端点之外的任何端点。
Imagine we have 4 strings in the box, which gives us 8 ends. After choosing the first end, we can’t choose it again, so we have 7 ends we can choose from. One of the 7 will result in a loop, the other ends will not create a loop. 想象一下盒子里有 4 根绳子,也就是 8 个端点。选定第一个端点后,我们不能再次选择它,所以剩下 7 个端点可供选择。这 7 个端点中,有一个会导致形成一个环,其他的则不会。
So, the probability of making a loop is 1/7 and the probability of not making a loop is 6/7 – this yields 1/7 expected loops (11/7 + 06/7). Let’s generalize this to a formula using the number of strings as an input. If S is the number of strings – the number of ends is 2S (two ends per string). After the first selection, we can choose from 2S-1 ends, only one of those ends result in a loop. So, the formula for expected loops is 1/(2S-1). 因此,形成一个环的概率是 1/7,不形成环的概率是 6/7——这得出的预期环数是 1/7 (11/7 + 06/7)。让我们将其推广为一个以绳子数量为输入的公式。如果 S 是绳子的数量,那么端点的总数就是 2S(每根绳子有两个端点)。在第一次选择后,我们可以在 2S-1 个端点中进行选择,其中只有一个端点会导致形成环。因此,预期环数的公式是 1/(2S-1)。