Should you normalize RGB values by 255 or 256?
30fps.net: Computer Graphics & Programming with Pekka Väänänen.
Let’s say you’re writing an image processing program. The program takes in an image, converts it to floating point, does some processing and finally saves the modified pixels to disk as 8-bit colors. Today’s question is how exactly the integer-to-float conversion should be done.
There are two approaches which, written in Python and NumPy, look like this:
Standard division by 255
pixels = img / 255.0
result = process(pixels)
output = np.trunc(result * 255 + 0.5)
Alternative division by 256
pixels = (img + 0.5) / 256.0
result = process(pixels)
output = np.trunc(result * 256)
I assume that in both cases the output values are clamped before the final typecast:
# Clamp and cast to 8 bits
output_8bit = output.clip(0, 255).astype(np.uint8)
The standard approach maps the integer 0 to 0.0 and 255 to 1.0. It works perfectly fine and is how GPUs do it. The alternative adds a 0.5 bias and divides by 256 instead, so the integer 0 gets mapped to 0.5/256 = 0.001953125. This is inconvenient because your image processing code can’t detect black pixels, for example, without knowing that constant. As a consequence, you tie your logic to 8-bit inputs even though you compute in floating point. With the standard approach, you can always assume black is 0.0. But some programmers still feel a pull towards the alternative. What is going on? What do they see in it?
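To make the endpoint behavior concrete, here is a minimal sketch that decodes every 8-bit value with both formulas. The names `levels`, `standard` and `alternative` are only illustrative and do not come from any library.

```python
import numpy as np

# Every possible 8-bit input value (illustrative stand-in for an image).
levels = np.arange(256, dtype=np.uint8)

# Standard decoding: the endpoints land exactly on 0.0 and 1.0.
standard = levels / 255.0

# Alternative decoding: every value is shifted up by half a step.
alternative = (levels + 0.5) / 256.0

print(standard[0], standard[-1])
print(alternative[0], alternative[-1])
```

With the alternative formula, black is no longer 0.0 and white is no longer 1.0, so a test like `pixel == 0.0` stops working.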
The case against 255.0
The standard approach does look quite strange when plotted on the number line. Below you can see an exaggerated version with 3-bit integers in the range [0..7] being mapped to [0,1]:
On the X-axis we’ve got a number line, and the locations of the brown circles on it represent the decoded floating-point values. The numbers inside are the integer inputs. Each integer has arrows pointing to it; these show the range of floating-point values that round to it. I’ll call these ranges “bins” in the rest of this article.
Smaller bins at the extremes
The first issue apparent in the diagram is how the standard formula’s extreme bins jut beyond the [0,1] range. Perhaps this visualization is unfair, since both approaches clamp their output and the extreme bins could therefore extend infinitely, but it clearly shows how “stretched” the standard range is. The stretched range is wider than the operating range [0, 1] usually assumed in image processing. This means that when converting floating-point values in the [0, 1] range back to integers, the extreme bins have effectively half the width of the other bins.
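The bin widths can be worked out directly for the 3-bit example. The sketch below assumes the same rounding rules as the two formulas (add 0.5 and truncate for the standard one, plain truncation for the alternative) and only looks at the part of each bin that lies inside [0, 1]; all variable names are illustrative.

```python
import numpy as np

levels = np.arange(8)  # 3-bit integers 0..7

# Standard: level i decodes to i/7 and collects [(i - 0.5)/7, (i + 0.5)/7).
std_lo = (levels - 0.5) / 7.0
std_hi = (levels + 0.5) / 7.0

# Alternative: level i decodes to (i + 0.5)/8 and collects [i/8, (i + 1)/8).
alt_lo = levels / 8.0
alt_hi = (levels + 1) / 8.0

# Width of each bin after intersecting it with the working range [0, 1].
std_width = np.clip(std_hi, 0, 1) - np.clip(std_lo, 0, 1)
alt_width = np.clip(alt_hi, 0, 1) - np.clip(alt_lo, 0, 1)

# Express the widths in units of one full bin of each scheme.
print(std_width * 7)
print(alt_width * 8)
```

In units of a full bin, the standard scheme’s first and last bins come out at one half while the rest are one; the alternative scheme’s bins are all the same width.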
As a consequence, it will be “harder” to output extreme values from your algorithm. For example, if you generate uniform [0,1] noise and round it using the standard formula, the values 0 and 255 will occur only half as frequently as the other integers. We can verify this claim empirically by generating a million uniform random numbers, plotting them as a histogram, and observing that both the 0 and 255 bins are indeed only half as tall as the other bins.
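A rough version of that experiment, counting values instead of plotting them, could look like this. The seed and variable names are arbitrary choices of this sketch, and the exact ratios will vary slightly with the random numbers drawn.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
noise = rng.random(1_000_000)   # uniform floats in [0, 1)

# Quantize with both formulas, clamping before the cast as assumed earlier.
std = np.trunc(noise * 255 + 0.5).clip(0, 255).astype(np.uint8)
alt = np.trunc(noise * 256).clip(0, 255).astype(np.uint8)

std_counts = np.bincount(std, minlength=256)
alt_counts = np.bincount(alt, minlength=256)

# Compare the extreme bins with the average interior bin.
std_interior = std_counts[1:255].mean()
alt_interior = alt_counts[1:255].mean()
print(std_counts[0] / std_interior, std_counts[255] / std_interior)
print(alt_counts[0] / alt_interior, alt_counts[255] / alt_interior)
```

The standard formula’s ratios should come out close to 0.5, and the alternative’s close to 1.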
Still, I’m having a hard time coming up with a situation where the bias away from the extremes would prove problematic. Sure, the standard approach’s floats are spread over a wider range, but the original image will still round-trip losslessly (uint8 → float → uint8). Also, any result value just beyond 0.0 or 1.0 will still round to the right bin, evening out the output distribution.
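Both round-trip claims are easy to check. This sketch assumes no processing step at all between decoding and encoding, and the overshoot values are arbitrary examples.

```python
import numpy as np

img = np.arange(256, dtype=np.uint8)  # every possible 8-bit value

# Standard round trip: uint8 -> float -> uint8.
std_back = np.trunc(img / 255.0 * 255 + 0.5).clip(0, 255).astype(np.uint8)

# Alternative round trip.
alt_back = np.trunc((img + 0.5) / 256.0 * 256).clip(0, 255).astype(np.uint8)

print(np.array_equal(std_back, img), np.array_equal(alt_back, img))

# Results slightly outside [0, 1] still end up on the extreme integers.
overshoot = np.array([-0.001, 1.001])
clamped = np.trunc(overshoot * 255 + 0.5).clip(0, 255).astype(np.uint8)
print(clamped)
```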
Inexactness
The second issue is that the standard approach’s floating-point values aren’t exact. For example, 128/255.0 ≈ 0.501961 but 128/256.0 = 0.5. Due to this round-off error, the distances between adjacent floating-point values vary a tiny bit. But this isn’t a real problem since the error is truly tiny. A 32-bit floating-point number stores a 23-bit fraction (the explicit part of its significand). We are talking about round-off error in its least-significant bit: jitter with a relative magnitude of less than 2^-23. Surely a relative error of 0.00001% is immaterial even in the most sophisticated image processing task. In this case, inexactness is an aesthetic question, not a technical one.
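To see how small the jitter is, the sketch below decodes all 256 levels in 32-bit floats and compares the gaps between neighbouring values. It relies on every (i + 0.5)/256 being exactly representable in float32, which holds because each one is an odd integer divided by 512; the variable names are illustrative.

```python
import numpy as np

levels = np.arange(256, dtype=np.float32)

# Decode in single precision with both formulas.
std = levels / np.float32(255.0)
alt = (levels + np.float32(0.5)) / np.float32(256.0)

# Gaps between neighbouring decoded values.
std_gaps = np.diff(std)
alt_gaps = np.diff(alt)

# The alternative's gaps are all identical; the standard's differ slightly.
print(np.unique(alt_gaps).size)
std_spread = float(std_gaps.max() - std_gaps.min())
print(std_spread)
```

The spread in the standard gaps is nonzero but sits many orders of magnitude below the step size of roughly 0.0039.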
Values not in between integers
The alternative approach always places each floating-point value exactly in the middle of two integers. See how the vertical bars align in the number line diagram above. The halfway position can be thought of as a compromise: we don’t know exactly what the original value was before quantization, so the midpoint between two successive integers is a good guess. I’m sure there are applications where this property is useful, even though I’m having a hard time coming up with examples myself. Well, at least dithering is more convenient, argues a 2015 blog post “Converting Color Depth” by Andrew Kesler. The reasoning goes that noise can be added without worrying about edge cases. In contrast, the standard formula’s awkward extremes require careful handling to keep the noise distribution consistent.
Two types of quantizers
So far the standard “divide by 255” formula still looks solid, or at least solid enough to be worth keeping. Another way to think about the question is to zoom out a bit and see the two approaches as two different uniform scalar quantizers. If we check the Wikipedia page on quantization, we’ll quickly learn that there are…