Appreciating EXIF / 浅谈 EXIF

I was recently writing some code to apply a mask to an input image. The mask had no Exif metadata, but the image did, so I had to account for the image’s orientation by reading it from Exif. This isn’t hard with libraries, and I knew the gist of the problem: images can have Exif, Exif is optional, phones and cameras use it for orientation, and you need to account for that when processing pixels directly. But I did not have a clear mental model of how Exif is actually represented in files. I was curious. I had questions: Where exactly is that orientation value stored? When should I rotate pixels instead of preserving a tag? What else might be hiding in the metadata? When is it typically stripped? So this is a little random-walk guide to Exif.

最近我在编写一段代码,用于给图像输入应用遮罩。遮罩本身没有 Exif 元数据,但图像有,所以我必须通过读取 Exif 来调整图像的方向。使用现成的库来做这件事并不难,我也了解问题的核心:图像可以包含 Exif,Exif 是可选的,手机和相机用它来记录方向,而在直接处理像素时,你必须考虑到这一点。但我对 Exif 在文件中究竟是如何呈现的并没有清晰的认知模型。我很好奇,并产生了一些疑问:那个方向值到底存储在哪里?什么时候应该旋转像素而不是保留标签?元数据中还隐藏着什么?它通常在什么时候被剥离?所以,这是一篇关于 Exif 的随笔指南。

What is Exif? / 什么是 Exif?

Exif is short for Exchangeable Image File Format. The current standard comes from CIPA, which lists it as “Exchangeable image file format for digital still cameras: Exif Version 3.1.” The Library of Congress has a good preservation-oriented summary, too. You’ll often see EXIF in all caps in camera docs, file-format chunk names, and old forum posts, but Exif seems to be the normal spelling in the standard itself. It is a metadata format that came out of the digital camera world in 1995, back when the problem was something like: this camera produced a JPEG, but where do we put the timestamp, shutter speed, aperture, focal length, thumbnail, and the fact that the camera was sideways? The answer was Exif.

Exif 是“可交换图像文件格式”(Exchangeable Image File Format)的缩写。目前的标准来自 CIPA,其名称为“数码相机可交换图像文件格式:Exif 3.1 版本”。美国国会图书馆也有一份很好的、侧重于保存的摘要。你经常会在相机文档、文件格式块名称和旧论坛帖子中看到全大写的 EXIF,但 Exif 似乎才是标准本身中使用的规范拼写。它是一种诞生于 1995 年数码相机时代的元数据格式,当时面临的问题是:相机生成了一张 JPEG,但我们该把时间戳、快门速度、光圈、焦距、缩略图以及相机是侧着拍摄的这些信息放在哪里呢?答案就是 Exif。

Most people run into this data through images from phones and cameras. It is also closely related to TIFF, because the actual Exif payload is a TIFF-shaped data structure living inside another file. Newer formats can carry Exif too, but each format gives it a different house. Exif is also optional: an image can have none. A camera image probably has some, but a processed image may have had it stripped. A synthetic image can have fake Exif, because metadata is just data someone wrote into the file.

大多数人是通过手机和相机拍摄的图像接触到这些数据的。它与 TIFF 格式关系密切,因为实际的 Exif 有效载荷是一个存在于另一个文件中的 TIFF 结构数据。较新的格式也可以携带 Exif,但每种格式都为它提供了不同的“住所”。它是可选的,图像可以完全没有它。相机拍摄的图像通常会有一些,但经过处理的图像可能已经被剥离了元数据。合成图像也可以拥有伪造的 Exif,因为元数据仅仅是某人写入文件中的数据而已。

Where it lives / 它存在于哪里

For JPEG, Exif usually lives near the beginning of the file in an APP1 marker segment. A JPEG starts with two bytes: FF D8. That is the start-of-image marker. After that, a JPEG is a series of marker segments. Each segment starts with FF and a marker byte, followed in most cases by a two-byte big-endian length that counts itself but not the marker. APP1 is: FF E1. If that APP1 segment contains Exif, its payload starts with: 45 78 69 66 00 00 or, as text: Exif\0\0.

对于 JPEG 而言,Exif 通常位于文件开头附近的 APP1 标记段中。JPEG 以两个字节开头:FF D8,这是图像开始标记。在此之后,JPEG 是一系列标记段。每个段都以 FF 和一个标记字节开头。APP1 是:FF E1。如果该 APP1 段包含 Exif,其有效载荷以 45 78 69 66 00 00 开头,或者以文本形式表示为:Exif\0\0
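Here is a minimal sketch of that marker walk in Python. The function name find_exif_payload is mine, not from any library, and it assumes a well-formed file: it stops at the first start-of-scan or end-of-image marker and does not try to recover from corrupt segments.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"  # 45 78 69 66 00 00


def find_exif_payload(jpeg: bytes):
    """Return the TIFF-shaped Exif payload of a JPEG, or None if there is none."""
    if jpeg[:2] != b"\xff\xd8":  # start-of-image marker
        raise ValueError("not a JPEG: missing FF D8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return None  # lost sync; a sturdier parser would try to recover
        marker = jpeg[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start-of-scan or end-of-image
            return None
        # The two-byte length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        body = jpeg[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and body.startswith(EXIF_HEADER):
            return body[len(EXIF_HEADER):]
        pos += 2 + length
    return None
```

Checking for the Exif\0\0 prefix matters because APP1 is shared: XMP packets also live in APP1 segments, with a different identifier.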

Then comes the TIFF-based part. It starts with a byte order marker: II (Intel, little-endian) or MM (Motorola, big-endian). Next comes the TIFF magic number, 42, and then a four-byte offset to the first Image File Directory, usually called IFD0. An IFD is a list of entries preceded by a two-byte entry count. Each entry is 12 bytes: a tag id, a type, a count, and either a value or an offset to the value. Exif orientation is tag 0x0112. It is usually in IFD0. Its value is a small integer from 1 to 8. That is the whole trick at a very high level.

接下来是基于 TIFF 的部分。它以字节顺序标记开头:II(Intel,小端序)或 MM(Motorola,大端序)。然后是 TIFF 魔数 42,接着是一个指向第一个图像文件目录(通常称为 IFD0)的偏移量。IFD 是一个条目列表。每个条目都有一个标签 ID、类型、计数,以及一个值或指向该值的偏移量。Exif 方向是标签 0x0112,通常位于 IFD0 中。它的值是一个 1 到 8 之间的小整数。这就是从宏观层面来看的全部奥秘。
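To make the layout concrete, here is a sketch that reads the orientation out of a TIFF-shaped payload (the bytes after Exif\0\0). The name read_orientation is hypothetical. It only looks in IFD0, assumes the tag is stored as a single SHORT (type 3), and will raise struct.error on truncated data rather than handling it gracefully.

```python
import struct

ORIENTATION_TAG = 0x0112


def read_orientation(tiff: bytes):
    """Return the Exif orientation (1 to 8) from IFD0, or None if absent."""
    order = tiff[:2]
    if order == b"II":
        endian = "<"  # Intel, little-endian
    elif order == b"MM":
        endian = ">"  # Motorola, big-endian
    else:
        raise ValueError("bad byte-order marker")
    magic, ifd0_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    # IFD: 2-byte entry count, then 12-byte entries.
    (entry_count,) = struct.unpack(endian + "H", tiff[ifd0_offset : ifd0_offset + 2])
    for i in range(entry_count):
        start = ifd0_offset + 2 + 12 * i
        tag, type_id, count = struct.unpack(endian + "HHI", tiff[start : start + 8])
        if tag == ORIENTATION_TAG and type_id == 3 and count == 1:
            # A single SHORT fits inside the 4-byte value field, left-justified.
            (value,) = struct.unpack(endian + "H", tiff[start + 8 : start + 10])
            return value if 1 <= value <= 8 else None
    return None
```

Values larger than four bytes (strings, rationals) do not fit in the entry, which is why the last field is sometimes an offset instead of the value itself.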

A tool looking for Exif in a JPEG walks the JPEG markers to find APP1, checks for Exif\0\0, reads the TIFF header, follows the IFD entries, and looks for the tags it cares about. So the answer to “where is Exif?” depends on the file format. In JPEG, it is usually APP1. In WebP, it is an EXIF chunk. In HEIC, it is inside the HEIF box structure.

一个在 JPEG 中查找 Exif 的工具会:遍历 JPEG 标记以找到 APP1,检查是否存在 Exif\0\0,读取 TIFF 头,跟随 IFD 条目,并查找它关心的标签。因此,“Exif 在哪里?”取决于文件类型。在 JPEG 中,它通常是 APP1;在 WebP 中,它是一个 EXIF 数据块;在 HEIC 中,它位于 HEIF 盒结构内部。
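For comparison, here is a rough sketch of the WebP case. WebP is a RIFF container, so the walk is over chunks (a four-character code, a little-endian 32-bit size, then the payload padded to an even length) instead of JPEG markers. The function name find_webp_exif is my own, and stripping a leading Exif\0\0 is a defensive assumption on my part for files whose writer included the JPEG-style prefix.

```python
import struct


def find_webp_exif(data: bytes):
    """Return the Exif payload from a WebP file's EXIF chunk, or None."""
    if data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a WebP file")
    pos = 12  # skip "RIFF", the file size, and "WEBP"
    while pos + 8 <= len(data):
        fourcc = data[pos : pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4 : pos + 8])
        if fourcc == b"EXIF":
            payload = data[pos + 8 : pos + 8 + size]
            # Assumption: tolerate a JPEG-style prefix if a writer left one in.
            if payload.startswith(b"Exif\x00\x00"):
                payload = payload[6:]
            return payload
        pos += 8 + size + (size & 1)  # odd-sized chunks carry one pad byte
    return None
```

Once the payload is found, the TIFF-shaped part is parsed the same way as in the JPEG case. Only the house changes.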

A boring standard that aged well / 一个历久弥新的枯燥标准

I have a soft spot for simple standards that just keep working. Exif is not clean in the way you might design something from scratch today. It has TIFF internals. It has manufacturer MakerNotes. It has duplicate concepts across Exif, XMP, IPTC, ICC profiles, C2PA, and container metadata. The orientation tag feels simple until you try to explain values 5 and 7. It has continued to solve a real problem, though: pixels are not enough. A camera needs somewhere to put the circumstances of the image, and it’s simpler for everyone if that data is bundled into the image rather than shipped around as an accompanying file.

我对那些简单且能持续发挥作用的标准情有独钟。Exif 并不像你今天从零开始设计的那样简洁。它有 TIFF 的内部结构,有制造商的 MakerNotes,并且在 Exif、XMP、IPTC、ICC 配置文件、C2PA 和容器元数据之间存在重复的概念。方向标签看起来很简单,直到你试图解释值 5 和 7 的含义。然而,它一直在解决一个实际问题:仅有像素是不够的。相机需要一个地方来存放图像的拍摄环境信息,而且将这些数据打包在图像中,比作为附属文件随处传输要简单得多。
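Since values 5 and 7 are the hard ones to explain in words, here is a small sketch that spells out all eight values as operations on a row-major grid of pixels (a list of rows). The helper names are mine, and a real pipeline would use an imaging library rather than nested lists. Each transform turns the stored pixels into the upright image a viewer should show.

```python
def flip_horizontal(g):
    return [row[::-1] for row in g]


def flip_vertical(g):
    return [row[:] for row in g[::-1]]


def transpose(g):
    # Mirror across the top-left to bottom-right diagonal.
    return [list(col) for col in zip(*g)]


def rotate_180(g):
    return flip_vertical(flip_horizontal(g))


TRANSFORMS = {
    1: lambda g: [row[:] for row in g],        # already upright
    2: flip_horizontal,                        # mirrored left-right
    3: rotate_180,
    4: flip_vertical,                          # mirrored top-bottom
    5: transpose,
    6: lambda g: transpose(flip_vertical(g)),  # rotate 90 degrees clockwise
    7: lambda g: rotate_180(transpose(g)),     # mirror across the other diagonal
    8: lambda g: flip_vertical(transpose(g)),  # rotate 90 degrees counterclockwise
}


def apply_orientation(grid, orientation):
    """Return a new grid with the orientation baked into the pixels."""
    # Unknown or missing values are treated as 1 (no change).
    return TRANSFORMS.get(orientation, TRANSFORMS[1])(grid)
```

Values 5 and 7 are the two diagonal mirrors: a rotation combined with a flip. They swap width and height like 6 and 8 do, which is why they are easy to confuse with plain rotations. After baking the orientation into the pixels, the tag should be reset to 1 or removed, or viewers will rotate the image a second time.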

I admire it because it outgrew its initial container. In JPEG, Exif usually sits in APP1. In newer file formats like HEIC, it lives somewhere else. But the same payload format still applies. A phone in 2026 can take a photo in a modern container and still carry metadata shaped by decisions from the digital camera era.

我很钦佩它,因为它超越了最初的容器限制。在 JPEG 中,Exif 通常位于 APP1;在 HEIC 等较新的文件格式中,它位于其他地方。但相同的有效载荷格式依然适用。2026 年的手机拍摄的照片可以存放在现代容器中,但依然携带由数码相机时代决策所塑造的元数据。

What Exif is used for / Exif 的用途

The common stuff is what you would expect from a camera: date and time, camera make and model, lens model, shutter speed, aperture, ISO, focal length, flash, GPS location, orientation, software, color-space hints, manufacturer-specific MakerNotes.

常见的内容正如你对相机的预期:日期和时间、相机品牌和型号、镜头型号、快门速度、光圈、ISO、焦距、闪光灯、GPS 位置、方向、软件、色彩空间提示以及制造商特定的 MakerNotes。
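As a rough map from that list to what sits in the file, here is a small lookup table of tag ids. This is a hand-picked subset, not the full tag list, and the table and helper names are mine. One detail the list hides: the camera-settings tags usually live in a separate Exif sub-IFD that IFD0 points to through tag 0x8769, and GPS data sits in its own IFD behind tag 0x8825.

```python
# (where the tag usually lives, tag name)
COMMON_TAGS = {
    0x010F: ("IFD0", "Make"),
    0x0110: ("IFD0", "Model"),
    0x0112: ("IFD0", "Orientation"),
    0x0131: ("IFD0", "Software"),
    0x8769: ("IFD0", "Exif IFD pointer"),
    0x8825: ("IFD0", "GPS IFD pointer"),
    0x829A: ("Exif IFD", "ExposureTime"),
    0x829D: ("Exif IFD", "FNumber"),
    0x8827: ("Exif IFD", "ISO / PhotographicSensitivity"),
    0x9003: ("Exif IFD", "DateTimeOriginal"),
    0x9209: ("Exif IFD", "Flash"),
    0x920A: ("Exif IFD", "FocalLength"),
    0x927C: ("Exif IFD", "MakerNote"),
    0xA001: ("Exif IFD", "ColorSpace"),
    0xA434: ("Exif IFD", "LensModel"),
}


def describe_tag(tag: int) -> str:
    """Return a short human-readable label for a tag id."""
    if tag in COMMON_TAGS:
        location, name = COMMON_TAGS[tag]
        return f"{name} ({location})"
    return f"unknown tag 0x{tag:04X}"
```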

Thumbnails are a “maybe”: Exif can carry an embedded thumbnail, usually a small one stored in IFD1. Larger previews are messier. Some are Exif thumbnails, some live in MakerNotes, some are MPF data, and some live in container-specific metadata. None of this is exhaustive; it is the usual useful slice. Photo apps use this data to sort, search, display, group, and edit images. Websites and upload pipelines use it, sometimes accidentally, to rotate images correctly. Photographers use it to inspect how a shot was made. Asset-management systems use it alongside other metadata standards for rights, captions, credits, and workflow state.

缩略图属于“可能存在”:Exif 可以携带嵌入式缩略图,通常在 IFD1 中。它通常是一个小的嵌入式缩略图。更大的预览图则更混乱,有些是 Exif 缩略图,有些是 MakerNotes,有些是 MPF 数据,还有一些存在于容器特定的元数据中。这还不是全部,但这是通常有用的部分。照片应用使用这些数据来排序、搜索、显示、分组和编辑图像。网站和上传流水线有时会无意中使用它来正确旋转图像。摄影师用它来检查拍摄方式。资产管理系统将其与其他元数据标准结合使用,以处理版权、说明、署名和工作流状态。

Color is a good example of boundaries blurring. Exif has a ColorSpace tag, but full ICC color profiles are a separate kind of metadata. If an image changed size, rotated, lost its color, or started displaying differently after a pipeline step, metadata is one of the first places I would look, but I would not assume the answer is specifically Exif. And of course, metadata is just whatever was put in the file. A file can say it came from a camera it did not come from. A timestamp can be wrong, GPS coordinates can be fake, and a string field can contain gobbledygook.

颜色是边界模糊的一个很好的例子。Exif 有一个 ColorSpace 标签,但完整的 ICC 颜色配置文件是另一种独立的元数据。如果图像在经过流水线处理后改变了大小、旋转、丢失了颜色或显示异常,元数据是我首先会检查的地方之一,但我不会假设答案一定就是 Exif。当然,元数据只是被放入文件中的任何内容。一个文件可以声称它来自某台相机,但事实并非如此。时间戳可能是错的,GPS 可能是伪造的,字符串字段可能包含乱码。
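In code, that translates to treating every field as untrusted input. As one small illustration, here is a sketch of a defensive parser for Exif date strings, which are conventionally written as YYYY:MM:DD HH:MM:SS. The function name is hypothetical, and returning None for anything unparseable is my design choice, not something the standard prescribes.

```python
from datetime import datetime


def parse_exif_datetime(raw):
    """Parse an Exif date string; return None for anything that does not fit."""
    if isinstance(raw, bytes):
        raw = raw.decode("ascii", errors="replace")
    if not isinstance(raw, str):
        return None
    # Real files often pad the field with NULs or spaces.
    text = raw.strip("\x00 ")
    try:
        return datetime.strptime(text, "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None
```

A value that parses is still only a claim. It tells you what the file says, not when the photo was taken.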

Use exiftool first / 首先使用 exiftool

If you are doing anything technical with image metadata, start with exiftool. It is written in Perl and is old in a very good way. Baked into it is all sorts of knowledge about the metadata weirdness that exists in real files. The basic command is: exiftool image.jpg

如果你要对图像元数据进行任何技术性操作,请从 exiftool 开始。它是用 Perl 编写的,以一种非常好的方式保持着“古老”。它内置了关于真实文件中存在的各种元数据怪癖的知识。基本命令是:exiftool image.jpg
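If you want that output inside a script, one option is to shell out and ask for JSON. This is a minimal sketch that assumes exiftool is installed and on your PATH. The function names are mine. The -json option produces machine-readable output and -n asks for numeric values, so Orientation comes back as a number from 1 to 8 instead of a description.

```python
import json
import subprocess


def exiftool_command(path):
    """Build the argument list for a JSON, numeric-valued exiftool run."""
    return ["exiftool", "-json", "-n", str(path)]


def parse_exiftool_json(stdout: str) -> dict:
    """exiftool -json prints a list with one object per input file."""
    records = json.loads(stdout)
    return records[0] if records else {}


def read_metadata(path) -> dict:
    """Run exiftool on one file and return its tags as a dict."""
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)
```

For example, read_metadata("image.jpg").get("Orientation") returns the numeric orientation, or None when the file has no such tag.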