Why There's a Tanker in Central Madrid
We ingest about a million raw AIS messages every hour. Roughly four out of ten never make it to our database. That is not because they are all wrong. Most of those rejected messages aren’t position reports at all: they are vessel name broadcasts, safety messages, channel management commands, or interrogation requests that arrive on the same feed.
Once you strip those out, you are left with the actual position data. A 300-metre oil tanker reporting its position as central Madrid. A bulk carrier allegedly doing 90 knots, which would make it faster than most warships. A cargo vessel that teleports from the North Sea to the Sahara Desert between two consecutive reports, three seconds apart.
The genuinely bad position data (invalid coordinates, impossible jumps, sentinel values from transponders that lost GPS) is a smaller fraction, probably in the single-digit percentages based on what we see and what academic literature reports. But even a few percent, at the volumes AIS produces, means tens of thousands of phantom ships drifting across continents every day.
This is AIS, the Automatic Identification System, the backbone of global vessel tracking. Every ship of 300 gross tonnage or more on an international voyage is required by SOLAS to broadcast its identity and location over VHF radio, as often as every few seconds, around the clock. Around 400,000 vessels do this simultaneously, generating over 300 million messages per day. It is one of the largest real-time geospatial data streams on the planet. Nobody warns you about this part.
We are VesselAPI, a two-person company in Málaga, Spain. We built a REST API that takes this raw radio data and turns it into something you can actually use. What follows is the story of how we learned, the hard way, that maritime data wants to lie to you, and the filtering we built to catch it.
What “Not Available” Looks Like at 161.975 MHz
The AIS specification (ITU-R M.1371-5, if you want to look it up) was designed by people who understood that transponders would sometimes have no idea where they are. So they built in sentinel values: specific numbers that mean “I don’t know.” Latitude 91° North. Longitude 181° East. Speed 102.3 knots. Heading 511°. These are not real measurements. They are the AIS equivalent of a shrug.
A transponder that has lost its GPS fix, or has just been powered on and hasn’t acquired satellites yet, is supposed to transmit these values. The problem is that plenty of systems downstream (tracking platforms, analytics tools, map renderers) don’t check for them. They plot the point. And suddenly you have a vessel at 91° latitude, which is one degree past the North Pole, in mathematical space that doesn’t physically exist.
We filter these out. Latitude over 90, longitude over 180, SOG at 102.3, heading at 511: gone before they touch the database. The same goes for (0, 0), known as Null Island, a fictional place in the Gulf of Guinea that is the most popular port on Earth if you believe unfiltered GPS data.
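The checks described above can be sketched in a few lines of Go. This is a minimal illustration, not our production code: the PositionReport struct, its field names, and the hasUsablePosition helper are hypothetical, and the sketch assumes speed has already been decoded to knots and heading to degrees.

```go
package main

import "fmt"

// PositionReport is a hypothetical, simplified view of a decoded AIS
// position report. The field names are illustrative only.
type PositionReport struct {
	Latitude  float64 // degrees, north positive
	Longitude float64 // degrees, east positive
	SOG       float64 // speed over ground, knots
	Heading   int     // true heading, degrees
}

// "Not available" sentinels for speed and heading, as named in the text.
const (
	sogNotAvailable     = 102.3
	headingNotAvailable = 511
)

// hasUsablePosition reports whether r carries a plottable position and
// no "not available" sentinel in speed or heading.
func hasUsablePosition(r PositionReport) bool {
	if r.Latitude < -90 || r.Latitude > 90 { // also catches the 91 sentinel
		return false
	}
	if r.Longitude < -180 || r.Longitude > 180 { // also catches the 181 sentinel
		return false
	}
	if r.Latitude == 0 && r.Longitude == 0 { // Null Island
		return false
	}
	// >= rather than == so float rounding in the decoder cannot let
	// the speed sentinel slip through.
	if r.SOG >= sogNotAvailable || r.Heading == headingNotAvailable {
		return false
	}
	return true
}

func main() {
	reports := []PositionReport{
		{Latitude: 91, Longitude: 181, SOG: 102.3, Heading: 511},     // all sentinels
		{Latitude: 0, Longitude: 0, SOG: 5.0, Heading: 90},           // Null Island
		{Latitude: 36.72, Longitude: -4.42, SOG: 12.4, Heading: 270}, // ordinary report
	}
	for _, r := range reports {
		fmt.Printf("%+v usable=%v\n", r, hasUsablePosition(r))
	}
}
```

Note what this does not catch: a tanker reporting central Madrid (about 40.4° N, 3.7° W) passes every check here, because the coordinates are perfectly valid numbers. Range and sentinel checks only remove values that cannot be positions at all.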
The Pipeline
When a raw AIS message arrives, it passes through four stages before it becomes an API response. We did not plan four stages. We started with coordinate bounds checking and kept adding layers as new classes of garbage revealed themselves.
Message Type Filtering
AIS has 27 message types. Types 1, 2, and 3 are Class A position reports, the bread and butter, broadcast every 2 seconds to 3 minutes depending on speed and navigational status. A container ship doing 20 knots and changing course reports every 2 seconds. The same ship at anchor drops to every 3 minutes. Types 18 and 19 are Class B position reports from smaller vessels.
The other 22 message types (binary data, safety broadcasts, channel management, interrogation requests, and so on) are not the position reports we track. We were surprised how often non-position messages leaked into position processing. A Type 5 message (static and voyage data: ship name, dimensions, destination) has no coordinates but arrives on the same feed. Our first week in production, we had phantom entries with zeroed-out positions because we weren’t filtering on message type.
switch cache.MessageType {
case string(aisstream.POSITION_REPORT),
	string(aisstream.STANDARD_CLASS_B_POSITION_REPORT),
	string(aisstream.EXTENDED_CLASS_B_POSITION_REPORT):
	// valid position type: continue processing
default:
	// anything else is not a position report: drop it
	return nil
}
One switch statement. It fixed a category of bad data that had cost us two days of debugging.
MMSI Validation
Every AIS transponder has a Maritime Mobile Service Identity: a 9-digit number that encodes what kind of entity is broadcasting. Ship station MMSIs start with three Maritime Identification Digits (the MID), which roughly indicate the flag state’s region: 2xx for Europe, 3xx for the Americas, 4xx for Asia, and so on. We treat 100,000,000 to 799,999,999 as the vessel range.
Other MMSI formats identify coast stations (prefixed 00), SAR aircraft (prefixed 111), man-overboard devices (972), and EPIRBs (974). All of these broadcast AIS, and none of them are ships. Then there are MMSIs that shouldn’t exist at all: misconfigured transponders with default factory values, test transmissions. We reject anything outside the vessel range:
if cache.MMSI < 100000000 || cache.MMSI > 799999999 {
	return nil
}
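For readers who want to label the non-ship identities rather than just drop them, here is a rough sketch of prefix-based classification. The classifyMMSI function and its labels are hypothetical and only cover the prefixes mentioned above. One detail worth noticing: identities prefixed 111 fall numerically between 100,000,000 and 799,999,999, so telling SAR aircraft apart from ships takes a prefix check rather than a numeric range alone.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyMMSI is a hypothetical helper that labels an MMSI using only
// the prefixes mentioned in the text. It is not a complete decoder.
func classifyMMSI(mmsi uint32) string {
	if mmsi > 999999999 {
		return "invalid" // more than nine digits
	}
	// Zero-pad to nine digits so leading-zero prefixes are visible.
	s := fmt.Sprintf("%09d", mmsi)
	switch {
	case strings.HasPrefix(s, "00"):
		return "coast station"
	case strings.HasPrefix(s, "111"):
		return "SAR aircraft"
	case strings.HasPrefix(s, "972"):
		return "man-overboard device"
	case strings.HasPrefix(s, "974"):
		return "EPIRB"
	case s[0] >= '2' && s[0] <= '7':
		return "ship" // first MID digit 2 through 7
	default:
		return "other"
	}
}

func main() {
	// Made-up identities chosen only to exercise each branch.
	for _, m := range []uint32{224123456, 2241234, 111224001, 972123456, 974123456, 812345678} {
		fmt.Printf("%09d -> %s\n", m, classifyMMSI(m))
	}
}
```

Whether to keep or discard each class is a product decision; the sketch only makes the categories explicit.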
There is also a subtler problem: MMSI sharing. When multiple vessels use the same MMSI, whether through misconfiguration or deliberate sanctions evasion, a single identity appears to teleport across oceans. Your tracking system shows one ship doing 4,000 knots because it is actually two ships on opposite sides of the Indian Ocean, alternating transmissions. This is a documented tactic used by the dark fleet. Kpler identified 261 vessels that spoofed AIS before being sanctioned. An estimated 600 to 1,000 vessels, roughly 10% of the global large oil tanker fleet, operate this way.
Coordinate Validation
After message type and MMSI filtering, we validate the coordinates themselves:
if cache.Latitude == 0 && cache.Longitude == 0 {
	return nil
}
if cache.Latitude < -90
(Note: The original text ends abruptly here.)