Building a 32-URL economy microsite on top of a 754,000-row SQLite dataset

Building a 32-URL economy microsite on top of a 754,000-row SQLite dataset

在 754,000 行 SQLite 数据集之上构建一个 32 个 URL 的经济微型网站

We launched GDPIndex, a reference site for the twenty G20 economies. The numeric data sits in a 152 MB SQLite file (216 countries × 102 indicators × 754,809 rows of annual country-data). The interesting part of the build is not the dataset itself but how we exposed only a tiny, focused slice of it — and why we resisted the urge to publish thousands of pages. This post is about the WordPress side of that decision: how the site routes 32 URLs against a half-million-row store without going near WP_Query, and how we split structured numbers from country-specific editorial narrative so each page has a different skeleton.

我们推出了 GDPIndex,这是一个针对 G20 二十国集团经济体的参考网站。其数值数据存储在一个 152 MB 的 SQLite 文件中(包含 216 个国家 × 102 个指标 × 754,809 行年度国家数据)。构建过程中最有趣的部分并非数据集本身,而是我们如何仅展示其中极小且聚焦的部分,以及为什么我们抵制了发布数千个页面的冲动。本文将探讨这一决策背后的 WordPress 实现:网站如何在不使用 WP_Query 的情况下,针对五十万行数据存储路由 32 个 URL,以及我们如何将结构化数据与特定国家的编辑叙述分离开来,从而使每个页面拥有不同的骨架。

Why 32 URLs

为什么是 32 个 URL?

The same SQLite already powers a much larger reference site — HistorySaid — with about 600 indexed URLs across 216 countries and 9 economic themes. GDPIndex is a deliberate subset: 20 country pages at /{country}/economy/ (G20: 19 nations + EU), 9 ranking pages at /rankings/{indicator}/ covering the GDP-family series, and 3 static pages (about, methodology, privacy, terms). That cap is the point. Google now treats large auto-generated reference sites with suspicion, and we already had the per-country data — what was missing was editorial reason for each page to exist. Twenty G20 economies, each with a structural story that has been the subject of decades of literature, is a defensible scope.

同一个 SQLite 数据库已经驱动了一个更大的参考网站——HistorySaid,该网站涵盖 216 个国家和 9 个经济主题,拥有约 600 个索引 URL。GDPIndex 是一个经过深思熟虑的子集:20 个国家页面(/{country}/economy/,即 G20 的 19 个国家 + 欧盟)、9 个排名页面(/rankings/{indicator}/,涵盖 GDP 系列指标)以及 3 个静态页面(关于、方法论、隐私、条款)。设置上限正是关键所在。谷歌现在对大型自动生成的参考网站持怀疑态度,而我们已经拥有了各国的原始数据——缺失的是每个页面存在的编辑理由。二十个 G20 经济体,每一个都有其作为数十年文献研究对象的结构性故事,这是一个站得住脚的范围。

Virtual routing without WP_Query

不使用 WP_Query 的虚拟路由

WordPress’s rewrite system is great for posts, taxonomies, and the bits of the URL space that map onto database records. It is terrible for serving 30 URLs whose content is computed at request time from data in a sibling table. Every workaround we have seen — custom post types, ACF-driven templates, rewrite endpoints — ends up either creating 30 dummy posts to “host” the URLs or fighting the parse_request lifecycle. A cleaner approach: never let WordPress think about these URLs at all. Hook template_redirect at priority 0, parse $_SERVER['REQUEST_URI'] yourself, and short-circuit the response if the path matches one of your virtual routes.

WordPress 的重写系统非常适合文章、分类法以及映射到数据库记录的 URL 空间。但对于那些内容需要在请求时从关联表中计算得出的 30 个 URL 来说,它表现极差。我们见过的所有变通方法——自定义文章类型、ACF 驱动的模板、重写端点——最终要么是为了“托管”这些 URL 而创建 30 个虚拟文章,要么是在与 parse_request 生命周期作斗争。一种更简洁的方法是:根本不要让 WordPress 处理这些 URL。在优先级 0 处挂载 template_redirect,自行解析 $_SERVER['REQUEST_URI'],如果路径匹配你的虚拟路由之一,则直接短路响应。

(Code snippet omitted for brevity)

(代码片段略)

Returning null is critical: anything we don’t claim falls through to WordPress, so /wp-admin/, /wp-json/, the favicon, and robots.txt all keep working. We also intercept sitemap.xml, robots.txt, and llms.txt on parse_request (earlier than template_redirect), because WordPress 5.5+ ships a core sitemap at /wp-sitemap.xml and will 301 you there before your template hook runs. wp_sitemaps_enabled → false disables that core handler.

返回 null 至关重要:任何我们未声明的路径都会回落给 WordPress 处理,因此 /wp-admin/、/wp-json/、favicon 和 robots.txt 都能正常工作。我们还在 parse_request(比 template_redirect 更早)阶段拦截了 sitemap.xml、robots.txt 和 llms.txt,因为 WordPress 5.5+ 在 /wp-sitemap.xml 处内置了核心站点地图,如果不处理,它会在你的模板钩子运行前将你 301 重定向到那里。将 wp_sitemaps_enabled 设置为 false 即可禁用该核心处理器。

Two slugs per country

每个国家使用两个别名 (Slug)

The dataset and the URL space don’t agree on slugs. SQLite has russian-federation, turkiye, korea, european-union. The URLs we want are russia, turkey, south-korea, eu. The router holds a single canonical map. Every layer that touches a country goes through this map: the router resolves URL → DB slug, the data layer queries by DB slug, the page renderer flips back to the URL slug when emitting internal links, and the JSON-LD BreadcrumbList uses URL slugs end-to-end. Putting the map in one function and refusing to allow ad-hoc string substitution anywhere else is what keeps the site consistent.

数据集和 URL 空间在别名(slug)上并不一致。SQLite 中使用的是 russian-federationturkiyekoreaeuropean-union。而我们想要的 URL 是 russiaturkeysouth-koreaeu。路由器维护了一个单一的规范映射表。每一层涉及国家数据的逻辑都通过这个映射表进行转换:路由器将 URL 解析为数据库别名,数据层按数据库别名查询,页面渲染器在生成内部链接时转回 URL 别名,JSON-LD 面包屑导航则全程使用 URL 别名。将映射逻辑集中在一个函数中,并禁止在其他任何地方进行临时的字符串替换,是保持网站一致性的关键。

Two custom tables: structure plus narrative

两张自定义表:结构与叙述

The page template is shared across all 20 countries, but the content of each section is different per country. The “middle-income trap” section appears on Türkiye, China and Brazil; it does not appear on Germany. So we cannot store sections as a fixed schema — we store them as JSON. A single PHP page template iterates sections_json and renders each section with its own chart and table specs.

页面模板在所有 20 个国家之间共享,但每个部分的内容因国家而异。“中等收入陷阱”部分出现在土耳其、中国和巴西的页面上,但不会出现在德国页面上。因此,我们不能将这些部分存储为固定模式,而是将其存储为 JSON。单个 PHP 页面模板会遍历 sections_json,并根据各自的图表和表格规范渲染每个部分。