I cut my AWS bill by 93% by ditching Fargate for a single Lightsail VM
TL;DR: I built ToolMango, an AI tools directory, on AWS Fargate. The bill came to $345/mo before any traffic. I migrated to a single $12 Lightsail VM in an afternoon and cut costs by 93%, keeping the same Next.js + Postgres + Redis + BullMQ stack alive. Here's exactly what I changed, what broke, and what I'd do differently.
What ToolMango is (so the cost numbers make sense)
ToolMango is an editorial directory of AI tools. It scores each tool on an ROI Score (cost, time-to-value, output quality, free-tier generosity, category fit, reader engagement) and ranks them before knowing whether the tool has an affiliate program. Tools we don't earn from frequently outrank tools we do.
Tech stack:
- Next.js 14 App Router
- Postgres 16
- Redis (BullMQ for the agent job queue)
- Anthropic Claude Sonnet for editorial agents (research, SEO sweep, social drafts)
- A worker process running 5 cron schedules
- Pre-revenue. Brand new domain. ~106 tools indexed at the time of writing.
The original Fargate setup
I started on AWS because I had CDK boilerplate from another project. The architecture was over-engineered for a directory site getting zero traffic:
- CloudFront → ALB → Fargate (web ×2 tasks, worker ×1)
- Aurora Serverless v2 (writer)
- ElastiCache (Redis, t4g.small ×2)
- NAT ×2 (multi-AZ)
- VPC + interface endpoints
- WAF (managed rule sets)
The CDK code is clean. It deploys with one command, it autoscales, and it survives an AZ failure. It's exactly what a series-A SaaS would run. It's also $345/mo for zero users.
What was actually costing money
I broke it down with `aws ce get-cost-and-usage` and a few `aws ecs describe-task-definition` calls:
| Resource | $/mo |
|---|---|
| Aurora Serverless v2 (no auto-pause, 0.5 ACU min) | $86 |
| Fargate ARM64 (3 tasks: 2× web at 1vCPU/2GB + 1× worker at 0.5/1GB) | $71 |
| 2× NAT Gateways (multi-AZ) | $65 |
| VPC interface endpoints (Secrets Manager × 3 AZ + others) | $40 |
| ALB + WAF | $34 |
| CloudWatch + Container Insights | $15 |
| Public IPv4 charges | $15 |
| ElastiCache (cache.t4g.small ×2 nodes) | $11 |
| Misc (CloudFront, Secrets, Route53, S3) | $8 |
| Total | $345 |
The killer insight: roughly $150/mo of that bill is "infrastructure plumbing" (NAT $65, ALB + WAF $34, VPC interface endpoints $40, ElastiCache $11). None of it does real work for the application; it all exists to support the architecture itself. That's the floor on a Fargate setup, and for a pre-revenue project it's nuts.
Phase 1: Skeleton mode on AWS
Before migrating, I first tried to make Fargate cheap. The CDK changes I shipped:
// Aurora: enable auto-pause when idle
const cfnCluster = cluster.node.defaultChild as rds.CfnDBCluster;
cfnCluster.serverlessV2ScalingConfiguration = {
minCapacity: 0, // was 0.5 — auto-pause after 5 min idle
maxCapacity: 2, // was 4
secondsUntilAutoPause: 300,
};
// Network: 1 NAT instead of 2
natGateways: 1, // was 2 (multi-AZ)
// Web: smaller, fewer tasks, autoscale up if needed
desiredCount: 1, // was 2
cpu: 512, // was 1024
memoryLimitMiB: 1024, // was 2048
// Worker on Fargate Spot
capacityProviderStrategies: [
{ capacityProvider: "FARGATE_SPOT", weight: 4 },
{ capacityProvider: "FARGATE", weight: 1 },
],
// Container Insights off
containerInsightsV2: ecs.ContainerInsights.DISABLED,
// Backup retention
backup: { retention: cdk.Duration.days(1) }, // was 14
// WAF: removed entirely (CloudFront has free Shield Standard)
Result: $345/mo → ~$140/mo. Better, but still ridiculous for a pre-revenue project. Why it stopped at $140: NAT, ALB, ElastiCache, VPC endpoints, and Aurora storage all have hard price floors. You can't make Fargate genuinely cheap, because the architecture itself isn't designed for cheap.
Phase 2: The honest migration
Lightsail is AWS's "give me a Linux VM and stop overthinking it" tier: $12/mo buys 2 vCPU, 2GB RAM, 60GB SSD, and 3TB of transfer, with a static IP and a firewall included. The plan: run everything on one VM with Docker Compose.
(Docker Compose configuration omitted for brevity)
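The actual file isn't shown here, but a minimal sketch of a single-VM Compose stack like this could look as follows. Service names, image tags, and the worker entrypoint are my assumptions; only the `tmadmin` user and `toolmango` database come from the restore command used later:

```yaml
# Hypothetical sketch, not the author's actual compose file.
services:
  web:                                 # Next.js app
    build: .
    ports: ["127.0.0.1:3000:3000"]     # Caddy on the host proxies to this
    depends_on: [postgres, redis]
  worker:                              # BullMQ worker running the cron schedules
    build: .
    command: ["node", "dist/worker.js"]
    depends_on: [postgres, redis]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: tmadmin
      POSTGRES_DB: toolmango
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes: ["pgdata:/var/lib/postgresql/data"]
  redis:
    image: redis:7
volumes:
  pgdata:
```

Binding the web port to 127.0.0.1 keeps the app unreachable except through the host's reverse proxy, which matches a Caddy instance running on the VM itself.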
For HTTPS termination: Caddy, which auto-issues Let's Encrypt certificates on first request. The configuration is one stanza:
toolmango.com, www.toolmango.com {
reverse_proxy 127.0.0.1:3000
encode gzip zstd
header {
Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
X-Content-Type-Options "nosniff"
}
}
Reload Caddy, and it fetches the cert. Total setup time: 30 seconds.
Migrating Aurora data to local Postgres
Aurora sits in a private subnet (PRIVATE_ISOLATED), so I couldn't run pg_dump from outside. The workaround: spin up a one-off ECS Fargate task in the existing VPC that runs pg_dump and uploads the result to S3.
(Command details omitted)
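For illustration, the one-off dump task might be launched something like this; the cluster name, network IDs, task definition, container name, and bucket are all placeholders, not the real values:

```shell
# Run a one-off Fargate task inside the existing VPC that dumps Aurora to S3.
# (All IDs below are placeholders.)
aws ecs run-task \
  --cluster toolmango \
  --launch-type FARGATE \
  --task-definition db-dump \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-xxxxxxxx],securityGroups=[sg-xxxxxxxx]}' \
  --overrides '{"containerOverrides":[{"name":"dump","command":["sh","-c","pg_dump \"$DATABASE_URL\" | gzip | aws s3 cp - s3://example-bucket/dump.sql.gz"]}]}'

# Then mint a presigned URL for the VM to download the dump with:
aws s3 presign s3://example-bucket/dump.sql.gz --expires-in 3600
```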
On the Lightsail VM, pull the dump from S3 (via a presigned URL, since Lightsail VMs don't have IAM roles by default), gunzip it, and pipe it into the local Postgres container:
gunzip -c /tmp/dump.sql.gz | docker compose exec -T postgres psql -U tmadmin -d toolmango
All 64 published tools transferred cleanly, ~485KB of data in total (it's a directory site).