I Built an LLM Gateway That Extends Claude Pro/Max Users with Azure AI Foundry, Amazon Bedrock, Local Models

我构建了一个 LLM 网关，通过 Azure AI Foundry、Amazon Bedrock 和本地模型扩展了 Claude Pro/Max 的使用体验

AI coding tools have gotten very good. But the infrastructure behind them is still weirdly inefficient. Most tools assume one provider, one lane, one billing path. That means the same expensive model or subscription ends up handling everything: reading files, summarizing logs, quick repo questions, multi-file refactors, architecture planning, long debugging sessions. That is the wrong abstraction. A coding workflow is not one type of problem. So it should not be forced through one type of model path. That idea is what pushed me to build Lynkr. AI 编程工具已经变得非常出色，但其背后的基础设施却依然效率低下。大多数工具都预设了单一提供商、单一通道和单一计费路径。这意味着同一个昂贵的模型或订阅最终要处理所有任务：读取文件、总结日志、快速回答代码库问题、多文件重构、架构规划以及长时间的调试会话。这是一种错误的抽象。编程工作流并非单一类型的问题，因此不应被强行限制在单一的模型路径中。正是这一想法促使我构建了 Lynkr。

Lynkr is an open-source LLM gateway for AI coding tools that lets me combine: Claude Pro/Max subscription access, Azure AI Foundry-hosted models, Amazon Bedrock-hosted models, and local/free models like Ollama behind one routing layer. Lynkr 是一个面向 AI 编程工具的开源 LLM 网关，它允许我将 Claude Pro/Max 订阅访问权限、Azure AI Foundry 托管模型、Amazon Bedrock 托管模型以及 Ollama 等本地/免费模型整合在一个路由层之后。

The problem with single-lane AI coding

单通道 AI 编程存在的问题

If you use a premium coding assistant every day, you have probably seen this already. A lot of the workload is not actually premium reasoning work. For example: “open this file”, “search for auth middleware”, “summarize this module”, “show me where this class is used”, “read these test failures”. These are useful requests, but they are not the same as: “refactor this subsystem”, “design a safer auth flow”, “debug this multi-step failure”, “trace this agent loop bug”, “rewrite this implementation across five files”. Yet most tools send both classes of work through the same expensive path. That creates three problems: 如果你每天都在使用高级编程助手，你可能已经注意到了这一点。许多工作负载实际上并不需要高级推理能力。例如：“打开这个文件”、“搜索身份验证中间件”、“总结这个模块”、“显示这个类在哪里被使用”、“读取这些测试失败信息”。这些请求很有用，但它们与“重构这个子系统”、“设计更安全的身份验证流程”、“调试这个多步骤故障”、“追踪这个代理循环错误”、“跨五个文件重写此实现”完全不同。然而，大多数工具却将这两类工作都通过同一个昂贵的路径发送。这导致了三个问题：

You waste premium capacity: If a subscription-backed or premium model handles every tiny prompt, you burn good capacity on low-value tasks.
浪费高级算力：如果由订阅支持的高级模型处理每一个微小的提示词，你就会在低价值任务上消耗宝贵的算力。
You stay locked into one provider: Even if you already have access to Azure, AWS, or local models, your coding workflow is often tied to one vendor path.
被单一提供商锁定：即使你已经拥有 Azure、AWS 或本地模型的访问权限，你的编程工作流通常仍被绑定在单一供应商的路径上。
You lose resilience: If one provider is rate-limited, degraded, or just not the best fit for a task, you have no routing layer to adjust.
缺乏弹性：如果某个提供商受到速率限制、性能下降或仅仅是不适合某项任务，你没有路由层来进行调整。

The idea behind Lynkr

Lynkr 背后的理念

Lynkr sits between AI coding tools and model providers. It works as an LLM gateway, which means the coding tool talks to Lynkr, and Lynkr decides what to do next. That lets the gateway: route by complexity, compress bulky tool outputs, cache repeated requests, switch providers without changing the client workflow, use different backends for different classes of tasks. The part I am most excited about is hybrid routing across: Claude Pro/Max, Azure AI Foundry, Amazon Bedrock. Lynkr 位于 AI 编程工具和模型提供商之间。它作为一个 LLM 网关运行，这意味着编程工具与 Lynkr 对话，由 Lynkr 决定下一步的操作。这使得网关能够：根据复杂度进行路由、压缩庞大的工具输出、缓存重复请求、在不改变客户端工作流的情况下切换提供商、针对不同类别的任务使用不同的后端。我最兴奋的部分是跨 Claude Pro/Max、Azure AI Foundry 和 Amazon Bedrock 的混合路由功能。

What “extending Claude Pro/Max” means

“扩展 Claude Pro/Max”的含义

The simplest version looks like this: simple tasks → local/free model; hard coding tasks → Claude Pro/Max subscription; enterprise workloads → Azure AI Foundry; fallback or alternate routing → Amazon Bedrock. So instead of replacing Claude, Azure, or Bedrock, the gateway combines them. This is the key idea: extend your Claude Pro/Max usage instead of burning it on everything. 最简单的版本如下：简单任务 → 本地/免费模型；硬核编程任务 → Claude Pro/Max 订阅；企业级工作负载 → Azure AI Foundry；故障转移或备用路由 → Amazon Bedrock。因此，该网关不是要取代 Claude、Azure 或 Bedrock，而是将它们结合起来。这是核心理念：扩展你的 Claude Pro/Max 使用价值，而不是将其消耗在所有事情上。

Example workflow

工作流示例

Imagine a coding session that looks like this: “Read the auth middleware and summarize it.” Route to a cheap local model. “Search all routes that call this helper.” Still cheap/local. “Refactor this auth flow to support tenant isolation.” Route to Claude Pro/Max. “Generate an enterprise-safe variant for our internal stack.” Route to Azure AI Foundry. “Azure is unavailable or rate-limited.” Fallback to Bedrock. That is a much more natural way to run coding agents than pretending every prompt deserves the same model path. 想象一下这样的编程会话：“读取身份验证中间件并总结它。”路由到廉价的本地模型。“搜索所有调用此辅助函数的路由。”依然是廉价/本地模型。“重构此身份验证流程以支持租户隔离。”路由到 Claude Pro/Max。“为我们的内部技术栈生成一个企业级安全变体。”路由到 Azure AI Foundry。“Azure 不可用或受到速率限制。”回退到 Bedrock。这比假定每个提示词都值得使用相同的模型路径要自然得多。

Why Claude Pro/Max + Azure + Bedrock is interesting

为什么 Claude Pro/Max + Azure + Bedrock 的组合很有趣

This combination matters because each lane solves a different problem. Claude Pro/Max: Great for high-quality coding and reasoning tasks where you already have subscription value. Azure AI Foundry: Useful when a team wants enterprise-hosted models, internal approvals, or Azure-aligned infrastructure. Amazon Bedrock: Useful for AWS-native orgs, alternate model access, or fallback when you want another enterprise provider path. Local models: Useful for cheap, frequent, low-stakes tasks that should not consume premium capacity at all. Putting these together in one gateway gives you a better operational model than any one of them alone. 这种组合之所以重要，是因为每个通道都解决了不同的问题。Claude Pro/Max：非常适合高质量的编程和推理任务，且你已经拥有订阅价值。Azure AI Foundry：当团队需要企业托管模型、内部审批或与 Azure 对齐的基础设施时非常有用。Amazon Bedrock：适用于 AWS 原生组织、备用模型访问，或当你需要另一个企业级提供商路径作为回退时。本地模型：适用于廉价、频繁、低风险的任务，这些任务根本不应该消耗高级算力。将它们整合在一个网关中，为你提供了比单独使用其中任何一个都更好的运营模式。

Why this matters for coding agents specifically

为什么这对编程代理尤为重要

I think coding is one of the best use cases for an LLM gateway because coding workflows are: tool-heavy, repetitive, multi-step, full of structured outputs, sensitive to token waste, often spread across many turns. That means a gateway can add value in several ways: 1) Complexity-based routing, 2) Cost control, 3) Better use of subscriptions, 4) Enterprise compatibility, 5) Resilience. 我认为编程是 LLM 网关的最佳用例之一，因为编程工作流具有以下特点：工具密集、重复性高、多步骤、充满结构化输出、对 Token 浪费敏感、通常跨越多个回合。这意味着网关可以通过多种方式增加价值：1) 基于复杂度的路由，2) 成本控制，3) 更好地利用订阅，4) 企业兼容性，5) 弹性。

Where MCP and agent workflows fit in

MCP 和代理工作流的定位

Another reason this matters is MCP and agentic tooling. As coding tools become more agentic, they use more: tool schemas, file reads, command outputs, structured results, long multi-turn sessions. That creates a lot of overhead and a lot of repeated context. A gateway is the right place to optimize that. That is also why I think the future is not just better models. It is better routing, caching, tool handling, and workload separation around those models. 这之所以重要的另一个原因是 MCP 和代理工具。随着编程工具变得越来越智能化（Agentic），它们会使用更多的：工具模式、文件读取、命令输出、结构化结果、长多轮会话。这会产生大量的开销和重复的上下文。网关是优化这些问题的最佳场所。这也是为什么我认为未来不仅仅是更好的模型，而是围绕这些模型进行更好的路由、缓存、工具处理和工作负载分离。

What I wanted Lynkr to do

我希望 Lynkr 实现的目标

I did not want just another OpenAI-compatible endpoint. I wanted a gateway that could actually help with real coding economics and workflow design. For me, that means: keeping the coding tool workflow the same, preserving subscription value, combining subscription + cloud + local lanes, supporting enterprise backends, reducing waste on easy tasks. 我想要的不仅仅是另一个兼容 OpenAI 的端点。我想要一个能够真正帮助优化编程经济性和工作流设计的网关。对我来说，这意味着：保持编程工具工作流不变、保留订阅价值、结合订阅+云端+本地通道、支持企业级后端、减少简单任务上的浪费。

Who this is for

适用人群

I think this is especially useful for: Claude Code users who want more mileage from Pro/Max, teams using Azure AI Foundry for approved enterprise model access, AWS teams already standardizing on Bedrock, developers mixing local models with premium coding assistants, MCP and agent workflow builders who need an LLM gateway. 我认为这特别适用于：希望从 Pro/Max 中获得更多价值的 Claude Code 用户、使用 Azure AI Foundry 进行合规企业模型访问的团队、已经标准化使用 Bedrock 的 AWS 团队、将本地模型与高级编程助手混合使用的开发者，以及需要 LLM 网关的 MCP 和代理工作流构建者。

Final thought

结语

I do not think the next big improvement in AI coding comes only from stronger base model. 我不认为 AI 编程的下一个重大进步仅仅来自于更强大的基础模型。