Building an AI WhatsApp Bot for Business: Lessons from SARA
We built SARA — a WhatsApp AI assistant that handles customer inquiries, qualifies leads, schedules appointments, and routes tickets 24/7 — and deployed it as part of a larger AI operating system for SMBs. Here is what we learned.
Why WhatsApp, not a website chatbot
Most business chatbots live behind a “Chat” bubble on a website that nobody opens. WhatsApp is where your customers already are. In Italy and across Europe, WhatsApp has a 90%+ open rate for business messages. When a customer sends a message at 11pm, they expect a reply — not an autoresponder email. SARA lives on WhatsApp. She replies in seconds, speaks the customer’s language (we support Italian, English, Spanish, and Portuguese), and escalates to a human when the conversation requires judgment.
The tech stack
We built SARA using:
- Baileys — a Node.js library for WhatsApp Web automation. No official API keys needed for prototyping; for production we migrated to Meta’s official Cloud API.
- Ollama + LLaMA 3 — local LLM inference on a Hetzner dedicated server. No per-token costs, full data privacy.
- Fastify — our backend API layer. Extremely fast, TypeScript-friendly.
- PostgreSQL — all conversations, contacts, and CRM events are stored relationally. No document-store magic — just normalized tables.
- PM2 — process manager for the bot daemon. The bot is one module inside a larger system we call S.C.A.L.A. AI OS, an AI operating system that covers CRM, financial health scoring, 14 vertical solutions, and a full analytics layer.
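To make the process-management piece concrete, here is a minimal PM2 ecosystem file of the kind used to run a bot daemon like this one. The paths, app name, and memory limit are illustrative assumptions, not our actual production config:

```javascript
// ecosystem.config.js — hypothetical PM2 config for the bot daemon.
// Field names are standard PM2 options; values are illustrative.
module.exports = {
  apps: [
    {
      name: "sara-whatsapp-bot",
      script: "dist/bot.js",          // compiled TypeScript entry point
      instances: 1,                    // one WhatsApp session per process
      autorestart: true,               // restart on crash
      max_memory_restart: "512M",      // restart if the process leaks memory
      env: { NODE_ENV: "production" },
    },
  ],
};
```

With a file like this, `pm2 start ecosystem.config.js` keeps the daemon alive across crashes and server reboots.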
Conversation architecture
SARA is not a keyword-matching bot. The flow looks like this:
1. An incoming message hits the Fastify webhook.
2. We retrieve the contact record from PostgreSQL (or create one).
3. We inject the contact’s CRM history, vertical context (e.g., real estate, wellness, legal), and the current conversation thread into the LLM prompt.
4. LLaMA 3 generates a reply.
5. We enforce JSON-structured output for intents (book_appointment, escalate_to_human, request_quote, etc.).
6. The reply goes out via WhatsApp.
7. Intent actions trigger CRM updates, calendar entries, or Slack/email alerts to the human team.
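The steps above can be sketched as a single pipeline function. This is a minimal sketch, not our production code: the helper functions stand in for PostgreSQL, the Ollama call, and the WhatsApp send, and all names here are hypothetical.

```typescript
// Illustrative pipeline sketch; every function name here is an assumption.
type Intent = "book_appointment" | "escalate_to_human" | "request_quote" | "none";

interface BotReply { text: string; intent: Intent; }

// Stubbed dependencies standing in for PostgreSQL, LLaMA 3, and WhatsApp.
async function getOrCreateContact(phone: string) {
  return { phone, history: [] as string[] };
}
async function generateReply(_prompt: string): Promise<string> {
  // In production this calls the local LLM; here we return a canned response.
  return JSON.stringify({ text: "Sure — when suits you?", intent: "book_appointment" });
}
async function sendWhatsApp(_phone: string, _text: string) { /* Cloud API call */ }

async function handleIncoming(phone: string, message: string): Promise<BotReply> {
  const contact = await getOrCreateContact(phone);          // step 2: contact record
  const prompt = [
    "You are SARA, a business assistant.",
    `CRM history:\n${contact.history.join("\n")}`,          // step 3: inject context
    `Customer: ${message}`,
  ].join("\n");
  const raw = await generateReply(prompt);                  // step 4: LLM reply
  const reply: BotReply = JSON.parse(raw);                  // step 5: structured output
  await sendWhatsApp(phone, reply.text);                    // step 6: send reply
  return reply;                                             // step 7: caller acts on intent
}
```

The caller inspects `reply.intent` and dispatches CRM updates, calendar writes, or human alerts accordingly.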
This means SARA “knows” the customer. If they asked about pricing two weeks ago, she remembers. If they are a paying client, she treats them differently from a cold lead.
Handling hallucination in production
LLMs hallucinate. In a customer-facing bot, this is unacceptable. Our mitigations:
- RAG (Retrieval-Augmented Generation): we maintain a knowledge base of 174 documents (product sheets, FAQs, pricing, legal terms). Before every reply, we do a semantic search and inject the top-3 relevant chunks. This dramatically reduces made-up answers.
- Intent guardrails: if the structured output parser fails to extract a valid intent, we fall back to a safe “let me connect you with a human” message.
- Escalation threshold: if a conversation scores above a confidence threshold for sensitivity (legal, medical, financial advice), SARA escalates immediately.
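The intent guardrail above reduces to a small pure function: accept the model's JSON only if it parses and names a whitelisted intent, otherwise fall back to the human handoff. A minimal sketch, with hypothetical names and an assumed fallback message:

```typescript
// Guardrail sketch; intent names come from the article, everything else is illustrative.
const VALID_INTENTS = new Set(["book_appointment", "escalate_to_human", "request_quote"]);

interface ParsedReply { text: string; intent: string; }

// Returns the model's reply if its JSON is valid, otherwise a safe handoff fallback.
function applyGuardrail(raw: string): ParsedReply {
  const fallback: ParsedReply = {
    text: "Let me connect you with a human colleague.",
    intent: "escalate_to_human",
  };
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.text === "string" && VALID_INTENTS.has(parsed.intent)) {
      return parsed;
    }
    return fallback;
  } catch {
    return fallback; // malformed JSON never reaches the customer
  }
}
```

The key property is that no parsing failure is ever visible to the customer: every bad model output collapses to the same safe escalation message.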
Results in production
After 90 days running SARA across client accounts:
- Average response time: 4 seconds (vs 6 hours for human email).
- Lead qualification rate: 38% of inbound WhatsApp contacts converted to qualified pipeline within 48h.
- Human escalation rate: 12% of conversations — meaning 88% are fully resolved by the bot.
- Client NPS improved on average by 14 points.
The bigger picture
SARA alone is not the product. She is the interface to a full AI operating system. When a customer books an appointment via WhatsApp, that event flows into the CRM, triggers an invoice draft, updates the pipeline stage, and flags the account for follow-up. Everything is connected. If you are curious about how the full system works, you can explore S.C.A.L.A. AI OS at get-scala.com — there is a free starter plan that includes SARA, CRM, and one vertical module.
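The fan-out from one booking event to the connected subsystems can be sketched as a simple dispatcher. This is an illustrative sketch only; none of these handler names exist in the real S.C.A.L.A. API:

```typescript
// Event fan-out sketch; all handler names are hypothetical.
interface AppointmentEvent { contactId: string; slot: string; }

const actionLog: string[] = [];

function updateCrm(e: AppointmentEvent) { actionLog.push(`crm:${e.contactId}`); }
function draftInvoice(e: AppointmentEvent) { actionLog.push(`invoice:${e.contactId}`); }
function advancePipeline(e: AppointmentEvent) { actionLog.push(`pipeline:${e.contactId}`); }
function flagFollowUp(e: AppointmentEvent) { actionLog.push(`followup:${e.contactId}`); }

// One booking event triggers every connected subsystem in order.
function onAppointmentBooked(e: AppointmentEvent) {
  [updateCrm, draftInvoice, advancePipeline, flagFollowUp].forEach((handler) => handler(e));
}
```

The point of the sketch is the shape, not the handlers: a single WhatsApp event drives four downstream systems without any of them knowing about the others.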
Open questions / things we are still figuring out
- Multi-agent handoff: when SARA escalates to a human, the context transfer is still clunky. We want a seamless “warm transfer” UX.
- Voice messages: WhatsApp users send a lot of voice notes. Whisper transcription is working in staging, not yet in prod.
- Rate limits: Meta’s official Cloud API has message limits that bite when you send campaigns. Baileys has no limits but is technically against ToS for commercial use. The tension is real.
What is your experience building WhatsApp bots in production? Happy to discuss in the comments.