Building an AI WhatsApp Bot for Business: Lessons from SARA
We built SARA — a WhatsApp AI assistant that handles customer inquiries, qualifies leads, schedules appointments, and routes tickets 24/7 — and deployed it as part of a larger AI operating system for SMBs. Here is what we learned.
Why WhatsApp, not a website chatbot
Most business chatbots live behind a “Chat” bubble on a website that nobody opens. WhatsApp is where your customers already are. In Italy and across Europe, WhatsApp has a 90%+ open rate for business messages. When a customer sends a message at 11pm, they expect a reply — not an autoresponder email. SARA lives on WhatsApp. She replies in seconds, speaks the customer’s language (we support Italian, English, Spanish, and Portuguese), and escalates to a human when the conversation requires judgment.
The tech stack
We built SARA using:
- Baileys — a Node.js library for WhatsApp Web automation. No official API keys needed for prototyping; for production we migrated to Meta’s official Cloud API.
- Ollama + LLaMA 3 — local LLM inference on a Hetzner dedicated server. No per-token costs, full data privacy.
- Fastify — our backend API layer. Extremely fast, TypeScript-friendly.
- PostgreSQL — all conversations, contacts, and CRM events are stored relationally. No document-store magic — just normalized tables.
- PM2 — process manager for the bot daemon. The bot is one module inside a larger system we call S.C.A.L.A. AI OS, an AI operating system that covers CRM, financial health scoring, 14 vertical solutions, and a full analytics layer.
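To make the process-management piece concrete, here is a minimal PM2 ecosystem file of the kind used to run a bot daemon like this one. The paths, app name, and memory limit are illustrative assumptions, not our actual production config:

```javascript
// ecosystem.config.js — hypothetical PM2 config for the bot daemon.
// Field names are standard PM2 options; values are illustrative.
module.exports = {
  apps: [
    {
      name: "sara-whatsapp-bot",
      script: "dist/bot.js",          // compiled TypeScript entry point
      instances: 1,                    // one WhatsApp session per process
      autorestart: true,               // restart on crash
      max_memory_restart: "512M",      // restart if the process leaks memory
      env: { NODE_ENV: "production" },
    },
  ],
};
```

With a file like this, `pm2 start ecosystem.config.js` keeps the daemon alive across crashes and server reboots.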
Conversation architecture
SARA is not a keyword-matching bot. The flow looks like this:
1. An incoming message hits the Fastify webhook.
2. We retrieve the contact record from PostgreSQL (or create one).
3. We inject the contact’s CRM history, vertical context (e.g., real estate, wellness, legal), and the current conversation thread into the LLM prompt.
4. LLaMA 3 generates a reply.
5. We enforce JSON-structured output for intents (book_appointment, escalate_to_human, request_quote, etc.).
6. The reply goes out via WhatsApp.
7. Intent actions trigger CRM updates, calendar entries, or Slack/email alerts to the human team.
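The steps above can be sketched as a single pipeline function. This is a minimal sketch, not our production code: the helper functions stand in for PostgreSQL, the Ollama call, and the WhatsApp send, and all names here are hypothetical.

```typescript
// Illustrative pipeline sketch; every function name here is an assumption.
type Intent = "book_appointment" | "escalate_to_human" | "request_quote" | "none";

interface BotReply { text: string; intent: Intent; }

// Stubbed dependencies standing in for PostgreSQL, LLaMA 3, and WhatsApp.
async function getOrCreateContact(phone: string) {
  return { phone, history: [] as string[] };
}
async function generateReply(_prompt: string): Promise<string> {
  // In production this calls the local LLM; here we return a canned response.
  return JSON.stringify({ text: "Sure — when suits you?", intent: "book_appointment" });
}
async function sendWhatsApp(_phone: string, _text: string) { /* Cloud API call */ }

async function handleIncoming(phone: string, message: string): Promise<BotReply> {
  const contact = await getOrCreateContact(phone);          // step 2: contact record
  const prompt = [
    "You are SARA, a business assistant.",
    `CRM history:\n${contact.history.join("\n")}`,          // step 3: inject context
    `Customer: ${message}`,
  ].join("\n");
  const raw = await generateReply(prompt);                  // step 4: LLM reply
  const reply: BotReply = JSON.parse(raw);                  // step 5: structured output
  await sendWhatsApp(phone, reply.text);                    // step 6: send reply
  return reply;                                             // step 7: caller acts on intent
}
```

The caller inspects `reply.intent` and dispatches CRM updates, calendar writes, or human alerts accordingly.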
This means SARA “knows” the customer. If they asked about pricing two weeks ago, she remembers. If they are a paying client, she treats them differently from a cold lead.
Handling hallucination in production
LLMs hallucinate. In a customer-facing bot, this is unacceptable. Our mitigations:
- RAG (Retrieval-Augmented Generation): we maintain a knowledge base of 174 documents (product sheets, FAQs, pricing, legal terms). Before every reply, we do a semantic search and inject the top-3 relevant chunks. This dramatically reduces made-up answers.
- Intent guardrails: if the structured output parser fails to extract a valid intent, we fall back to a safe “let me connect you with a human” message.
- Escalation threshold: if a conversation scores above a confidence threshold for sensitivity (legal, medical, financial advice), SARA escalates immediately.
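The intent guardrail above reduces to a small pure function: accept the model's JSON only if it parses and names a whitelisted intent, otherwise fall back to the human handoff. A minimal sketch, with hypothetical names and an assumed fallback message:

```typescript
// Guardrail sketch; intent names come from the article, everything else is illustrative.
const VALID_INTENTS = new Set(["book_appointment", "escalate_to_human", "request_quote"]);

interface ParsedReply { text: string; intent: string; }

// Returns the model's reply if its JSON is valid, otherwise a safe handoff fallback.
function applyGuardrail(raw: string): ParsedReply {
  const fallback: ParsedReply = {
    text: "Let me connect you with a human colleague.",
    intent: "escalate_to_human",
  };
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed.text === "string" && VALID_INTENTS.has(parsed.intent)) {
      return parsed;
    }
    return fallback;
  } catch {
    return fallback; // malformed JSON never reaches the customer
  }
}
```

The key property is that no parsing failure is ever visible to the customer: every bad model output collapses to the same safe escalation message.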
Results in production
After 90 days running SARA across client accounts:
- Average response time: 4 seconds (vs 6 hours for human email).
- Lead qualification rate: 38% of inbound WhatsApp contacts converted to qualified pipeline within 48h.
- Human escalation rate: 12% of conversations — meaning 88% are fully resolved by the bot.
- Client NPS improved on average by 14 points.
The bigger picture
SARA alone is not the product. She is the interface to a full AI operating system. When a customer books an appointment via WhatsApp, that event flows into the CRM, triggers an invoice draft, updates the pipeline stage, and flags the account for follow-up. Everything is connected. If you are curious about how the full system works, you can explore S.C.A.L.A. AI OS at get-scala.com — there is a free starter plan that includes SARA, CRM, and one vertical module.
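The fan-out from one booking event to the connected subsystems can be sketched as a simple dispatcher. This is an illustrative sketch only; none of these handler names exist in the real S.C.A.L.A. API:

```typescript
// Event fan-out sketch; all handler names are hypothetical.
interface AppointmentEvent { contactId: string; slot: string; }

const actionLog: string[] = [];

function updateCrm(e: AppointmentEvent) { actionLog.push(`crm:${e.contactId}`); }
function draftInvoice(e: AppointmentEvent) { actionLog.push(`invoice:${e.contactId}`); }
function advancePipeline(e: AppointmentEvent) { actionLog.push(`pipeline:${e.contactId}`); }
function flagFollowUp(e: AppointmentEvent) { actionLog.push(`followup:${e.contactId}`); }

// One booking event triggers every connected subsystem in order.
function onAppointmentBooked(e: AppointmentEvent) {
  [updateCrm, draftInvoice, advancePipeline, flagFollowUp].forEach((handler) => handler(e));
}
```

The point of the sketch is the shape, not the handlers: a single WhatsApp event drives four downstream systems without any of them knowing about the others.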
Open questions / things we are still figuring out
- Multi-agent handoff: when SARA escalates to a human, the context transfer is still clunky. We want a seamless “warm transfer” UX.
- Voice messages: WhatsApp users send a lot of voice notes. Whisper transcription is working in staging, not yet in prod.
- Rate limits: Meta’s official Cloud API has message limits that bite when you send campaigns. Baileys has no limits but is technically against ToS for commercial use. The tension is real.
What is your experience building WhatsApp bots in production? Happy to discuss in the comments.