two automation stacks, same app

Manychat automates WhatsApp through Meta. There is another path.

Every guide to this topic assumes one answer: sign up for a Business API provider, get your number verified, write message templates, wait for Meta to approve them, and pay per conversation to reach opted-in contacts. That is the right answer when you are broadcasting to strangers. It is the wrong answer when the WhatsApp account you want to automate is the one you already use to talk to people.

This guide is about the other path. A local agent that drives the real WhatsApp Desktop app on your Mac through operating-system accessibility. No Meta verification, no template queue, no 24-hour window, no per-message fees. Eleven tools, zero env vars, and one ~/.claude.json entry.

Matthew Diakonov, Written with AI

Published April 23, 2026 · 12 min read

Traced against whatsapp-mcp-macos v1.1.0

Line-level walk through the Swift send-and-verify loop

Honest comparison table between Manychat and the local agent path

Concrete decision rules for which path to pick

Two automation stacks. One app.

Manychat speaks Meta's API. whatsapp-mcp speaks macOS.

Manychat rides the Business API: templates, windows, fees.

whatsapp-mcp rides the accessibility tree of the Desktop app.

Different senders, different reach, different compliance shape.

Pick by whether you broadcast or converse.

Why Manychat dominates this topic, and what it leaves out

Manychat is the best-known marketing automation platform that supports WhatsApp, so when people search for ways to automate WhatsApp, the answers center on its product. The flow builder, click-to-WhatsApp ads, template library, CRM sync, and Shopify integration are real, and for the job Manychat is designed to do (outbound marketing to opted-in contacts through a certified business number) it is a reasonable pick.

What every Manychat-centric article leaves out is that the product is a skin on Meta's WhatsApp Business API. It is not a different automation technology; it is a dashboard on top of the same API that Wati, Interakt, Twilio, and a hundred other Business Solution Providers wrap. You are doing Meta's dance either way: business verification in Facebook Business Manager, a display name Meta approves, message templates Meta approves per language, the 24-hour freeform window, and the opt-in requirement before any proactive message.

None of that applies to the automation stack described below. The two approaches share only the word automation and the app icon.

Side by side, by every property that matters

These are the ten dimensions I use when I help someone pick between the two paths. Not a feature checklist; a shape comparison. A yes in one column and a no in the other is usually telling you that the tools are not substitutes, they are alternatives for different problems.

Feature	Manychat (Business API)	whatsapp-mcp-macos
Who the sender is	A registered WhatsApp Business number that Meta has approved and verified.	You. Messages leave the same WhatsApp account your friends and colleagues already message.
Meta Business API	Required. Business Manager verification, display name approval, phone number registration.	Not involved at any point. The server drives the Desktop app via accessibility, not an HTTP endpoint.
Message templates	Required for any message sent outside the 24-hour window. Each template needs Meta approval per language.	No templates. Anything you can type into the compose field, the server can type.
24-hour messaging window	Enforced. Freeform messages only within 24 hours of a contact's last inbound. After that, templates only.	No window. The server is typing, as you, into conversations that are already open.
Opt-in requirement	Each recipient must opt in before the business can send proactive messages.	No opt-in layer. The server cannot reach anyone you are not already in conversation with on your own phone.
Pricing per message	Meta charges per conversation, varying by country and category (utility, marketing, service).	Zero per-message cost. No API fees. The only runtime cost is whatever the host AI already charges.
Reach	Anyone who has opted in, scales to tens of thousands if Meta raises your tier.	One Mac at a time, limited to contacts already in your WhatsApp sidebar. Not a broadcast tool.
Control surface	Flow builder in a web dashboard, with branching, tags, and routing rules.	11 tool calls a model can compose: search, open chat, get active chat, read, send, navigate, list, scroll, status, start, quit. Verification is part of send, not a separate tool.
Compliance posture	Official. Meta audits templates and business profile. Suitable for regulated marketing.	Personal automation. The same act as using the app yourself. Not suitable for outbound broadcast.
Failure surface	Template rejection, rate limits per quality rating, account pause by Meta.	AX call timeout (5.0 s hard ceiling), WhatsApp window not frontmost, Accessibility not granted to the host.

Which path fits which problem

The decision is almost never about features. It is about who the sender is and who the recipient is. If the sender is a registered business and the recipients are a list of opted-in contacts, the Business API is the only legitimate path. If the sender is you and the recipients are already in your WhatsApp sidebar, the Business API is more permission and more process than the job needs.

Broadcast marketing to strangers

You want thousands of opted-in leads to receive templated promos, delivery notices, or renewal reminders. Pick Manychat. That is exactly what the Meta Business API and a flow builder are for.

Act on your own conversations

You want an AI to reply to real people who message you, summarize unread chats, draft responses you approve, or route questions to the right teammate. Pick the local agent path. The Business API cannot send as you.

Both at once is fine

Nothing stops you from running Manychat on a business number for outbound and running a local agent on your personal number for inbound. They do not conflict. They are not even aware of each other.

Cost shape is opposite

Manychat bills per conversation, predictably. A local agent bills per AI inference, unpredictably. If your volume is 20 chats per day, the local path is cheaper. If it is 20,000, Manychat is.
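The two decision rules in this section are simple enough to write down as a function. This is a minimal sketch, not something the project ships: `pick_path`, the `Path` enum, and the three boolean inputs are hypothetical names chosen for illustration.

```python
from enum import Enum


class Path(Enum):
    BUSINESS_API = "business_api"  # Manychat or another Business API platform
    LOCAL_AGENT = "local_agent"    # a local agent driving WhatsApp Desktop
    NEITHER = "neither"            # neither rule from this section applies


def pick_path(sender_is_registered_business: bool,
              recipients_opted_in: bool,
              recipients_in_own_sidebar: bool) -> Path:
    """Encode the sender/recipient rules stated above (hypothetical helper)."""
    # Rule 1: a registered business sending to opted-in contacts.
    if sender_is_registered_business and recipients_opted_in:
        return Path.BUSINESS_API
    # Rule 2: you, sending to people already in your own sidebar.
    if not sender_is_registered_business and recipients_in_own_sidebar:
        return Path.LOCAL_AGENT
    return Path.NEITHER
```

The `NEITHER` branch is the useful one: a personal sender who wants to reach people outside their sidebar, or a business without opt-in, has no legitimate path among the two described here.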

How the local-agent path actually flows

One host, one stdio child, one open WhatsApp window. The host forwards the model's tool calls to the child; the child walks the accessibility tree of the running app, performs the action, and reads the tree back to confirm the result.

the model's tools/call frames flow through one host and one child

Nothing in this diagram speaks HTTP to Meta. The arrows that leave the child go to the OS accessibility framework, not to a network.
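To make the host-to-child hop concrete, here is a rough sketch of what one frame on that stdio pipe could look like. MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call, and the stdio transport sends one JSON message per line. The helper name `make_tools_call` and the `message` argument key are assumptions for illustration; the server's real argument names are not shown in this article.

```python
import json
from typing import Any


def make_tools_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 tools/call request as a single line."""
    frame = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so each frame must stay on
    # one line. json.dumps escapes newlines inside the message body as \n.
    return json.dumps(frame, ensure_ascii=False) + "\n"


# "message" is a hypothetical argument key, used only to show the shape.
line = make_tools_call(1, "whatsapp_send_message",
                       {"message": "On my way\nsee you soon"})
```

The host writes a line like this to the child's stdin and reads the matching response from its stdout. Nothing in the frame carries a token or a URL, which is the point of the diagram.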

anchor fact

11 tools, 0 env vars, 5.0 s AX ceiling

The binary logs exactly this on startup: setupAndStartServer: defined 11 tools. You can grep for that line in the host's MCP log to confirm the child booted. The configuration takes zero environment variables: there is no WHATSAPP_API_KEY, no META_BUSINESS_ID, no tokens of any kind. The env block in your MCP config is literally {}.
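If you would rather check the boot line programmatically than by eye, a small sketch like the following works on whatever log text your host exposes. Where that log lives depends on the host and is not specified here, so the function takes the text as input. `tools_defined` is a hypothetical helper name.

```python
import re
from typing import Optional

# The boot line quoted above, with the count captured.
BOOT_LINE = re.compile(r"setupAndStartServer: defined (\d+) tools")


def tools_defined(log_text: str) -> Optional[int]:
    """Return the tool count from the most recent boot line, or None."""
    matches = BOOT_LINE.findall(log_text)
    return int(matches[-1]) if matches else None
```

A return value of None means the child never logged its boot line, which usually points at the config entry or the build rather than at WhatsApp.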

Every tool that touches the UI lives under a hardcoded 5.0 second accessibility ceiling. At Sources/WhatsAppMCP/main.swift:120 the call AXUIElementSetMessagingTimeout(appElement, 5.0) is set on the application element before any traversal begins. There is no flag to relax it. That number, and not any network concern, is your error budget for every write.

What one send-and-verify cycle actually does

This is the part that cannot exist in a Business API flow, because there is no UI to read back. A Business API send gets you a message id and an eventual delivery webhook. A local-agent send reads the rendered message node out of the accessibility tree and compares the text to what was just pasted. Delivery is not trusted; it is observed.

Sources/WhatsAppMCP/main.swift

The string it looks for is literal: "Your message, " is the prefix WhatsApp Desktop puts in the AX description of every outgoing message bubble. The server strips the time suffix, lowercases both sides, and returns verified: true only when the text that came back contains the text that went in. If the app was still syncing, the user dragged the window, or the compose field it found was not the real one, verification fails and the caller can retry.
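The comparison described above can be sketched in a few lines. This is an illustration in Python, not the Swift in main.swift: the function names are hypothetical, and the input is assumed to be a list of AX description strings in tree order with the newest last. The sketch does not parse the time suffix, because its exact format is not given here; a containment check tolerates it either way.

```python
from typing import Optional

OUTGOING_PREFIX = "Your message, "  # literal prefix on outgoing bubbles


def newest_outgoing(descriptions: list[str]) -> Optional[str]:
    """Return the last outgoing description with the prefix removed."""
    for desc in reversed(descriptions):
        if desc.startswith(OUTGOING_PREFIX):
            return desc[len(OUTGOING_PREFIX):]
    return None


def verify_send(descriptions: list[str], sent_text: str) -> dict:
    """Report whether the newest outgoing bubble contains the pasted text."""
    last = newest_outgoing(descriptions)
    verified = (
        last is not None
        and sent_text.strip() != ""            # empty text matches anything
        and sent_text.lower() in last.lower()  # case-insensitive containment
    )
    return {"verified": verified, "last_sent": last}
```

Returning the last sent text alongside the boolean mirrors the behaviour described in the FAQ below: a failed verification still tells the caller what the tree actually showed, which is what makes a retry decision possible.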

The entire configuration is this

One block in ~/.claude.json. Stdio transport, the binary name, no args, an empty env. If you have configured any other MCP server before, this shape is identical.

~/.claude.json
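Going by the description above (an entry named whatsapp under mcpServers, stdio transport, the whatsapp-mcp command, no args, empty env), the entry can be sketched as a small merge helper that leaves any other configured servers alone. `add_whatsapp_server` is a hypothetical name, and writing "no args" as an empty list is an assumption; omitting the key may be equally valid for your host.

```python
import json

WHATSAPP_ENTRY = {
    "type": "stdio",
    "command": "whatsapp-mcp",
    "args": [],  # assumption: "no args" expressed as an empty list
    "env": {},   # the server reads zero environment variables
}


def add_whatsapp_server(config_text: str) -> str:
    """Return config JSON with a whatsapp entry added under mcpServers."""
    config = json.loads(config_text) if config_text.strip() else {}
    # setdefault keeps any servers that are already configured.
    config.setdefault("mcpServers", {})["whatsapp"] = dict(WHATSAPP_ENTRY)
    return json.dumps(config, indent=2)


print(add_whatsapp_server("{}"))
```

Feed it the current contents of ~/.claude.json and write the result back only after checking the output. The helper never touches the file itself.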

What it looks like end to end

Install through npm, add the config, restart the host, ask the model to send a message. The host reports back what the verification read from the tree, which is the first thing most people discover they care about more than they expected to.

zero to verified send

Setup, in order

Five steps, none of them network-bound except the first. No verification queue, no Facebook Business Manager, no waiting on Meta approval. The slowest step is the one-time Swift build that npm runs for you.

from zero to a verified local agent

Install one npm package

npm install -g whatsapp-mcp-macos. The postinstall runs xcrun swift build -c release and produces a single binary.

Add one block to ~/.claude.json

Under mcpServers, a whatsapp entry with type stdio and command whatsapp-mcp. No env vars, no API key, no tokens.

Grant Accessibility to the host

System Settings, Privacy and Security, Accessibility. Grant the trust to Claude Code or whichever host forks the MCP child, not to the binary itself.

Restart the host and launch WhatsApp

The host forks one whatsapp-mcp child on startup. WhatsApp Desktop must be running for any tool that touches the UI to succeed.

Ask the model to send its first message

It will call whatsapp_search, then whatsapp_open_chat, then whatsapp_get_active_chat, then whatsapp_send_message. The last call returns verified: true when the send is confirmed by reading the tree back.
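The call order in the last step can be sketched as a plain function over an injected tool caller. The four tool names come from the step above; everything else is an assumption for illustration: the `call_tool` callable, the argument keys `query`, `index`, and `message`, and the retry count.

```python
from typing import Any, Callable

# A stand-in for whatever your MCP host uses to invoke a tool.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def send_verified(call_tool: ToolCaller, contact: str, message: str,
                  max_attempts: int = 2) -> bool:
    """Run search, open, confirm, send; resend while unverified."""
    call_tool("whatsapp_search", {"query": contact})
    call_tool("whatsapp_open_chat", {"index": 0})
    # A real caller would compare the active chat against `contact` here.
    # The response shape is not documented in this article, so this sketch
    # only makes the call.
    call_tool("whatsapp_get_active_chat", {})
    for _ in range(max_attempts):
        result = call_tool("whatsapp_send_message", {"message": message})
        if result.get("verified") is True:
            return True
    return False
```

A blind resend can duplicate a message that did go out but failed verification, for example because the tree was read too early. A more careful caller would read the chat back before trying again.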

Properties the local path inherits for free

Every chip below is a property of the stack, not a feature someone has to maintain. They are what you get when the automation layer is the operating system's accessibility framework rather than a vendor's Business API wrapper.

11 tools · 0 env vars · 5.0 s AX timeout · net.whatsapp.WhatsApp · AXGenericElement parse · Cmd+V paste input · Return key to send · post-send verify · stdio transport · no outbound socket · no template approval · no 24-hour window

When Manychat is still the right call

I have spent the article on the gap, not because Manychat is bad but because the gap is the part you will not find covered anywhere else. If your automation use case is any of the following, the local path is not the answer you want:

Abandoned-cart and order-confirmation messages to tens of thousands of Shopify customers.
Marketing broadcasts to opted-in subscribers of a newsletter or course, where the legal posture depends on the Business API.
Any flow where the sender has to be a verified business name in the recipient's chat header, not a personal number.
A non-technical operator who will be configuring the flows, where the value of a hosted dashboard outweighs the value of code.

In all of those the Business API (and therefore a platform like Manychat) is the correct tool, and no amount of local accessibility automation changes that. The point of this guide is that there is a second shape of automation that no one was telling you about, and it lines up with the other half of the problems people actually have.

Frequently asked questions

Is Manychat WhatsApp automation the only way to automate WhatsApp?

No. Manychat, Wati, Interakt, Twilio and every other platform you will find at the top of a search for WhatsApp automation are skins on Meta's WhatsApp Business API. They exist because Meta does not let you send messages through the API without a certified platform or your own approved app. That path is correct when your goal is outbound broadcasting to opted-in customers. It is the wrong tool when your goal is to automate the account you personally use to talk to people, because the Business API cannot send as your personal number and will never be able to. The alternative class of automation is a local agent that drives the WhatsApp Desktop app on your own machine through operating-system accessibility APIs, the way assistive tech drives any other app. whatsapp-mcp-macos is an example. It binds to the bundle id net.whatsapp.WhatsApp, walks the accessibility tree of the running WhatsApp window, and types into the compose field on your behalf.

What do I lose by skipping Manychat and using a local agent instead?

Everything the Meta Business API layer provides, because you are no longer using it. You lose the ability to initiate conversations with strangers who have not messaged you first, because on your personal account you never had that ability. You lose the flow builder dashboard, campaign analytics, segment tagging, and template management, because there is no middleware doing those things. You lose horizontal scale: a local agent runs on one Mac at a time and cannot send a thousand messages per minute even if you wanted it to. You also lose the compliance story you would present to Meta, because you are not sending through Meta at all, you are acting as yourself. The upside is that every one of those losses corresponds to a category of setup, approval, or fee that no longer applies.

Does whatsapp-mcp-macos need WhatsApp Business API access?

No. The server does not speak HTTP to Meta at any point. It binds to the WhatsApp Desktop macOS Catalyst app via its bundle id, net.whatsapp.WhatsApp, and drives it through AXUIElement calls. There is no API key, no access token, and no env var to set: the server takes zero environment variables at startup. The only thing it needs is the operating system's Accessibility permission, and that permission is granted to the host process (Claude Code, Cursor, or whichever MCP host is forking it), not to the binary itself, because on macOS the TCC database attributes the privileged AX call to the responsible parent.

Is the 24-hour window a problem for a local agent?

There is no 24-hour window for a local agent because the 24-hour window is a Business API construct. Meta enforces it at the API layer: a business cannot freeform-message a user more than 24 hours after the user's last inbound without using an approved template. A local agent is not speaking to the API. It is typing into the same compose field you would type into yourself, which means the only rule that applies is the one WhatsApp applies to you, the human, when you open the app and send a message. You can reply to your mom three days later, and so can an agent acting on your account.

How does the server confirm a message actually sent?

By reading the accessibility tree back after the Return key is pressed and comparing. Inside handleSendMessage (Sources/WhatsAppMCP/main.swift, lines 887 through 958) the server pastes the message into the compose field, posts a Return CGEvent, waits 1.0 second for the UI to settle, then re-traverses the AX tree and searches for the newest AXGenericElement whose description string begins with the literal prefix "Your message, ". If that element exists and its embedded text contains the message that was pasted, the tool returns verified: true. If it does not, the tool returns verified: false along with whatever the last sent message was, so the caller can decide whether to retry. This is why a send returns a strict boolean rather than an optimistic success: the verification round-trip is part of the contract.

What is the 5.0 second timeout and where does it come from?

It is the AX messaging timeout for the WhatsApp Desktop application element. At main.swift line 120 the server calls AXUIElementSetMessagingTimeout(appElement, 5.0), which tells the accessibility framework that any call made through this element (a request for children, a value fetch, a click post) has at most 5.0 seconds to complete before it errors out. That ceiling applies to every tool that touches the UI: search, open, read, send, navigate. It is hardcoded. There is no config flag to lengthen it. In practice a well-behaved WhatsApp window responds to an AX traversal in a few hundred milliseconds, so the ceiling only trips when the window is minimised, offscreen, or still loading the sidebar. If you see timeouts, check the window state before you blame the server.

Can a local agent send to someone I have never messaged before?

No. The server does not have a create-new-chat tool. The whatsapp_search tool searches your existing chats and contacts, and whatsapp_open_chat clicks the Nth search result to open a chat that was already in the sidebar. If a number is not already in your WhatsApp, the agent cannot reach it. This is a deliberate scoping choice rather than a hard limit of the accessibility approach: to message a new number on WhatsApp, you start a new chat from the Desktop UI, and a future tool could implement that flow. Today the tool surface is intentionally limited to conversations the user is already part of. That scoping is also what keeps the automation safe for personal use.

How many tools does whatsapp-mcp-macos expose and what are they?

Eleven, defined at main.swift line 1110 where the array [statusTool, startTool, quitTool, getActiveChatTool, listChatsTool, searchTool, openChatTool, scrollSearchTool, readMessagesTool, sendMessageTool, navigateTool] is assembled and passed to the server. The server logs this on boot with the stderr line setupAndStartServer: defined 11 tools. The split is five inspection tools (status, get_active_chat, list_chats, read_messages, navigate), five lifecycle and navigation tools (start, quit, search, open_chat, scroll_search), and one write tool (send_message). A Manychat flow builder exposes hundreds of dashboard affordances. This is deliberately a smaller surface, because an AI host does not need a dashboard; it needs primitives it can compose in a turn.

Does this replace Manychat or sit next to it?

It depends on the job. If your business is growth-mode e-commerce sending order confirmations and abandoned-cart templates to thousands of opted-in customers, the local agent replaces nothing for you; keep Manychat and do not look at the local path. If you run a small practice, a creator studio, or a founder-led sales motion and your WhatsApp is the front door for real human inbound, Manychat is the wrong shape: you do not have a Meta-approved business number and you do not want opt-in walls between you and the person who just referred a friend. For that workflow the local agent is the right tool and you will not need Manychat at all. Some teams run both, on separate numbers, for exactly that reason.

What are the real prerequisites to get started with the local path?

A Mac running macOS 13 or later, WhatsApp Desktop installed and signed in, an MCP-aware host (Claude Code, Cursor, or another client that reads ~/.claude.json style config), Node 18 or later to run the npm install which compiles Swift via the postinstall, and one minute in System Settings to grant Accessibility to the host. Total time from zero to first verified send is roughly five minutes, and the vast majority of that is the one-time Swift build. There is no account to create, no phone number to register, no template to write.