WhatsApp MCP server. Four ways one exists, and how to pick the right one.

Every “WhatsApp MCP” you find on GitHub does the same thing at the protocol layer: it serves JSON-RPC over stdio and advertises tools like send_message and list_chats. What actually differs is the bottom edge: how the server reaches WhatsApp itself. There are four real choices, and the trade-offs between them decide everything else.

This guide walks all four. The last section is the one feature nobody else documents: post-send delivery verification by re-walking the accessibility tree.

Matthew Diakonov, Written with AI

Published May 19, 2026 · 10 min read

Direct answer, verified 2026-05-19

What is a WhatsApp MCP server, in one paragraph

It’s a server that implements the Model Context Protocol and exposes WhatsApp as a set of JSON-RPC tools an AI assistant can call: search contacts, list chats, read messages, send messages, navigate tabs. The MCP host (Claude Code, Cursor, Windsurf) forks the server as a stdio child process. The four common implementations differ in HOW they reach WhatsApp: Meta’s Business Cloud API, the reverse-engineered web-multidevice protocol (whatsmeow), browser automation against WhatsApp Web, or native macOS accessibility APIs against the desktop app.

Authoritative source for the macOS variant cited throughout: github.com/m13v/whatsapp-mcp-macos. Specific files referenced: Sources/WhatsAppMCP/main.swift and package.json.

The protocol is identical. The mechanism is everything.

MCP is a thin envelope. The host forks a child process, asks tools/list, and then sends tools/call for each invocation. That part is boring and the same across every server you have ever seen. What changes between implementations is what the server actually does when send_message arrives. The four common answers, in order from most-corporate to most-OS-native:

Approach	What it actually does at the bottom edge	What that means for you
Business Cloud API	Talks to graph.facebook.com on the user's behalf. The MCP holds an access token, a phone number ID, and a webhook URL. Examples: Infobip MCP, Zapier WhatsApp Business Messaging, most listings on mcpmarket.com.	Native B2B path. Requires a Meta developer account, business verification, message templates, and per-conversation pricing. The MCP itself is a thin HTTP client. You can broadcast to opted-in template lists but you cannot initiate freeform chats outside the 24-hour window.
whatsmeow / web-multidevice	Embeds a Go bridge that speaks the reverse-engineered WhatsApp Web multidevice protocol. Scans a QR inside the MCP and stores its own session. Examples: lharries/whatsapp-mcp, several Show HN entries.	No Meta account, but the MCP becomes a linked device, and Meta has historically banned accounts that misbehave on this protocol. The MCP stores all of your messages in a local SQLite DB so the model can grep them. Powerful, but the MCP IS your auth: if the bridge process dies permanently, the linked device is gone and you re-pair.
Headless WhatsApp Web	Drives web.whatsapp.com in a Playwright or Puppeteer Chromium. Examples: fyimail/whatsapp-mcp2 and similar entries on Awesome MCP Servers.	Same linked-device footprint as whatsmeow, plus a browser process you have to keep alive and a DOM you have to scrape. Selectors break every time WhatsApp Web ships. Cloudflare and bot-detection signals are still WhatsApp's call.
macOS accessibility (this server)	Binds to the running WhatsApp Desktop process by bundle id net.whatsapp.WhatsApp and drives it through AXUIElement, the same framework VoiceOver uses. Source: github.com/m13v/whatsapp-mcp-macos.	No Meta account, no QR pairing in the MCP, no linked-device slot consumed. The auth IS whatever WhatsApp Desktop is signed in as. macOS only. The MCP cannot run if the desktop app is not installed and signed in.

Three things decide which row is right for your project: who owns the auth (you, Meta, or the desktop app), who owns the rate-limit-and-ban risk, and which platforms you need to support. If you cannot tolerate the MCP holding credentials, the bottom row is the only one where it does not.

What a Business Cloud API server looks like on the wire

Most public “WhatsApp MCP” listings on directories like mcpmarket and Zapier are this. The MCP is a thin shell over an HTTPS POST to graph.facebook.com/v17.0/<phone_number_id>/messages. The MCP holds a long-lived Meta access token. The send is asynchronous: you get a queued message ID immediately, and the actual “delivered” signal arrives later as a webhook POST to a public HTTPS endpoint you host.
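To make the "thin shell over an HTTPS POST" concrete, here is a minimal Python sketch of the request such a server would build. The function name and the placeholder credentials are hypothetical. The URL shape comes from the paragraph above, and the JSON body follows the Cloud API's text-message format as I understand it. The request is built but never sent.

```python
import json
import urllib.request


def build_send_request(phone_number_id: str, access_token: str,
                       to: str, body: str) -> urllib.request.Request:
    """Build (but do not send) the POST a Cloud API-backed send tool would issue."""
    url = f"https://graph.facebook.com/v17.0/{phone_number_id}/messages"
    payload = {
        "messaging_product": "whatsapp",
        "to": to,                      # recipient phone number, digits only
        "type": "text",
        "text": {"body": body},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # the token the MCP holds
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values only; nothing here touches the network.
req = build_send_request("PHONE_NUMBER_ID", "ACCESS_TOKEN", "15551234567", "hello")
print(req.get_method(), req.full_url)
```

Everything interesting about this path (the token, the phone number ID, the webhook that reports delivery later) lives outside those few lines, which is the point.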

Business Cloud API: how a tool call routes

Three honest consequences. First, you need a Meta developer account and a verified WhatsApp Business Account, which takes one to five business days. Second, you can only send freeform messages inside the 24-hour customer-service window after a user messages you; outside that window, every send must be a Meta-approved template. Third, every conversation is metered by country and category. Right tool if you are running B2B marketing to opted-in lists. Wrong tool if you want Claude to nudge your friend about lunch.
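The second consequence is the one that surprises people, so here is a small Python sketch of the check a Business API server effectively has to make before each send. The function name and return labels are hypothetical, and treating exactly 24 hours as "outside the window" is my assumption, not something the source specifies.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SERVICE_WINDOW = timedelta(hours=24)


def message_kind(last_inbound_at: Optional[datetime], now: datetime) -> str:
    """Return "freeform" inside the customer-service window, else "template"."""
    if last_inbound_at is not None and now - last_inbound_at < SERVICE_WINDOW:
        return "freeform"
    # No inbound message yet, or the window has lapsed (boundary is an assumption).
    return "template"


now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
print(message_kind(now - timedelta(hours=3), now))
print(message_kind(None, now))
```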

What a whatsmeow / web-multidevice server looks like

The most popular GitHub result for “WhatsApp MCP” is lharries/whatsapp-mcp. It embeds a Go bridge built on the whatsmeow library, which speaks the reverse-engineered multidevice protocol the official WhatsApp Web client uses. The MCP scans a QR code at first launch and registers itself as one of your linked devices. It then keeps its own session alive, stores incoming messages in a local SQLite database, and the MCP tools mostly query that DB.
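The "tools mostly query that DB" part is easy to picture with a toy version. This Python sketch uses an in-memory SQLite database with a made-up schema: the table name, columns, and function names are all hypothetical and are not taken from lharries/whatsapp-mcp. It only illustrates the pattern of a bridge writing rows and a tool reading them back.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory message store with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (chat TEXT, sender TEXT, body TEXT, sent_at TEXT)"
    )
    return conn


def record_incoming(conn: sqlite3.Connection, chat: str, sender: str,
                    body: str, sent_at: str) -> None:
    """What the bridge would do for each message that arrives on the session."""
    conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?)",
                 (chat, sender, body, sent_at))


def search_messages(conn: sqlite3.Connection, term: str,
                    limit: int = 20) -> List[Tuple[str, str, str, str]]:
    """What a search tool would do: a substring match, newest first.

    Note that % and _ inside `term` act as LIKE wildcards in this sketch.
    """
    rows = conn.execute(
        "SELECT chat, sender, body, sent_at FROM messages "
        "WHERE body LIKE ? ORDER BY sent_at DESC LIMIT ?",
        (f"%{term}%", limit),
    )
    return rows.fetchall()


store = open_store()
record_incoming(store, "Alex", "Alex", "lunch on Friday?", "2026-05-18T12:00:00")
record_incoming(store, "Sam", "Sam", "did you see the build?", "2026-05-18T13:00:00")
print(search_messages(store, "lunch"))
```

The model never talks to WhatsApp when it searches. It talks to whatever the bridge has already written down, which is why this path can grep years of history and the others cannot.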

Three honest consequences. First, no Meta developer account is involved, so you get freeform reach to anyone you can already message. Second, the MCP IS your auth: if the bridge process dies permanently or you wipe its session, the linked-device slot is gone and you re-pair. Third, Meta has historically suspended accounts that hammer the web-multidevice protocol in a way that looks like a bot, so high-volume use is a real risk vector. Right tool on Linux or Windows, or when you genuinely need to grep across years of personal message history. Wrong tool if you want to avoid holding any kind of WhatsApp-side auth in the MCP process.

What a headless WhatsApp Web server looks like

A handful of community servers drive web.whatsapp.com inside a Playwright or Puppeteer Chromium. From WhatsApp’s point of view this is identical to the whatsmeow path: another linked device, another web session. From your point of view it is strictly worse, because in addition to the linked-device footprint you also have a browser process to keep alive and a DOM full of data-testid selectors that change every time WhatsApp Web ships. Most of the entries on awesome-mcp-server style directories that claim a “web-based” WhatsApp MCP are this. If a selector breaks at 11pm, your MCP is silently broken until you redeploy.

Right tool for a hack-day demo. Wrong tool for anything that has to keep working without weekly attention.

What a macOS accessibility server looks like

The bottom row of the comparison table is the one this site is about. It does not replace Meta’s API with another API. It replaces the API call with a process binding plus a UI-tree walk. The MCP looks up the running net.whatsapp.WhatsApp process by bundle id, gets its PID, calls AXUIElementCreateApplication, and walks the same accessibility tree VoiceOver does. There is no access token, no QR pair inside the MCP, no linked-device slot consumed. The auth is whatever the desktop app is signed in as.

macOS accessibility: how the same tool call routes

Worth comparing the two diagrams side by side. The Business API path has a network hop, an async webhook, and a token. The accessibility path has a pid, a tree walk, two CGEvents, and a second tree walk to verify. No network, no token, no webhook. The config env block reflects that:

~/.claude.json

That env: {} is not a placeholder. The MCP layer carries zero secrets. If you check this config into a backup you are not backing up a Meta token, because there is not one.
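For reference, here is a Python sketch that merges such an entry into a host config and prints the resulting JSON. The field values (type stdio, command whatsapp-mcp, empty args, empty env) and the mcpServers.whatsapp key come from this guide; the helper name is hypothetical, and the exact file layout your host expects is an assumption worth checking against its documentation.

```python
import json
from typing import Any, Dict


def add_whatsapp_server(config: Dict[str, Any]) -> Dict[str, Any]:
    """Add the macOS WhatsApp MCP entry without disturbing other servers."""
    servers = config.setdefault("mcpServers", {})
    servers["whatsapp"] = {
        "type": "stdio",
        "command": "whatsapp-mcp",
        "args": [],
        "env": {},  # intentionally empty: there is no token to put here
    }
    return config


print(json.dumps(add_whatsapp_server({}), indent=2))
```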

The detail no other WhatsApp MCP server documents

Every MCP server I have read returns success for send_message as soon as the underlying call accepts the payload. The Business API path returns the moment graph.facebook.com returns 200 with a message ID. The whatsmeow path returns the moment the web-multidevice handshake acks. Neither one reads the conversation back to confirm the message bubble actually rendered. If WhatsApp silently drops or queues your message, the model is told the send succeeded.

The macOS server does it differently. After it pastes the text and posts a Return CGEvent, it re-walks the accessibility tree of the WhatsApp window. WhatsApp Catalyst exposes every outgoing message as an AXGenericElement whose description is the literal string "Your message, <text>, 12:04 PM". The server filters for that prefix, strips the timestamp with a regex, normalizes whitespace and case, and prefix-matches the text it just sent. Only if that match holds does the tool return verified: true. Otherwise it returns verified: false with a warning containing the last sent bubble it actually found.

Sources/WhatsAppMCP/main.swift, line 920-957

Why this matters. An LLM that called whatsapp_send_message and got back verified: true can move on; an LLM that got verified: false knows it has to retry, re-verify the active chat, or tell the user. That is a real signal, not a 200 OK from a queue. Worth grepping the source of whatever WhatsApp MCP server you are considering for “verified” before you trust its success-return at face value.
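The verification logic is small enough to sketch. What follows is a Python illustration of the steps described above, not the server's Swift code: the accessibility tree is modeled as nested dicts, the timestamp regex is inferred from the "12:04 PM" example, and accepting a prefix match in either direction is my assumption about what "prefix-matches" means. All function names are hypothetical.

```python
import re
from typing import Dict, Iterator, List

SENT_PREFIX = "Your message, "
# Assumed timestamp shape, inferred from the "12:04 PM" example; locales may differ.
TIME_SUFFIX = re.compile(r",\s*\d{1,2}:\d{2}(\s*[AP]M)?\s*$", re.IGNORECASE)


def walk(node: Dict) -> Iterator[Dict]:
    """Depth-first traversal of a dict-based stand-in for the AX tree."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase."""
    return " ".join(text.split()).lower()


def sent_bubbles(tree: Dict) -> List[str]:
    """Collect outgoing-message texts with the prefix and timestamp stripped."""
    bubbles = []
    for element in walk(tree):
        description = element.get("description") or ""
        if (element.get("role") == "AXGenericElement"
                and description.startswith(SENT_PREFIX)):
            bubbles.append(TIME_SUFFIX.sub("", description[len(SENT_PREFIX):]))
    return bubbles


def verify_send(tree: Dict, sent_text: str) -> Dict:
    """Compare the last outgoing bubble against the text that was just sent."""
    bubbles = sent_bubbles(tree)
    found = bubbles[-1] if bubbles else None
    wanted = normalize(sent_text)
    last = normalize(found) if found else ""
    if wanted and last and (last.startswith(wanted) or wanted.startswith(last)):
        return {"success": True, "verified": True}
    return {"success": True, "verified": False,
            "warning": f"last sent bubble found: {found!r}"}


window = {"role": "AXWindow", "children": [
    {"role": "AXGenericElement",
     "description": "Your message, Running late, see you at 7, 12:04 PM"},
]}
print(verify_send(window, "running late,   see you at 7"))
```

The useful property is that the failure case carries evidence. A caller that gets verified: false also gets the last bubble that was actually on screen, so it can tell "nothing was sent" apart from "something else was sent".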

The 11 tools the macOS server exposes

Defined in setupAndStartServer (main.swift line 990). Each tool below is a real Tool(name:description:inputSchema:) registration, not a marketing list:

whatsapp_status

Checks both that WhatsApp is running AND that accessibility is functionally working. Does a real probe, because AXIsProcessTrustedWithOptions can return true while AX calls silently fail on a stale TCC cache.

whatsapp_start / whatsapp_quit

Launches via /usr/bin/open -a WhatsApp.app, or terminates the running net.whatsapp.WhatsApp process. The quit handler falls back to forceTerminate() after a 5-second graceful window.

whatsapp_list_chats

Walks the sidebar collection of AX elements, parses unread counts and last-message previews. Filter parameter accepts all, unread, favorites, or groups.

whatsapp_search

Types the query into the search field and parses the AX tree into a structured list with section (chats vs contacts), contactName, preview, and time. Leaves search OPEN, returns indexed results.

whatsapp_open_chat

Clicks the Nth search result. Returns the chat name that actually opened, so the agent can verify it matched before sending.

whatsapp_scroll_search

Scrolls within the search results list. Use when the contact you want is not in the first batch.

whatsapp_read_messages

Parses messages from the currently open chat. Returns sender, text, time, and isFromMe. Cannot fetch messages from chats that are not open: the AX tree only shows what is rendered.

whatsapp_send_message

Pastes the text into the compose textarea (via clipboard, then Cmd+V), presses Return, then re-walks the AX tree to verify the bubble appeared. Returns verified: true or false.

whatsapp_get_active_chat

Returns the name, subtitle, and recent messages of whatever chat is currently focused. Used between open_chat and send_message to confirm targeting.

whatsapp_navigate

Switches tabs: chats, calls, updates, settings, archived, starred. Implemented as a Cmd+digit key combination via CGEvent.
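Taken together, the tools above suggest a search, open, confirm, send sequence. Here is a Python sketch of that flow from the agent's side. The tool names are the real ones listed above, but the argument names (query, index, message) and the chatName result field are hypothetical, since this guide does not reproduce the input schemas. The call_tool parameter stands in for whatever your MCP client uses to invoke a tool.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def send_to_contact(call_tool: ToolCaller, contact: str, text: str) -> Dict[str, Any]:
    """Search, open the first hit, confirm the target, then send and report."""
    call_tool("whatsapp_search", {"query": contact})        # leaves search open
    opened = call_tool("whatsapp_open_chat", {"index": 0})  # first search result
    opened_name = str(opened.get("chatName", ""))
    if contact.lower() not in opened_name.lower():
        # Refuse to send if the chat that opened is not the one we asked for.
        return {"sent": False,
                "reason": f"opened {opened_name!r}, expected {contact!r}"}
    result = call_tool("whatsapp_send_message", {"message": text})
    return {"sent": True, "verified": bool(result.get("verified"))}


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """A stand-in server for trying the flow without WhatsApp."""
    if name == "whatsapp_open_chat":
        return {"chatName": "Alex Rivera"}
    if name == "whatsapp_send_message":
        return {"verified": True}
    return {}


print(send_to_contact(fake_call_tool, "alex", "lunch on Friday?"))
```

The targeting check matters more than it looks. A UI-driven server sends to whatever chat is open, so confirming the opened chat before sending is what keeps a fuzzy search from messaging the wrong person.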

Installing the macOS server, end to end

Two commands and a config edit. No portal, no verification email, no phone code. The postinstall step compiles the Swift binary; that is the only build step.

install on a fresh Mac

What each step actually does

Install WhatsApp Desktop and sign in

From the Mac App Store. One-time QR pair with your phone. This pairing lives inside WhatsApp Desktop's session storage, not inside the MCP. If you sign out of the desktop app, the MCP loses its target.

Install the MCP server from npm

npm install -g whatsapp-mcp-macos. The postinstall script runs xcrun swift build -c release and drops a Swift binary at .build/release/whatsapp-mcp. The npm bin wires whatsapp-mcp onto your PATH.

Add the server to your MCP host config

type: stdio, command: whatsapp-mcp, args: [], env: {}. The empty env block is not a typo. The MCP layer carries zero secrets.

Grant macOS Accessibility to the host

System Settings > Privacy & Security > Accessibility. Grant to the process that forks the MCP child. That's Claude.app if you launch from /Applications, Cursor if you launch from Cursor, Terminal if you test from the shell.

Restart the host, run /mcp

First tool call surfaces the Accessibility prompt if it is not yet granted. Then whatsapp_status will return accessibilityTrusted: true and accessibilityWorking: true if both checks pass.

How to choose between the four

A short decision rubric, in plain language, with no marketing varnish:

Pick Business Cloud API if your use case is opted-in broadcasts, verified-sender badge, or compliance with a published WhatsApp business policy. You are willing to wait for verification and pay per conversation.
Pick whatsmeow / web-multidevice if you are on Linux or Windows, want freeform personal reach, and accept that the MCP itself holds your session and is your linked device. You accept the (small but real) risk of account suspension if usage looks botty.
Pick headless WhatsApp Web only for a throwaway demo. It is the strictly-worse cousin of whatsmeow.
Pick macOS accessibility if you are already on macOS with WhatsApp Desktop installed, want freeform personal reach without burning a linked-device slot, and want the MCP process to hold no credentials at all. Accept that this is the only path that needs you to keep the desktop app running and to grant Accessibility to the host once.
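If you prefer the rubric as a function, this Python sketch compresses the four bullets into one decision order. The function name, parameter names, and return labels are hypothetical, and the ordering (business need first, then throwaway demos, then platform) is my reading of the bullets rather than a rule stated anywhere.

```python
def pick_path(platform: str, business_broadcasts: bool = False,
              throwaway_demo: bool = False) -> str:
    """Map a use case onto one of the four mechanisms described in this guide."""
    if business_broadcasts:
        return "business-cloud-api"      # opted-in lists, templates, verification
    if throwaway_demo:
        return "headless-whatsapp-web"   # fine for a demo, fragile beyond that
    if platform.lower() == "macos":
        return "macos-accessibility"     # needs WhatsApp Desktop signed in
    return "whatsmeow"                   # Linux or Windows, personal reach


print(pick_path("macOS"))
print(pick_path("linux"))
```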

Frequently asked questions

What is a WhatsApp MCP server?

It is a process that implements the Model Context Protocol, exposing WhatsApp as a set of JSON-RPC tools an AI assistant (Claude, Cursor, Windsurf, any MCP-aware client) can call. Typical tools are list_chats, search_contact, read_messages, and send_message. The server is just a child process the host forks and talks to over stdio (or sometimes HTTP+SSE). What varies between implementations is HOW the server actually reaches WhatsApp: Meta's Business Cloud API, the reverse-engineered web-multidevice protocol, browser automation against WhatsApp Web, or macOS accessibility against the native desktop app.

Why are there four different kinds of WhatsApp MCP server?

WhatsApp does not ship one canonical API for an LLM to use. The Business Cloud API exists and is fully supported but requires a Meta developer account, business verification, message templates, and a public webhook, which is wrong for a personal use case. The web-multidevice protocol (whatsmeow) is reverse-engineered from the WhatsApp Web client and works for personal accounts, but the MCP becomes a linked device that can be revoked. Headless WhatsApp Web has the same linked-device footprint plus a fragile DOM. macOS accessibility drives the genuine desktop app, and the auth is just whatever account the desktop app is signed in as. Each one trades off the same three things differently: who owns the auth, who owns the rate-limit risk, and what platforms it runs on.

Which WhatsApp MCP server should I pick?

If you are building a B2B product that needs verified-sender broadcasts to opted-in lists, use a Business Cloud API server (Infobip, Zapier, or most of the listings on mcpmarket). If you are on Linux or Windows and want personal reach, use lharries/whatsapp-mcp (whatsmeow) and accept that the MCP IS your linked device. If you are on macOS and want personal reach without burning a linked-device slot or storing a Meta token, use whatsapp-mcp-macos. The single biggest selection factor is whether you can tolerate the MCP layer holding credentials. If you cannot, the accessibility path is the only one that fits, because there are no credentials at the MCP layer.

What is the unique thing the macOS accessibility server does that the others don't?

Post-send delivery verification by re-traversing the accessibility tree. After it pastes text and posts a Return CGEvent, it walks the AX tree a second time, filters for elements with role AXGenericElement whose description starts with the literal string 'Your message, ', strips the time suffix with a regex, lowercases and trims, then prefix-matches the sent text. If the match holds, the tool returns verified: true. If it does not, the tool returns success: true, verified: false with a warning containing the last sent message it found. The Business API path returns 'queued' and tells you to wait for a webhook. The whatsmeow path returns its internal message ID. Neither one re-reads the conversation to confirm the bubble actually rendered. See main.swift lines 923 to 957 in github.com/m13v/whatsapp-mcp-macos.

Does the server need a Meta developer account or a WhatsApp Business account?

Only if you choose the Business Cloud API path. The whatsmeow path needs neither. The macOS accessibility path needs neither and additionally does not consume a linked-device slot, because it drives the already signed-in WhatsApp Desktop app through accessibility rather than registering as a separate web client. If you check the env block of the recommended config (mcpServers.whatsapp.env) it is literally an empty object. There is nothing to rotate, revoke, or leak.

How does the MCP host actually talk to the server?

JSON-RPC over stdio. The host (Claude Code, Cursor, etc.) forks the child process named in your config and reads JSON-RPC messages from the child's stdout while writing to its stdin. There is no port, no socket, no TLS, no HTTPS. The host calls initialize, then tools/list to learn what the server exposes, then tools/call for each tool invocation. The macOS server here defines 11 tools in setupAndStartServer (main.swift line 990). If you have ever wired up another stdio MCP, the shape is identical; the difference is what each tool does internally.
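Here is roughly what the host writes to the child's stdin, as a Python sketch. The method names and the newline-delimited framing follow the MCP specification as I understand it. The protocol version string, the client name, and the empty arguments object for whatsapp_status are illustrative assumptions. A real host waits for each response before sending the next request; this sketch only shows the outgoing messages.

```python
import json
from typing import Any, Dict, List, Optional


def frame(method: str, params: Optional[Dict[str, Any]] = None,
          msg_id: Optional[int] = None) -> str:
    """Serialize one JSON-RPC 2.0 message as a single newline-terminated line."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        message["id"] = msg_id       # requests carry an id; notifications do not
    if params is not None:
        message["params"] = params
    return json.dumps(message) + "\n"


def session(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """The outgoing half of a minimal session: handshake, discovery, one call."""
    return [
        frame("initialize", {
            "protocolVersion": "2024-11-05",  # illustrative version string
            "capabilities": {},
            "clientInfo": {"name": "example-host", "version": "0.0.0"},
        }, msg_id=1),
        frame("notifications/initialized"),
        frame("tools/list", msg_id=2),
        frame("tools/call", {"name": tool, "arguments": arguments}, msg_id=3),
    ]


for line in session("whatsapp_status", {}):
    print(line, end="")
```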

What permission does the macOS accessibility server actually need?

Exactly one: macOS Accessibility, granted to the host process that forks the MCP child. If you launch Claude Code from /Applications/Claude.app, you grant Accessibility to Claude. If you launch Cursor, you grant it to Cursor. If you test from Terminal, you grant it to Terminal. The check is at main.swift line 562. There is also a functional probe at line 576, because AXIsProcessTrustedWithOptions can return true while the TCC database is stale and AX calls silently return nil. whatsapp_status reports both: accessibilityTrusted is the cached TCC answer, accessibilityWorking is whether a real AX read just succeeded.
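An agent can turn those two flags into a diagnosis before it attempts anything else. A minimal Python sketch, assuming the status result is a dict carrying the two field names mentioned above; the function name and the wording of the returned labels are hypothetical.

```python
from typing import Any, Dict


def diagnose(status: Dict[str, Any]) -> str:
    """Interpret the two accessibility flags reported by whatsapp_status."""
    trusted = bool(status.get("accessibilityTrusted"))
    working = bool(status.get("accessibilityWorking"))
    if trusted and working:
        return "ok"
    if trusted:
        # The permission check says yes, but a real AX read just failed.
        return "granted-but-not-working"
    return "not-granted"


print(diagnose({"accessibilityTrusted": True, "accessibilityWorking": False}))
```

The middle case is the reason the server probes at all: a check based only on the trusted flag would report success and then fail on the first real tool call.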

What are the honest limits of an accessibility-based WhatsApp MCP server?

Three honest limits. First, the server can only see what is rendered: if a chat has not been opened, its messages are not in the AX tree and whatsapp_read_messages cannot fetch arbitrary history. Second, text only: the send tool pastes text into the compose textarea; it does not handle images, files, voice notes, or video, because the AX-level paste pathway does not carry attachments. Third, single window assumption: the tree parser uses x-coordinate thresholds to separate sidebar from chat panel, which depends on the standard layout. Plus the visible cursor briefly moves and the clipboard is briefly overwritten before being restored. If any of those constraints are deal breakers, the Business API or whatsmeow paths are the right choice instead.