Compared by the agent's tool surface

WhatsApp MCP via accessibility vs the Business API: the agent's tool surface decides.

Most takes on this pair frame it as personal account versus verified business, or policy versus convenience. The frame that actually decides for an agent is different: what tools does the LLM see on each side, and which of them are read-side?

On the accessibility path the agent sees eleven tools, six of them read-side against your existing contacts and chat history. On the Business API the agent sees graph endpoints for send, receive, and template management, with no read-side into your contact directory at all. That gap is what makes the two architectures non-substitutable.

Matthew Diakonov, Written with AI

Published May 20, 2026 · 9 min read

Direct answer, verified 2026-05-20

These are not feature-for-feature competitors. They are two different things the agent is allowed to do.

WhatsApp MCP via accessibility exposes eleven tools to the LLM. Six are read-side: whatsapp_search, whatsapp_list_chats, whatsapp_read_messages, whatsapp_get_active_chat, whatsapp_scroll_search, whatsapp_navigate. They turn the WhatsApp window into a structured data source the agent can query against your existing personal account.
The WhatsApp Cloud Business API exposes outbound message sends, inbound webhook events, template management, and media endpoints. It has no contact-directory access, no chat-history endpoint, and no view of conversations on your personal number. It is the business's outbound surface to opted-in users.

Tool registry sourced from Sources/WhatsAppMCP/main.swift (lines 994-1110). API reference sourced from developers.facebook.com/docs/whatsapp/cloud-api/reference.

What the agent actually sees, side by side

The MCP host hands the LLM a tool list. Whatever is in that list is what the LLM can call. Whatever is not in the list does not exist as far as the agent is concerned. So the only comparison that matters is the two lists.

On the left, the eleven tools registered by whatsapp-mcp-macos. On the right, the operations a Cloud-API-wrapping MCP server has to choose from when it builds its own tool list.

[Side-by-side listings not reproduced here. Left: Sources/WhatsAppMCP/main.swift (tool registry). Right: WhatsApp Cloud Business API surface (paraphrased from Meta docs).]

Notice the shape of the asymmetry. The MCP path is a chat-shaped surface: search, open, read, scroll, navigate, all anchored on the chat. The Business API is a messaging-shaped surface: send a message to a number, receive a message from a number, manage the templates the messages must conform to. There is no chat object in the API at all.
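One way to make that asymmetry concrete is to write both surfaces down as data and ask which entries read existing account state. The sketch below is illustrative only. The eight MCP tool names are the ones that appear in this article (the registry holds eleven, and the other three are not named here, so they are left out). The Cloud API labels are hypothetical paraphrases, not endpoint names, and the `Operation` type is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    # True if the operation reads contacts or chat history already on the account.
    reads_existing_account: bool


# The eight tool names that appear in this article. The registry holds eleven;
# the other three are not named here, so they are deliberately omitted.
MCP_TOOLS = [
    Operation("whatsapp_search", True),
    Operation("whatsapp_list_chats", True),
    Operation("whatsapp_read_messages", True),
    Operation("whatsapp_get_active_chat", True),
    Operation("whatsapp_scroll_search", True),
    Operation("whatsapp_navigate", True),
    Operation("whatsapp_open_chat", False),
    Operation("whatsapp_send_message", False),
]

# Hypothetical labels paraphrasing the Cloud API surface; not endpoint names.
CLOUD_API_OPERATIONS = [
    Operation("send_message", False),
    Operation("receive_webhook_event", False),
    Operation("manage_templates", False),
    Operation("upload_or_fetch_media", False),
]


def read_side(operations):
    """Return the names of operations that read existing account state."""
    return [op.name for op in operations if op.reads_existing_account]


print(len(read_side(MCP_TOOLS)), "read-side tools via accessibility")
print(len(read_side(CLOUD_API_OPERATIONS)), "read-side operations on the Cloud API")
```

Inbound webhook events are marked as not read-side here because they only carry messages that arrive after the webhook exists; they never expose the contacts or history already on an account.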

The read-side is the load-bearing difference

Send-side capability is roughly symmetric: both paths can put a message in front of a recipient. The asymmetry is what the agent can read before deciding what to write. The Cloud Business API does not expose any of the following. The MCP accessibility path exposes all of them, against the personal account already signed in to the Mac.

Read-side operations: MCP via accessibility only

Searching your own contact list by display name (the name you saved Sarah as, not her phone number)
Listing the chats currently in your sidebar, with unread counts and last-message previews
Reading the visible history of a chat, including messages sent before you wired up any automation
Triaging by tab: unread, favorites, groups, archived, starred
Asking the agent "who haven't I replied to today?" and getting an answer from the same sidebar a human would scan
Drafting a reply that references something specific the contact said three messages ago

These are not edge cases. They are the bulk of what a useful personal-account agent does between message sends. An agent that cannot do them is not so much an agent as a templated SMS gateway.
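As a rough illustration of the "who haven't I replied to today?" case, the sketch below works on message records shaped like the fields whatsapp_read_messages is described as returning (sender, text, time, isFromMe). The `histories` mapping, the oldest-first ordering, and the sample data are assumptions made for the example, not the server's actual output format.

```python
def awaiting_reply(histories):
    """Return chat names whose most recent message was not sent by me.

    `histories` maps a chat name to its messages, oldest first (an assumed shape).
    """
    return [
        name
        for name, messages in histories.items()
        if messages and not messages[-1]["isFromMe"]
    ]


# Made-up sample data using the four fields named in this article.
sample = {
    "Sarah": [
        {"sender": "Me", "text": "Dinner Friday?", "time": "09:12", "isFromMe": True},
        {"sender": "Sarah", "text": "Yes, 7pm works", "time": "09:40", "isFromMe": False},
    ],
    "Landlord": [
        {"sender": "Landlord", "text": "Rent received", "time": "08:00", "isFromMe": False},
        {"sender": "Me", "text": "Thanks!", "time": "08:05", "isFromMe": True},
    ],
    "Empty chat": [],
}

print(awaiting_reply(sample))
```

The point is that the filter is trivial once the history is readable; without a read-side tool there is nothing to filter.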

One realistic agent prompt, two architectures

Take a prompt almost every personal-account agent gets at least once a day: remind me what Sarah and I last talked about. On the MCP path the agent has tools that map cleanly to that. On the Business API it stalls at step one.

[Two sequence diagrams not reproduced here. First: MCP via accessibility, fulfilling 'remind me what Sarah and I last talked about'. Second: WhatsApp Cloud Business API, same prompt.]

The second diagram is not a strawman. The Cloud API genuinely has no operation that maps to that prompt against your personal contacts. To fulfill the prompt under the Business API you would need to register a business number, convince Sarah to message it first, persist every inbound message into your own database from that point onward, and then query that database. The first two of those four steps are not engineering problems; they are administrative and social ones. And even once all four are done, nothing Sarah said before the webhook existed is recoverable.
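On the MCP side, the same prompt reduces to three tool calls in sequence. The sketch below is a minimal illustration under stated assumptions: `call_tool` stands in for whatever call mechanism your MCP host provides, the `query` and `name` argument keys are guesses (only `limit` appears in this article), and the canned responses are made up so the example runs without WhatsApp.

```python
def recall_last_conversation(call_tool, contact_name, limit=20):
    """Search for a contact, open the top hit, and return the visible messages."""
    results = call_tool("whatsapp_search", {"query": contact_name})
    if not results:
        return None  # nothing in the sidebar matches this name
    call_tool("whatsapp_open_chat", {"name": results[0]["name"]})
    return call_tool("whatsapp_read_messages", {"limit": limit})


def make_fake_host():
    """Build a canned stand-in for an MCP host (hypothetical, for illustration)."""
    calls = []
    canned = {
        "whatsapp_search": [{"name": "Sarah", "preview": "See you at 7"}],
        "whatsapp_open_chat": {"ok": True},
        "whatsapp_read_messages": [
            {"sender": "Sarah", "text": "Can we move dinner to 7?",
             "time": "18:02", "isFromMe": False},
            {"sender": "Me", "text": "See you at 7",
             "time": "18:05", "isFromMe": True},
        ],
    }

    def call_tool(name, arguments):
        calls.append(name)  # record the order in which tools were invoked
        return canned[name]

    return call_tool, calls


call_tool, calls = make_fake_host()
messages = recall_last_conversation(call_tool, "Sarah")
print(calls)
print([m["text"] for m in messages])
```

In a real host the LLM chooses these calls itself from the tool list; the function only spells out the order it would typically land on.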

Operation by operation

Eleven rows. The first three rows are the ones that make the two paths non-substitutable. The remaining rows are where the rest of the architecture choice falls out.

Per-operation comparison

Both columns are real, in-production architectures. Picking between them is not about features; it is about whether the agent needs read access to your personal-account graph.

Feature	WhatsApp Cloud Business API	WhatsApp MCP via accessibility
Search your contacts by name	Not available. The Cloud API has no contact directory endpoint. You can only message a phone number you already know (in E.164 form), and you only have phone numbers of users who have messaged your registered business number first.	whatsapp_search("Sarah"). Returns ranked results from the WhatsApp sidebar's search index: chats section, contacts section, with names as you saved them, last-message preview, and timestamp. Backed by accessibility tree traversal of the sidebar's results list.
List your visible chats	Not available. The closest analog is querying your own webhook database of inbound messages. The Cloud API never gives you a list of who you have chats with.	whatsapp_list_chats(filter: "all" | "unread" | "favorites" | "groups"). Returns chat names, last-message previews, and unread counts. Same data the WhatsApp sidebar renders to a human user.
Read history of an open chat	Not available. The Cloud API only delivers inbound webhook events from the moment your webhook is wired up. Anything before that, or anything sent from the consumer app on the same number, is invisible to the API.	whatsapp_read_messages(limit: 20). Returns sender, text, time, and isFromMe for each message currently rendered in the chat view. Scrolling back loads more (via the WhatsApp app's own lazy-load).
Send to someone you know personally	Requires a separate Business Cloud API number that is not signed in to the consumer app. The recipient sees a verified-business badge and the business display name, not your personal identity.	Pipeline: whatsapp_search + whatsapp_open_chat + whatsapp_send_message. Sends from the personal account signed in to the desktop app. The recipient sees the message exactly as they would from any other WhatsApp conversation with you.
Send to someone outside the 24-hour window	Requires a pre-approved template. Templates are categorized as Marketing / Utility / Authentication and reviewed by Meta. Marketing templates are billed per message by destination country.	Same pipeline. Free-form, no template review, no 24-hour gate. Subject to WhatsApp's consumer terms (no spammy outbound to strangers).
Confirm the message landed	Asynchronous via webhook. The send returns an HTTP 200 with a message_id in ~300ms. "sent", "delivered", "read" arrive on your webhook later, as the recipient's device acknowledges.	Synchronous, in-band. After Return, the server re-walks the accessibility tree and looks for an AXGenericElement whose description starts with "Your message, " followed by the text you sent. Either the bubble is there or it is not.
Sending identity	A registered Business Cloud API number. By policy that number cannot also be signed in to the consumer WhatsApp app.	Whatever account is signed in to the WhatsApp desktop app on that Mac. Typically your personal number, used for friends and family.
Where the agent sees the conversation	The agent reads whatever your webhook persisted into your database. Human operators read messages in a separate inbox tool (often a BSP console). Two views, kept in sync by your code.	The agent reads the same WhatsApp window a human reads. read_messages returns the rendered chat. The agent and the human share one view of the conversation.
Throughput ceiling per identity	Hundreds of requests per second per phone number at the API layer; rate tiers (1K to unlimited unique daily users) gate sustained throughput.	About 15 messages per minute on one Mac driving one signed-in account. Bounded by the wall-clock cost of AX traversal + paste + Return + verify per send.
Time to first message	Days to weeks. Business verification, phone-number registration, first template approval, opt-in collection, webhook stand-up.	Minutes. npm install -g whatsapp-mcp-macos, grant Accessibility permission to the host app, add a stdio entry to your MCP config, restart your agent.
Per-message cost (May 2026)	Per-message pricing since Nov 2025. Marketing templates range from roughly $0.025 (India) to $0.1365 (Germany) per message. Service replies inside the 24-hour window are free.	Free. Locally executed npm package, MIT licensed.
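The 24-hour row is the one most likely to surprise a Cloud-API-wrapping MCP server, so here is a hedged sketch of the decision it implies. It builds (but does not send) a message body in the shape the Cloud API messages endpoint accepts, choosing free-form text inside the window and a template outside it. The function name, the way the last inbound timestamp is tracked, and the template name `order_update` are assumptions for the example; check Meta's reference before relying on field details.

```python
import re
from datetime import datetime, timedelta, timezone

E164 = re.compile(r"^\+[1-9]\d{1,14}$")  # leading +, up to 15 digits
SERVICE_WINDOW = timedelta(hours=24)


def build_send_payload(to, body, last_inbound_at, now,
                       template_name=None, language="en_US"):
    """Return a JSON-ready message body: free-form inside the window, template outside."""
    if not E164.match(to):
        raise ValueError(f"not an E.164 number: {to!r}")
    inside_window = (
        last_inbound_at is not None and now - last_inbound_at < SERVICE_WINDOW
    )
    if inside_window:
        return {"messaging_product": "whatsapp", "to": to,
                "type": "text", "text": {"body": body}}
    if template_name is None:
        raise ValueError("outside the 24-hour window: an approved template is required")
    # Outside the window the free-form body is not used; the template defines the text.
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "template",
        "template": {"name": template_name, "language": {"code": language}},
    }


now = datetime(2026, 5, 20, 12, 0, tzinfo=timezone.utc)
recent = build_send_payload("+15551234567", "On my way", now - timedelta(hours=2), now)
stale = build_send_payload("+15551234567", "ignored", now - timedelta(days=3), now,
                           template_name="order_update")
print(recent["type"], stale["type"])
```

Nothing equivalent exists on the accessibility path, because the consumer app has no window or template concept to enforce.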

The honest case for the Business API

None of the above means the Business API is the wrong tool. It is the right tool when the shape of the work is fundamentally fan-out from a verified business to opted-in users, not personal-account triage. Concretely:

The Business API is the answer if

You are a verified business sending opted-in transactional messages (OTPs, order confirmations, shipping updates) at fan-out scale
Your sending identity is organizational and needs to outlive any single person on the team
You need multi-tenant SaaS shape: per-tenant tokens, audit logs, multi-number under one Business Account
You need to operate from servers, not from a Mac somebody actually uses
Your compliance officer needs to see opt-in records, template review history, and message-level audit trail

If any one of those is true, the read-side asymmetry above does not matter to you. You are not trying to give an agent a view of your personal conversation graph; you are trying to put structured outbound in front of users who opted in. That is exactly what the Cloud API was built for, and accessibility-driven automation has no business in that shape of work.

The honest case for the MCP accessibility path

The MCP path wins exactly when the work is the inverse of the above: one human (or one agent on behalf of that human) doing personal-account-shaped things at sub-15-per-minute cadence. Solo founders routing inbound from customers, partners, and friends through one number. AI agents triaging the inbox overnight and surfacing the three threads that need a reply by morning. Long-tail one-to-one coordination that should not flow through a verified business identity.

For that shape of work, the read-side tools are the entire point. The Business API does not offer them at all, and no amount of clever webhook engineering reconstructs them faithfully from inbound-only events.

Hybrid is fine and common

Nothing forces a single architecture per product. A common shape is two MCP entries in one agent config: a stdio entry pointing at the local accessibility-driven server for personal-account work, and an HTTP entry wrapping the Cloud API for verified-business outbound. The two paths target distinct sending identities (the business number cannot also be the consumer-app number at the same time), so they do not step on each other. The agent picks based on which account should send.

See the whatsapp-mcp-macos README for the stdio entry. See Meta's Cloud API getting-started guide for the verified-business onboarding side.
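The routing rule in a hybrid setup can be as small as a lookup keyed on sending identity. The sketch below assumes two entries with hypothetical names (`whatsapp-local`, `whatsapp-cloud`); it does not reproduce any real MCP host's config format, only the decision the agent or its wrapper has to make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpEntry:
    name: str               # hypothetical entry name in the agent config
    transport: str          # "stdio" for the local server, "http" for the Cloud API wrapper
    sending_identity: str   # "personal" or "business"


ENTRIES = [
    McpEntry("whatsapp-local", "stdio", "personal"),
    McpEntry("whatsapp-cloud", "http", "business"),
]


def pick_entry(sending_identity):
    """Return the single entry that sends as the requested identity."""
    matches = [e for e in ENTRIES if e.sending_identity == sending_identity]
    if len(matches) != 1:
        raise LookupError(f"no unique entry for identity {sending_identity!r}")
    return matches[0]


print(pick_entry("personal").name)
print(pick_entry("business").name)
```

Because the two identities cannot be the same number, the lookup never has to break a tie.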

Frequently asked questions

Why phrase the comparison as "accessibility vs Business API" rather than "desktop app vs Cloud API"?

Because accessibility is the load-bearing part. The reason WhatsApp MCP can expose six read-side tools is the macOS Accessibility framework: it lets the server walk the WhatsApp Catalyst app's AX tree and read the sidebar, chat list, and rendered messages as structured data. Without accessibility you would be stuck with screenshots and OCR. The Business API has no equivalent surface; its API contract is send / receive / manage templates, full stop. So "desktop vs cloud" hides the actual mechanism, and the mechanism is what determines what an LLM can do.

Can the Business API at least list contacts I know about through Meta's other surfaces?

No. There is no contacts directory endpoint on the WhatsApp Cloud API. Your integration only learns about a phone number when that number sends a message to your registered business number (the inbound arrives on your webhook with a contacts[] array carrying name and wa_id). You cannot enumerate users, search by name, or fetch history. This is by design: the Business Cloud API is the business's outbound + inbound surface, not a CRM. If you want a contact directory you assemble one yourself from inbound events.
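Assembling that directory yourself might look like the sketch below. It assumes the commonly documented webhook layout, in which contacts sit under entry, changes, and value, with the display name nested under a `profile` object; treat the exact nesting as an assumption to verify against Meta's webhook reference. The sample payload is made up.

```python
def update_directory(directory, payload):
    """Merge any contacts found in one inbound webhook payload into `directory`.

    `directory` maps wa_id to the profile name the sender chose for themselves.
    """
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            value = change.get("value", {})
            for contact in value.get("contacts", []):
                wa_id = contact.get("wa_id")
                if wa_id:
                    directory[wa_id] = contact.get("profile", {}).get("name", "")
    return directory


# Made-up inbound event in the assumed layout.
sample_payload = {
    "object": "whatsapp_business_account",
    "entry": [{
        "changes": [{
            "field": "messages",
            "value": {
                "messaging_product": "whatsapp",
                "contacts": [{"profile": {"name": "Sarah"}, "wa_id": "15551234567"}],
                "messages": [{"from": "15551234567", "type": "text",
                              "text": {"body": "Hi, is my order ready?"}}],
            },
        }],
    }],
}

directory = update_directory({}, sample_payload)
print(directory)
```

Note what the result is keyed on: the sender's own profile name, not the name you saved them under, and only for people who have already written to the business number.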

Is the read-side actually useful for an agent, or is it nice-to-have?

It is the part that makes agents on personal accounts actually work. A useful WhatsApp agent rarely starts from a fully specified "send X to Y". It starts from "remind me what we talked about", "who is waiting on a reply", "summarize the unread", "find the thread where we agreed on Tuesday". Each of those is a read-side operation. An agent on the Business API can only do those if you have already taken on the engineering work of mirroring every inbound into your own database, and even then it has no view of messages from before you set that up.

Does accessibility-driven automation work on the official Business app ("WhatsApp Business" desktop), or only consumer WhatsApp?

The current MCP server targets the consumer WhatsApp Catalyst app on macOS. That app's accessibility tree is what the AX traversal code expects. WhatsApp Business is a different binary with a different UI shape. In principle the same accessibility mechanism applies; in practice the selectors and the verification-prefix string ("Your message, " on consumer) would need to be relearned. If you are running a small business off a single signed-in Business app account on a Mac, the porting effort is small but not zero. Today the supported path is consumer.

Is there a hybrid where the agent uses both, picking per message?

Yes, and it is a reasonable shape for products that have both a personal-coordination side and a transactional side. The Cloud API entry handles opted-in outbound from the business number ("your package shipped"). The local MCP entry handles personal-account inbound triage and one-to-one reply drafting from your own number. The two paths target distinct sending identities (a Business Cloud API number cannot also be the consumer-app number at the same time), so they do not step on each other. Agent config carries one stdio entry plus one HTTP entry, and the agent picks based on which account should be the sender.

What happens if WhatsApp redesigns the sidebar and the accessibility tree shifts?

The selectors in the server reference role names (AXTextArea, AXGenericElement) and a small set of verification prefixes (notably "Your message, " on outgoing bubbles). A redesign that keeps the same Catalyst roles keeps working unchanged. A redesign that swaps roles or removes the outgoing-bubble prefix would need updates to the traversal code. Catalyst has historically been stable on these specific roles. Compare to the Business API surface, which is contractually stable: Meta versions it (v21.0 today) and breaking changes are documented and dated. Different stability models, both real.

How much code is actually involved in the read-side?

The interesting part is one repeating shape: locate a window-scoped role in the AX tree, walk children depth-first, filter to elements with the description prefixes WhatsApp uses to mark sidebar rows, search results, message bubbles, and active-chat headings. That logic, plus the per-tool argument plumbing, fits inside Sources/WhatsAppMCP/main.swift. Total file is about 1,200 lines, of which the actual AX traversal is a few hundred. There is no language model in the loop on the server side; the LLM lives in your MCP host, the server is plumbing.
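The repeating shape described above can be sketched without macOS at all. The example below is not the server's Swift code: it models the AX tree as nested dictionaries, walks it depth-first, and filters on role plus description prefix. Only the roles AXTextArea and AXGenericElement and the "Your message, " prefix come from this article; the tree layout, the other descriptions, and the function names are invented for illustration.

```python
def walk(node):
    """Yield a node and then all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_by_prefix(root, role, prefix):
    """Return nodes with the given role whose description starts with `prefix`."""
    return [
        node for node in walk(root)
        if node.get("role") == role
        and node.get("description", "").startswith(prefix)
    ]


def message_landed(root, text):
    """Check for an outgoing bubble whose description matches the sent text."""
    return bool(find_by_prefix(root, "AXGenericElement", "Your message, " + text))


# Stand-in tree; the incoming-bubble and compose-box descriptions are placeholders.
tree = {
    "role": "AXWindow",
    "children": [
        {"role": "AXGroup", "children": [
            {"role": "AXGenericElement", "description": "(incoming bubble placeholder)"},
            {"role": "AXGenericElement", "description": "Your message, See you at 7"},
        ]},
        {"role": "AXTextArea", "description": "(compose box placeholder)"},
    ],
}

print(message_landed(tree, "See you at 7"))
print(message_landed(tree, "Never sent"))
```

The same walk-and-filter pair serves both the read-side tools and the post-send verification; only the prefix changes.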

Is this against WhatsApp's consumer terms of service?

The accessibility framework is a sanctioned surface (VoiceOver uses the same APIs). The official WhatsApp desktop app is the sanctioned client. What gets accounts banned is the kind of activity, not the mechanism. A personal account using accessibility-driven tools to send the messages it would normally send, to people it would normally send them to, is the same as a human using the same app. The same account using any automation, including accessibility-driven automation, to mass-message strangers or behave like a verified business at scale is exactly what consumer-terms enforcement is for, and exactly what the Business API exists to make legitimate. Read the consumer terms at https://www.whatsapp.com/legal/terms-of-service before pushing volume.

Why not just scrape WhatsApp Web with Playwright and skip macOS entirely?

Two reasons. First, web.whatsapp.com is heavily fingerprinted by Meta because the multi-device protocol and WhatsApp Web traffic are where most automation has historically lived; accounts running headless-browser automation against web.whatsapp.com do get logged out and sometimes banned. Second, contenteditable inputs and focus management in headless browsers are notoriously unreliable for the WhatsApp compose box; on a meaningful fraction of attempts the sent message has the wrong characters, or nothing is sent at all. The accessibility path drives the same chat your hands would, on the same client your account already signs in to, and the traffic on the wire is the official client's normal traffic.