WhatsApp booking automation, through the desktop client your customers already message

Every tutorial on automating WhatsApp bookings walks you through the same five boxes: Calendly, Zapier or Make, Meta Business Cloud API, a pre-approved template, a public webhook. That stack is good at one thing: blasting opted-in reminders to a list. It is terrible at the actual booking conversation, the part where someone asks "can I move Wednesday 3pm to Thursday?" and your reply needs to reference what they said three messages ago. Below is the desktop-client path: a local MCP server that drives the native WhatsApp Mac app through accessibility APIs, reads the thread, drafts a free-form reply, and verifies the bubble actually appeared before reporting success.

Matthew Diakonov, Written with AI

Published May 15, 2026 · 7 min read

Direct answer

To automate WhatsApp bookings via the desktop client, run a local MCP server that talks to the native WhatsApp Mac app through macOS accessibility APIs. Your AI agent makes four tool calls per booking exchange: list unread chats, read the full thread, send a free-form reply, verify delivery. No Meta Business API key, no template approval, no 24-hour reply window, no per-conversation fee. The trade-off is honest: it's macOS only, single-account, and the WhatsApp app must be running.

Verified against Sources/WhatsAppMCP/main.swift on 2026-05-15. Source on GitHub: m13v/whatsapp-mcp-macos.

Pick the right path first

The desktop-client path is not always the right answer. If you are a large business sending 10,000 appointment reminders a day to an opted-in list, the Business Cloud API exists for a reason: it scales horizontally, ships from Meta's infra, and the per-conversation fee is cheap compared to the alternative. The desktop-client path wins in the messy middle: a few hundred booking exchanges a day from a single Mac, where each reply needs context.

Feature	Business Cloud API (Calendly + Zapier + Meta)	Desktop client (this product)
How customer messages get in	Meta forwards them to a public HTTPS webhook you host	They land in your WhatsApp Desktop sidebar like any other chat
Reading the existing thread before replying	You re-fetch through the Graph API, scoped to the 24h conversation window	whatsapp_read_messages, returns the last 20 by default
Sending a free-form confirmation	Pre-approved template only outside the 24h window, per-conversation fee	whatsapp_send_message, plain text, no template
What you pay Meta	Per-conversation pricing, tier varies by category and region	Nothing, you're a normal WhatsApp user
Where the agent runs	Hosted somewhere with public ingress for the webhook	Locally on your Mac, beside the app
How you verify delivery	Status webhook callbacks (sent, delivered, read), eventually consistent	AX tree is re-read, 'Your message, <text>' bubble must appear

The line is pretty clean. If your booking flow needs the agent to read the chat, you want the right column (the desktop client). If your booking flow is a one-way reminder blast, you want the left column (the Business Cloud API).

The four-call booking loop

Once the MCP server is wired up, every booking exchange follows the same four-step pattern. Each step is one tool call or a short chain of them, and each call returns structured JSON the agent can act on. The shape stays the same whether the customer wants to confirm, reschedule, cancel, or ask if you also do Saturdays.

List the unread bookings sitting in your sidebar

whatsapp_list_chats with filter "unread". The sidebar's already filtered to the people who actually need a reply.

The product reads the WhatsApp Mac sidebar through accessibility, walks AXButton elements, and returns chats with unread counts. No fetch loop, no webhook subscription, no Graph API token. If it's in your sidebar, it's in the JSON the agent gets back.
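As a rough illustration of what the agent might do with that JSON, here is a minimal Python sketch. The article only says the tool returns chats with unread counts, so the payload shape and the field names `chats`, `name`, and `unread` are assumptions, not the server's documented schema.

```python
import json

# Hypothetical payload: the field names "chats", "name", and "unread" are
# assumptions for illustration, not the server's documented schema.
SAMPLE_RESULT = json.dumps({
    "chats": [
        {"name": "Priya", "unread": 2},
        {"name": "Studio group", "unread": 0},
        {"name": "Marco", "unread": 5},
    ]
})


def chats_needing_reply(raw_json: str) -> list[str]:
    """Return the names of chats with unread messages, busiest first."""
    chats = json.loads(raw_json).get("chats", [])
    waiting = [c for c in chats if c.get("unread", 0) > 0]
    waiting.sort(key=lambda c: c["unread"], reverse=True)
    return [c["name"] for c in waiting]


print(chats_needing_reply(SAMPLE_RESULT))  # prints ['Marco', 'Priya']
```

Sorting by unread count is only one possible policy; an agent could just as well work through the chats in sidebar order.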

Open one and read the full booking thread

whatsapp_search → whatsapp_open_chat → whatsapp_read_messages. The agent sees what the customer actually wrote, in their words.

This is the part every Business-API booking pipeline gets wrong. Meta gives you a webhook payload with one inbound message and a conversation ID. The agent has no idea what the customer said three messages ago. The desktop-client path returns the parsed thread (sender, text, time, and which side sent each message), so the agent can confirm a reschedule by referencing the original time the customer asked about.

Draft and send a free-form reply

whatsapp_send_message with plain text. No template, no category, no approval queue.

Because the agent is sending through the desktop client as you, the message can quote the customer's words back to them, reference internal calendar slots, include emoji, links, anything you would type by hand. The flow is paste-into-compose-then-Return, not a POST to graph.facebook.com.

Verify the bubble actually appeared

Same call. The handler walks the AX tree a second time and looks for a 'Your message, <text>' bubble before returning verified: true.

If WhatsApp silently rejected the input (focus stolen by an autocorrect modal, the chat header changed, accessibility went stale), verified comes back false and the agent knows to retry instead of marking the booking confirmed. It's the equivalent of Meta's delivered status callback, except synchronous and inside the same tool call.

What reading the thread actually returns

The interesting part is in the parser. The WhatsApp Mac app exposes each message bubble as an AXGenericElement with a single description string. Inbound bubbles start with "message, ". Outbound bubbles start with "Your message, ". That is the entire trick. The parser walks every AXGenericElement, discriminates by prefix, peels off the time suffix and the sender tag, and hands back a clean array.
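The prefix discrimination described above can be sketched in a few lines of Python. This is not the Swift source: the time-suffix format (for example ", 10:42 AM") is an assumption, and sender-tag handling is left out because the article does not show what that tag looks like. The names `Message`, `parse_description`, and `parse_thread` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

OUTBOUND_PREFIX = "Your message, "
INBOUND_PREFIX = "message, "

# Assumed time suffix such as ", 10:42 AM" or ", 15:05"; the real app's
# format may differ.
TIME_SUFFIX = re.compile(r",\s*(\d{1,2}:\d{2}(?:\s?[AP]M)?)\s*$", re.IGNORECASE)


@dataclass
class Message:
    text: str
    time: Optional[str]
    is_from_me: bool


def parse_description(description: str) -> Optional[Message]:
    """Turn one bubble description into a Message, or None if it is not a bubble."""
    if description.startswith(OUTBOUND_PREFIX):
        body, is_from_me = description[len(OUTBOUND_PREFIX):], True
    elif description.startswith(INBOUND_PREFIX):
        body, is_from_me = description[len(INBOUND_PREFIX):], False
    else:
        return None  # some other accessibility element, not a message
    match = TIME_SUFFIX.search(body)
    time = match.group(1) if match else None
    text = body[:match.start()] if match else body
    return Message(text=text.strip(), time=time, is_from_me=is_from_me)


def parse_thread(descriptions: list[str], limit: int = 20) -> list[Message]:
    """Parse descriptions in on-screen order and keep the newest `limit` messages."""
    parsed = [m for m in map(parse_description, descriptions) if m is not None]
    return parsed[-limit:] if limit > 0 else []
```

The default of 20 mirrors the default the article gives for whatsapp_read_messages; everything else here is illustrative.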

main.swift

The tool handler that wraps it is six lines, including the accessibility guard:

main.swift

In a real booking session it looks like this. The agent searches for a name, opens the top result, and pulls the last twelve messages:

agent session

The agent now knows the customer is Priya, that she asked to move her Wednesday 3pm, that I promised to check the calendar, and that she said thanks. Compared to a Business API webhook payload (one message, a phone number, a conversation id, no past context), this is night and day for replying like a human.

Sending the confirmation, and verifying it landed

The send tool is two halves. First half: click the compose textarea, paste the message into NSPasteboard.general and post Cmd+V, then post a Return key. Standard accessibility-driven typing. The second half is what makes it useful for booking automation: it re-reads the chat and confirms the bubble appeared before returning.
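The second half, the verification check, might look roughly like this in Python. It is a sketch of the logic as the article describes it (find the newest outbound bubble, strip the time suffix, check that the sent text is contained), not the Swift handler. The function name `is_send_verified` and the time-suffix pattern are assumptions.

```python
import re

OUTBOUND_PREFIX = "Your message, "

# Assumed time suffix such as ", 10:50 AM"; the real format may differ.
TIME_SUFFIX = re.compile(r",\s*\d{1,2}:\d{2}(?:\s?[AP]M)?\s*$", re.IGNORECASE)


def is_send_verified(descriptions: list[str], sent_text: str) -> bool:
    """Check whether the newest outbound bubble contains the text just sent.

    `descriptions` holds bubble descriptions in on-screen order, oldest first.
    """
    wanted = sent_text.strip()
    if not wanted:
        return False  # an empty string would trivially match any bubble
    for description in reversed(descriptions):
        if description.startswith(OUTBOUND_PREFIX):
            body = TIME_SUFFIX.sub("", description[len(OUTBOUND_PREFIX):])
            # Only the newest outbound bubble counts, so decide here.
            return wanted in body
    return False
```

Checking only the newest outbound bubble keeps an older, identical message from producing a false positive.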

main.swift

The honest part: when verification fails (focus stolen, paste swallowed, autocorrect modal stole Return), the handler returns verified: false rather than a hard error. The agent can retry once and check again. Booking flows that double-send are worse than ones that occasionally pause for a retry.
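One way an agent could retry without risking a double-send is sketched below in Python. Both callables are hypothetical stand-ins: `send` represents a call to whatsapp_send_message that returns its JSON result (assumed to carry a `verified` field), and `already_in_thread` represents re-reading the chat with whatsapp_read_messages and looking for the outbound text.

```python
from typing import Callable


def send_with_one_retry(
    send: Callable[[str], dict],
    already_in_thread: Callable[[str], bool],
    text: str,
) -> bool:
    """Send `text`, retrying at most once, without resending a delivered message."""
    for _ in range(2):  # first attempt plus a single retry
        result = send(text)
        if result.get("verified"):
            return True
        # Verification can fail even though the message landed (for example,
        # a stale accessibility tree), so re-read the thread before resending.
        if already_in_thread(text):
            return True
    return False
```

If this returns False, the safer move is probably to escalate to a human rather than keep retrying.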

agent session, continued

What lives outside this layer

The MCP server is intentionally only the chat surface. It does not know what a booking is, what a calendar is, or what your business hours are. That is on purpose: every booking system is different, and embedding one would be the wrong abstraction.

What this product does

Reads chats from the WhatsApp Mac sidebar
Parses the message history per chat
Sends free-form replies
Verifies each send by re-reading the bubble
Exposes all of it as MCP tools to your agent

What your agent owns

Calendar (Cal.com, Google Calendar, Notion, your DB)
Booking state machine (held, confirmed, no-show)
Business hours and slot availability
Payment links if you take deposits
The actual reply text and tone

A typical wiring pattern: a Claude or Cursor session has both this MCP server and a calendar MCP loaded. Every few minutes a cron-like loop calls whatsapp_list_chats with filter: "unread", walks the unread chats, calls whatsapp_read_messages on each, asks the model what to do, and either replies or escalates to a human. Booking confirmations go out as free-form text that references whatever the customer actually wrote.
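A single pass of that loop could be sketched like this in Python. Every callable is a hypothetical stand-in injected by the caller: `list_unread` for whatsapp_list_chats with the unread filter, `read_thread` for whatsapp_read_messages, `decide` for the model call, `send` for a verified whatsapp_send_message, and `escalate` for whatever hands a chat to a human. None of these names come from the project itself.

```python
from typing import Callable, Optional


def process_unread(
    list_unread: Callable[[], list[str]],
    read_thread: Callable[[str], list[dict]],
    decide: Callable[[list[dict]], Optional[str]],
    send: Callable[[str, str], bool],
    escalate: Callable[[str], None],
) -> dict:
    """Run one pass over the unread chats and return counts for logging."""
    counts = {"replied": 0, "escalated": 0}
    for chat in list_unread():
        thread = read_thread(chat)
        reply = decide(thread)  # None means "hand this one to a human"
        if reply is not None and send(chat, reply):
            counts["replied"] += 1
        else:
            # No confident reply, or the send could not be verified.
            escalate(chat)
            counts["escalated"] += 1
    return counts
```

Injecting the tool calls as plain functions keeps the loop testable without a Mac or a running WhatsApp app; a scheduler would simply call `process_unread` every few minutes.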

FAQ

What does 'desktop client' buy me that the WhatsApp Business Cloud API doesn't?

Chat context and free-form replies. The Business Cloud API is designed for outbound template blasts to opted-in lists, with a 24-hour reply window after each inbound message. Once that window closes, you can only send pre-approved templates. For booking automation that's the wrong shape, because most booking exchanges (reschedules, location confirmations, no-show follow-ups) are conversational and unpredictable. The desktop-client path treats your Mac's WhatsApp the same as any other chat: the agent reads the thread, drafts a free-form reply, sends it. No template, no window, no per-conversation fee.

Is this just clicking buttons in the WhatsApp Web window?

No, it's the native WhatsApp Catalyst app for macOS, and the agent talks to it through the macOS accessibility tree, not through a browser. Bundle id net.whatsapp.WhatsApp, AXUIElement traversal, AXMessagingTimeout 5.0s. Web automation is brittle here because WhatsApp Web's DOM uses contenteditable fields and React-controlled focus that breaks under Playwright. The accessibility-API path is the same API VoiceOver uses, so it tends to hold up better across WhatsApp releases.

What does the agent actually see when it 'reads the thread'?

It sees parsed messages. handleReadMessages walks the AX tree, finds AXGenericElement nodes whose description starts with 'message, ' (inbound) or 'Your message, ' (outbound), and returns a JSON array of { sender, text, time, isFromMe }. Default limit is 20 messages. That's enough for a booking agent to see who asked for what time, what was offered, and whether anyone confirmed. Source: main.swift, parseMessages, lines 488 to 527.

How does it verify a booking confirmation actually sent?

After pressing Return, the send handler does NOT just return success. It walks the AX tree a second time, finds the newest AXGenericElement whose description starts with 'Your message, ', strips the time suffix, and checks that the bubble's text contains what was sent. Only then does it return verified: true. If WhatsApp swallowed the input (focus stolen, chat header changed under us, paste failed silently), verified comes back false and the agent knows to retry. This is lines 923 to 957 of main.swift.

Where does the actual booking calendar fit?

Outside this layer. The MCP server is the chat surface only. A real booking agent wires this together with a calendar tool (Cal.com, Google Calendar, an internal API) and uses the chat surface for read context and reply. The pattern is: whatsapp_read_messages to see what the customer asked → query the calendar tool → whatsapp_send_message with the answer → whatsapp_get_active_chat after a few minutes to see if they replied. The chat surface stays dumb; the agent owns the booking logic.

What happens if WhatsApp Desktop isn't running?

ensureWhatsAppRunning at line 79 of main.swift launches it, waits up to 2 seconds for the pid to appear, and returns the pid to the handler. So a cron job can kick off a 'send tomorrow's reminders' pass without you having WhatsApp open. The handler tools also activate the app before touching it (activateWhatsApp at line 89, pulls focus). If accessibility was never granted to the host process, requireAccessibility short-circuits and returns a JSON error explaining exactly which System Settings pane to open.

Does it work for groups, or just 1-to-1 bookings?

Both, with caveats. whatsapp_open_chat works on a group entry the same way it works on a contact, and the read and send tools operate on whatever chat is currently focused. Useful for things like 'team daily standup' or 'family reminder' flows where the booking lives in a group thread. Caveat: in a noisy group the limit=20 default on whatsapp_read_messages can miss context; bump it to 50 or 100 if you need the agent to see further back.

Honest limitations?

macOS only, requires WhatsApp Desktop installed and accessibility granted. Not a hosted service. Not multi-tenant — one Mac, one WhatsApp account, one agent runtime. If you need a fleet sending opted-in template blasts to 50,000 strangers, this is the wrong tool and the Business Cloud API is the right one. For one solo founder or small team handling a few hundred booking-related exchanges per day, it's the simpler path.

Wiring this into your booking flow?

20 minutes, share your stack, I'll tell you whether the desktop-client path fits, where the Business API path would still beat it, and what the wiring looks like end to end.
