WhatsApp booking automation, through the desktop client your customers already message
Every tutorial on automating WhatsApp bookings walks you through the same five boxes: Calendly, Zapier or Make, Meta Business Cloud API, a pre-approved template, a public webhook. That stack is good at one thing: blasting opted-in reminders to a list. It is terrible at the actual booking conversation, the part where someone asks "can I move Wednesday 3pm to Thursday?" and your reply needs to reference what they said three messages ago. Below is the desktop-client path: a local MCP server that drives the native WhatsApp Mac app through accessibility APIs, reads the thread, drafts a free-form reply, and verifies the bubble actually appeared before reporting success.
Direct answer
To automate WhatsApp bookings via the desktop client, run a local MCP server that talks to the native WhatsApp Mac app through macOS accessibility APIs. Your AI agent works through four steps per booking exchange: list unread chats, read the full thread, send a free-form reply, verify delivery. No Meta Business API key, no template approval, no 24-hour reply window, no per-conversation fee. The trade-off is honest: it's macOS only, single-account, and WhatsApp Desktop must be installed on that Mac (the server launches it if it isn't running).
Verified against Sources/WhatsAppMCP/main.swift on 2026-05-15. Source on GitHub: m13v/whatsapp-mcp-macos.
Pick the right path first
The desktop-client path is not always the right answer. If you are a large business sending 10,000 appointment reminders a day to an opted-in list, the Business Cloud API exists for a reason: it scales horizontally, ships from Meta's infra, and the per-conversation fee is cheap compared to the alternative. The desktop-client path wins in the messy middle: a few hundred booking exchanges a day from a single Mac, where each reply needs context.
| Feature | Business Cloud API (Calendly + Zapier + Meta) | Desktop client (this product) |
|---|---|---|
| How customer messages get in | Meta forwards them to a public HTTPS webhook you host | They land in your WhatsApp Desktop sidebar like any other chat |
| Reading the existing thread before replying | You re-fetch through the Graph API, scoped to the 24h conversation window | whatsapp_read_messages, returns the last 20 by default |
| Sending a free-form confirmation | Pre-approved template only outside the 24h window, per-conversation fee | whatsapp_send_message, plain text, no template |
| What you pay Meta | Per-conversation pricing, tier varies by category and region | Nothing, you're a normal WhatsApp user |
| Where the agent runs | Hosted somewhere with public ingress for the webhook | Locally on your Mac, beside the app |
| How you verify delivery | Status webhook callbacks (sent, delivered, read), eventually consistent | AX tree is re-read, 'Your message, <text>' bubble must appear |
The line is pretty clean. If your booking flow needs the agent to read the chat, you want the right column (desktop client). If your booking flow is a one-way reminder blast, you want the left column (Business Cloud API).
The four-call booking loop
Once the MCP server is wired up, every booking exchange follows the same four-step pattern. Each step maps onto MCP tool calls (step two chains three of them, and step four is folded into the send call), and each call returns structured JSON the agent can act on. The shape stays the same whether the customer wants to confirm, reschedule, cancel, or ask if you also do Saturdays.
List the unread bookings sitting in your sidebar
whatsapp_list_chats with filter "unread". The sidebar's already filtered to the people who actually need a reply.
Open one and read the full booking thread
whatsapp_search → whatsapp_open_chat → whatsapp_read_messages. The agent sees what the customer actually wrote, in their words.
Draft and send a free-form reply
whatsapp_send_message with plain text. No template, no category, no approval queue.
Verify the bubble actually appeared
Same call. The handler walks the AX tree a second time and looks for a 'Your message, <text>' bubble before returning verified: true.
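The four steps can be sketched from the agent's side as one function. This is a minimal sketch, not the project's code: `call_tool` stands in for whatever MCP client your agent runtime provides, and the argument names `query`, `index`, `limit`, and `message`, along with the result shapes, are assumptions. Only the tool names, the `unread` filter, and the `verified` flag come from the description above.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def run_exchange(
    call_tool: ToolCaller,
    draft_reply: Callable[[List[Dict[str, Any]]], str],
) -> Optional[Dict[str, Any]]:
    """Handle the first unread chat; return None when nothing is unread."""
    # Step 1: list the chats that still need a reply.
    chats = call_tool("whatsapp_list_chats", {"filter": "unread"})
    if not chats:
        return None
    name = chats[0]["name"]  # assumed field name

    # Step 2: search, open the top result, read the thread.
    call_tool("whatsapp_search", {"query": name})
    call_tool("whatsapp_open_chat", {"index": 0})
    thread = call_tool("whatsapp_read_messages", {"limit": 20})

    # Step 3: send a free-form reply drafted from the thread.
    reply = draft_reply(thread)
    sent = call_tool("whatsapp_send_message", {"message": reply})

    # Step 4: the verification result arrives in the same response.
    return {"chat": name, "reply": reply, "verified": bool(sent.get("verified"))}
```

Passing `call_tool` and `draft_reply` in as parameters keeps the sketch independent of any particular MCP client or model, and makes it easy to exercise with a fake.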
What reading the thread actually returns
The interesting part is in the parser. The WhatsApp Mac app exposes each message bubble as an AXGenericElement with a single description string. Inbound bubbles start with "message, ". Outbound bubbles start with "Your message, ". That is the entire trick. The parser walks every AXGenericElement, discriminates by prefix, peels off the time suffix and the sender tag, and hands back a clean array.
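The prefix discrimination can be sketched in a few lines. This is a Python illustration of the idea, not the Swift parser from main.swift. The two prefixes come from the description above; the exact layout of the time suffix and sender tag is an assumption made here for illustration (`, <time>` optionally followed by `, Received from <name>`).

```python
import re
from typing import Dict, List, Optional

INBOUND_PREFIX = "message, "
OUTBOUND_PREFIX = "Your message, "

# Assumed tail layout: ", 3:05 PM" optionally followed by ", Received from <name>".
TAIL = re.compile(r", (\d{1,2}:\d{2}(?:\s?[AP]M)?)(?:, Received from (.+))?$")


def parse_bubble(description: str) -> Optional[Dict]:
    """Parse one AX description string; return None for non-message nodes."""
    if description.startswith(OUTBOUND_PREFIX):
        is_from_me, body = True, description[len(OUTBOUND_PREFIX):]
    elif description.startswith(INBOUND_PREFIX):
        is_from_me, body = False, description[len(INBOUND_PREFIX):]
    else:
        return None

    time, sender = "", ""
    tail = TAIL.search(body)
    if tail:
        time = tail.group(1)
        sender = tail.group(2) or ""
        body = body[:tail.start()]  # peel the suffix off the message text
    if is_from_me:
        sender = "me"
    return {"sender": sender, "text": body, "time": time, "isFromMe": is_from_me}


def parse_messages(descriptions: List[str], limit: int = 20) -> List[Dict]:
    """Keep only message bubbles and return the newest `limit` of them."""
    if limit <= 0:
        return []
    parsed = [p for p in map(parse_bubble, descriptions) if p is not None]
    return parsed[-limit:]
```

Anchoring the tail pattern to the end of the string means a time mentioned inside the message text itself is left alone.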
The tool handler that wraps it is six lines, including the accessibility guard:
In a real booking session it looks like this. The agent searches for a name, opens the top result, and pulls the last twelve messages:
The agent now knows the customer is Priya, that she asked to move her Wednesday 3pm, that I promised to check the calendar, and that she said thanks. Compared to a Business API webhook payload (one message, a phone number, a conversation id, no past context), this is night and day for replying like a human.
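One way to hand that context to the model is to flatten the parsed array into a short transcript. A minimal sketch, assuming the `{ sender, text, time, isFromMe }` shape that `whatsapp_read_messages` returns; the `render_thread` helper and the sample messages are illustrative, not captured output.

```python
from typing import Dict, List


def render_thread(messages: List[Dict], me_label: str = "Me") -> str:
    """Flatten parsed messages into one line per bubble, oldest first."""
    lines = []
    for message in messages:
        if message["isFromMe"]:
            who = me_label
        else:
            who = message["sender"] or "Customer"
        lines.append(f"[{message['time']}] {who}: {message['text']}")
    return "\n".join(lines)


# Illustrative thread, loosely following the exchange described above.
thread = [
    {"sender": "Priya", "text": "Can I move Wednesday 3pm to Thursday?",
     "time": "10:42 AM", "isFromMe": False},
    {"sender": "", "text": "Let me check the calendar.",
     "time": "10:44 AM", "isFromMe": True},
    {"sender": "Priya", "text": "Thanks!", "time": "10:45 AM", "isFromMe": False},
]

print(render_thread(thread))
```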
Sending the confirmation, and verifying it landed
The send tool is two halves. First half: click the compose textarea, copy the message to NSPasteboard.general, post Cmd+V to paste it, then post a Return key. Standard accessibility-driven typing. The second half is what makes it useful for booking automation: it re-reads the chat and confirms the bubble appeared before returning.
The honest part: when verification fails (focus stolen, paste swallowed, autocorrect modal stole Return), the handler returns verified: false rather than a hard error. The agent can retry once and check again. Booking flows that double-send are worse than ones that occasionally pause for a retry.
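On the agent side, a retry-once policy might look like the sketch below. It is an assumption about how you could use the `verified` flag, not part of the server: `call_tool`, the `message` and `limit` argument names, and the extra re-read before resending are all choices made here. The re-read exists to avoid a double send when the bubble landed but verification missed it.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def send_verified(call_tool: ToolCaller, text: str, max_retries: int = 1) -> Dict[str, Any]:
    """Send `text`, retrying at most `max_retries` times on verified: false."""
    attempts = 0
    while True:
        attempts += 1
        result = call_tool("whatsapp_send_message", {"message": text})
        if result.get("verified"):
            return {"verified": True, "attempts": attempts}

        # Before resending, check whether the bubble landed anyway.
        thread = call_tool("whatsapp_read_messages", {"limit": 5})
        if any(m.get("isFromMe") and text in m.get("text", "") for m in thread):
            return {"verified": True, "attempts": attempts}

        if attempts > max_retries:
            return {"verified": False, "attempts": attempts}
```

The containment check can report a false positive if an identical message was sent moments earlier, so a stricter version would also compare timestamps.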
What lives outside this layer
The MCP server is intentionally only the chat surface. It does not know what a booking is, what a calendar is, or what your business hours are. That is on purpose: every booking system is different, and embedding one would be the wrong abstraction.
What this product does
- Reads chats from the WhatsApp Mac sidebar
- Parses the message history per chat
- Sends free-form replies
- Verifies each send by re-reading the bubble
- Exposes all of it as MCP tools to your agent
What your agent owns
- Calendar (Cal.com, Google Calendar, Notion, your DB)
- Booking state machine (held, confirmed, no-show)
- Business hours and slot availability
- Payment links if you take deposits
- The actual reply text and tone
A typical wiring pattern: a Claude or Cursor session has both this MCP server and a calendar MCP loaded. Every few minutes a cron-like loop calls whatsapp_list_chats with filter: "unread", walks the unread chats, calls whatsapp_read_messages on each, asks the model what to do, and either replies or escalates to a human. Booking confirmations go out as free-form text that references whatever the customer actually wrote.
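The reply-or-escalate routing in that loop can be sketched as a single pass. Everything here is hypothetical glue: `list_unread`, `read_thread`, and `send_reply` would wrap the WhatsApp tools, `decide` would wrap the model and calendar lookup, and `escalate` would notify a human. None of these names come from the project.

```python
from typing import Callable, Dict, List, Tuple


def run_pass(
    list_unread: Callable[[], List[str]],
    read_thread: Callable[[str], List[Dict]],
    decide: Callable[[List[Dict]], Tuple[str, str]],
    send_reply: Callable[[str, str], bool],
    escalate: Callable[[str, str], None],
) -> Dict[str, List[str]]:
    """Walk every unread chat once and either reply or hand off to a human."""
    summary: Dict[str, List[str]] = {"replied": [], "escalated": []}
    for name in list_unread():
        thread = read_thread(name)
        # decide returns ("reply", text) or ("escalate", reason).
        action, payload = decide(thread)
        if action == "reply" and send_reply(name, payload):
            summary["replied"].append(name)
        else:
            # An unverified send is treated like an escalation, not retried here.
            reason = payload if action != "reply" else "send not verified"
            escalate(name, reason)
            summary["escalated"].append(name)
    return summary
```

Scheduling is left outside the function on purpose: the same pass can be triggered from cron, a timer in the agent runtime, or by hand.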
FAQ
What does 'desktop client' buy me that the WhatsApp Business Cloud API doesn't?
Chat context and free-form replies. The Business Cloud API is designed for outbound template blasts to opted-in lists, with a 24-hour reply window after each inbound message. Once that window closes, you can only send pre-approved templates. For booking automation that's the wrong shape, because most booking exchanges (reschedules, location confirmations, no-show follow-ups) are conversational and unpredictable. The desktop-client path treats your Mac's WhatsApp the same as any other chat: the agent reads the thread, drafts a free-form reply, sends it. No template, no window, no per-conversation fee.
Is this just clicking buttons in the WhatsApp Web window?
No, it's the native WhatsApp Catalyst app for macOS, and the agent talks to it through the macOS accessibility tree, not through a browser. Bundle id net.whatsapp.WhatsApp, AXUIElement traversal, AXMessagingTimeout 5.0s. Web automation is brittle here because WhatsApp Web's DOM uses contenteditable fields and React-controlled focus that breaks under Playwright. The accessibility-API path is the same API VoiceOver uses, so it's stable across WhatsApp releases.
What does the agent actually see when it 'reads the thread'?
It sees parsed messages. handleReadMessages walks the AX tree, finds AXGenericElement nodes whose description starts with 'message, ' (inbound) or 'Your message, ' (outbound), and returns a JSON array of { sender, text, time, isFromMe }. Default limit is 20 messages. That's enough for a booking agent to see who asked for what time, what was offered, and whether anyone confirmed. Source: main.swift, parseMessages, lines 488 to 527.
How does it verify a booking confirmation actually sent?
After pressing Return, the send handler does NOT just return success. It walks the AX tree a second time, finds the newest AXGenericElement whose description starts with 'Your message, ', strips the time suffix, and checks that the bubble's text contains what was sent. Only then does it return verified: true. If WhatsApp swallowed the input (focus stolen, chat header changed under us, paste failed silently), verified comes back false and the agent knows to retry. This is lines 923 to 957 of main.swift.
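The check itself is small. Here is a Python sketch of the logic, not the Swift handler: it takes the description strings in on-screen order (oldest first) and assumes the time suffix looks like `, 10:45 AM` at the end of the string, which is a guess at the format.

```python
import re
from typing import List

OUTBOUND_PREFIX = "Your message, "

# Assumed suffix format: ", 10:45 AM" (or ", 10:45") at the very end.
TIME_SUFFIX = re.compile(r", \d{1,2}:\d{2}(?:\s?[AP]M)?$")


def verify_sent(descriptions: List[str], sent_text: str) -> bool:
    """True only if the newest outbound bubble contains the text just sent."""
    for description in reversed(descriptions):
        if description.startswith(OUTBOUND_PREFIX):
            body = description[len(OUTBOUND_PREFIX):]
            body = TIME_SUFFIX.sub("", body)  # strip the time suffix
            return sent_text in body
    return False  # no outbound bubble on screen at all
```

Only the newest outbound bubble is inspected, so an older bubble with matching text does not count as proof that this send landed.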
Where does the actual booking calendar fit?
Outside this layer. The MCP server is the chat surface only. A real booking agent wires this together with a calendar tool (Cal.com, Google Calendar, an internal API) and uses the chat surface for read context and reply. The pattern is: whatsapp_read_messages to see what the customer asked → query the calendar tool → whatsapp_send_message with the answer → whatsapp_get_active_chat after a few minutes to see if they replied. The chat surface stays dumb; the agent owns the booking logic.
What happens if WhatsApp Desktop isn't running?
ensureWhatsAppRunning at line 79 of main.swift launches it, waits up to 2 seconds for the pid to appear, and returns the pid to the handler. So a cron job can kick off a 'send tomorrow's reminders' pass without you having WhatsApp open. The handler tools also activate the app before touching it (activateWhatsApp at line 89, pulls focus). If accessibility was never granted to the host process, requireAccessibility short-circuits and returns a JSON error explaining exactly which System Settings pane to open.
Does it work for groups, or just 1-to-1 bookings?
Both, with caveats. whatsapp_open_chat works on a group entry the same way it works on a contact, and read/send tools operate on whatever chat is currently focused. Useful for things like 'team daily standup' or 'family reminder' flows where the booking lives in a group thread. Caveat: in a noisy group the limit=20 default on read_messages can miss context; bump it to 50 or 100 if you need the agent to scroll further back.
Honest limitations?
macOS only, requires WhatsApp Desktop installed and accessibility granted. Not a hosted service. Not multi-tenant: one Mac, one WhatsApp account, one agent runtime. If you need a fleet sending opted-in template blasts to 50,000 strangers, this is the wrong tool and the Business Cloud API is the right one. For one solo founder or small team handling a few hundred booking-related exchanges per day, it's the simpler path.
Wiring this into your booking flow?
20 minutes, share your stack, I'll tell you whether the desktop-client path fits, where the Business API path would still beat it, and what the wiring looks like end to end.