MCP server production gotchas, the local-stdio kind

Every "MCP server in production" post I read while building one was secretly HTTP advice: rate limits, vaulted secrets, retries with backoff, OpenTelemetry. Fine for a remote MCP. Useless for a local stdio MCP that actually does something to the machine. Below are nine gotchas that bit me shipping whatsapp-mcp-macos, a Swift binary that drives the native WhatsApp Catalyst app through accessibility APIs, with the line numbers from Sources/WhatsAppMCP/main.swift.

Matthew Diakonov

Direct answer

For a local stdio MCP that drives a real app, "production" does not break at JSON-RPC. It breaks at the OS surface: TCC reports the host as trusted while AX calls silently return nothing, an accessibility call hangs without a timeout, a Cmd flag stays held down after a CGEvent post, the clipboard gets stomped, the cursor warps, and the send tool returns success:true for a message that never reached the wire. Everything below is one of those.

Verified against Sources/WhatsAppMCP/main.swift on 2026-05-15. 1,214 lines, 11 MCP tools, AX timeout 5.0s.

1. TCC says trusted, AX calls return nothing

The accessibility permission check on macOS is AXIsProcessTrustedWithOptions. It asks the TCC daemon whether the calling process is in the Accessibility allowlist. The daemon often answers yes when the actual permission to read another app's AX tree is broken, because TCC and the AX subsystem cache their views of the world separately.

I see this most after macOS minor updates and after the host app (Claude Code, Cursor, Fazm) is re-signed. The Accessibility pane shows a checkmark next to the host, the trust check returns true, and every downstream call to AXUIElementCopyAttributeValue returns empty arrays or kAXErrorCannotComplete. The fix is not to fight TCC. It is to never trust the trust check alone.

main.swift

Two checks, not one. The trust check is cheap and runs first. probeAccessibility actually pulls the children of the running app element with a 2-second timeout. If trust is true but the probe fails, the server emits a status that says so, out loud, with a fix:

whatsapp_status

The model on the other side now has something it can act on instead of a green light it has to second-guess.
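A minimal sketch of the two-check pattern described above, not the listing from main.swift. probeAccessibility is named after the function the text mentions, but its body here is an assumption; AccessibilityStatus, accessibilityStatus, currentStatus, the state strings, and the fix wording are all hypothetical. The AX calls are guarded so the decision logic can be compiled and tested without the macOS frameworks.

```swift
import Foundation
#if canImport(ApplicationServices)
import ApplicationServices
#endif

// Hypothetical status shape; the real whatsapp_status payload may differ.
struct AccessibilityStatus: Equatable {
    let ok: Bool
    let state: String
    let fix: String?
}

// Pure decision: trust alone is never enough to report "ready".
func accessibilityStatus(trusted: Bool, probeOK: Bool) -> AccessibilityStatus {
    switch (trusted, probeOK) {
    case (true, true):
        return AccessibilityStatus(ok: true, state: "ready", fix: nil)
    case (true, false):
        // TCC says yes, but the AX tree is unreadable. Fix text is an assumption.
        return AccessibilityStatus(ok: false, state: "tcc_stale",
            fix: "Remove and re-add the host app in System Settings > Privacy & Security > Accessibility, then restart it.")
    case (false, _):
        return AccessibilityStatus(ok: false, state: "not_trusted",
            fix: "Enable the host app in System Settings > Privacy & Security > Accessibility.")
    }
}

#if canImport(ApplicationServices)
// Functional probe: actually read the children of the app element, with a 2-second timeout.
func probeAccessibility(pid: pid_t) -> Bool {
    let app = AXUIElementCreateApplication(pid)
    AXUIElementSetMessagingTimeout(app, 2.0)
    var value: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(app, kAXChildrenAttribute as CFString, &value)
    guard err == .success, let children = value as? [AXUIElement] else { return false }
    return !children.isEmpty
}

func currentStatus(pid: pid_t) -> AccessibilityStatus {
    // Cheap trust check first, without triggering the permission prompt.
    let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: false] as CFDictionary
    let trusted = AXIsProcessTrustedWithOptions(options)
    return accessibilityStatus(trusted: trusted, probeOK: trusted && probeAccessibility(pid: pid))
}
#endif
```

The point of splitting the decision from the AX calls is that the "trusted but probe failed" branch, the one that matters most, can be exercised without a broken TCC database to reproduce it.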

2. AX calls hang forever without an explicit timeout

The default timeout on AXUIElementCopyAttributeValue is, in practice, "until the target process responds." If the target is frozen, paging in, mid-launch, or held by a system service, the call blocks the calling thread. Inside a synchronous MCP tool handler, that means the JSON-RPC response never goes out, and the MCP client eventually times out the whole tool call.

AXUIElementSetMessagingTimeout(elem, 5.0) fixes this. Apply it once to the application element at the start of every traversal:

main.swift

5.0 sounds long. It is the empirical floor. WhatsApp Desktop under a Spotlight reindex routinely needs 2 to 3 seconds. The 2.0-second timeout on the probe is shorter because the probe only asks for the children of one element, not a fifteen-level descent.
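As a rough illustration of "set the timeout, then descend", here is a sketch under stated assumptions rather than the traversal in main.swift. walk, axChildren, and traverseApp are hypothetical names; the 5.0-second timeout and the fifteen-level limit come from the text.

```swift
import Foundation
#if canImport(ApplicationServices)
import ApplicationServices
#endif

// Depth-limited pre-order walk. `children` is injected so the same walker
// runs over AX elements and over plain test data.
func walk<Node>(_ node: Node, depth: Int = 0, maxDepth: Int = 15,
                children: (Node) -> [Node], visit: (Node, Int) -> Void) {
    visit(node, depth)
    guard depth < maxDepth else { return }
    for child in children(node) {
        walk(child, depth: depth + 1, maxDepth: maxDepth, children: children, visit: visit)
    }
}

#if canImport(ApplicationServices)
// Returns an empty list on any AX error, including a timeout.
func axChildren(_ element: AXUIElement) -> [AXUIElement] {
    var value: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(element, kAXChildrenAttribute as CFString, &value)
    guard err == .success, let kids = value as? [AXUIElement] else { return [] }
    return kids
}

func traverseApp(pid: pid_t, visit: (AXUIElement, Int) -> Void) {
    let app = AXUIElementCreateApplication(pid)
    // Applied once to the application element before the descent, as the text describes.
    AXUIElementSetMessagingTimeout(app, 5.0)
    walk(app, children: axChildren, visit: visit)
}
#endif
```

If individual child calls still appear to block, Apple's documentation describes passing the system-wide element (AXUIElementCreateSystemWide) to AXUIElementSetMessagingTimeout as the way to set a process-global default; whether that is needed here is something to verify against your own target app.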

3. Cmd stays held down after a CGEvent post

To paste, the server posts Cmd+V as a synthetic CGEvent with CGEventFlags.maskCommand. The kernel treats those flags as sticky until something explicitly releases them. If the next key the user types arrives before any unmodified CGEvent clears the flag, the OS reads it as Cmd+<char>. Browser tabs close. Terminal opens new windows. Files get archived.

The defense is a bracketed key-up dance: before and after every modified send, post explicit key-up events for keycodes 55, 56, 58, and 59 (Cmd, Shift, Option, Control) so the kernel registers them as released.

main.swift

The 50ms sleep is not aesthetic. Without it, the rapid sequence of key-up events arrives before the prior key-down has been fully dispatched, and macOS drops some of the releases.
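The bracketing can be sketched as follows. This is an illustration, not the sendKeyEvent function from main.swift: KeyPoster, releaseModifiers, sendCommandKey, and postViaCGEvent are hypothetical names, and the event poster is injected so the ordering can be checked without posting real key events. The keycodes and the 50ms pause come from the text.

```swift
import Foundation
#if canImport(CoreGraphics)
import CoreGraphics
#endif

// Virtual keycodes for Cmd, Shift, Option, Control.
let modifierKeyCodes: [UInt16] = [55, 56, 58, 59]

// (keyCode, keyDown, withCommand): posts one key event.
typealias KeyPoster = (UInt16, Bool, Bool) -> Void

// Post an explicit, unmodified key-up for every modifier, then pause briefly.
func releaseModifiers(post: KeyPoster, settle: TimeInterval = 0.05) {
    for code in modifierKeyCodes { post(code, false, false) }
    Thread.sleep(forTimeInterval: settle)
}

// Release, send Cmd+<key> as a down/up pair, release again.
func sendCommandKey(_ key: UInt16, post: KeyPoster, settle: TimeInterval = 0.05) {
    releaseModifiers(post: post, settle: settle)
    post(key, true, true)
    post(key, false, true)
    releaseModifiers(post: post, settle: settle)
}

#if canImport(CoreGraphics)
// Real poster: a synthetic CGEvent delivered at the HID event tap.
let postViaCGEvent: KeyPoster = { keyCode, keyDown, withCommand in
    guard let event = CGEvent(keyboardEventSource: nil, virtualKey: keyCode, keyDown: keyDown) else { return }
    event.flags = withCommand ? .maskCommand : []
    event.post(tap: .cghidEventTap)
}
#endif
```

On macOS, a paste would then be sendCommandKey(9, post: postViaCGEvent), where 9 is the virtual keycode for V.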

4. Cmd+V stomps the clipboard the user was holding

The most reliable way to type Unicode into a Catalyst text field from the outside is to write the string to NSPasteboard.general and post Cmd+V. The first version of the send tool did that, and the first user bug was a developer complaining that their carefully prepared 800-word paste buffer was gone every time Claude messaged a contact.

Back up, paste, restore, in the same function:

main.swift

Two gotchas live inside this fix. pb.string(forType: .string) only backs up the plaintext representation, so a clipboard holding an image, PDF, or RTF payload still gets replaced with text. And the 0.35-second sleep before clear-and-restore is mandatory, because WhatsApp's Cmd+V handler reads the pasteboard asynchronously and will paste empty if you clear too soon.
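A sketch of the back-up, paste, restore sequence, assuming a small Clipboard protocol so the ordering can be tested with a fake. Clipboard, SystemClipboard, and pastePreservingClipboard are hypothetical names, not the ones in main.swift; the 0.35-second settle time and the plaintext-only backup are the behaviors the text describes.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

// Hypothetical abstraction over the pasteboard, plaintext only.
protocol Clipboard {
    func read() -> String?
    func write(_ value: String?)
}

// Back up, write the outgoing text, paste, wait for the target to read it, restore.
func pastePreservingClipboard(_ text: String, clipboard: Clipboard,
                              paste: () -> Void, settle: TimeInterval = 0.35) {
    let saved = clipboard.read()
    clipboard.write(text)
    paste()                                  // e.g. a synthetic Cmd+V
    Thread.sleep(forTimeInterval: settle)    // the paste handler reads asynchronously
    clipboard.write(saved)                   // nil clears, otherwise restores the old text
}

#if canImport(AppKit)
struct SystemClipboard: Clipboard {
    func read() -> String? {
        // Only the plaintext representation survives; images, PDF, and RTF do not.
        NSPasteboard.general.string(forType: .string)
    }
    func write(_ value: String?) {
        let pb = NSPasteboard.general
        pb.clearContents()
        if let value = value { _ = pb.setString(value, forType: .string) }
    }
}
#endif
```

Closing the non-text gap would mean copying every type of every item in pasteboardItems before the paste and writing them back afterwards; that is left out here to keep the sketch short.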

5. CGEvent mouse clicks warp the real cursor

A synthetic click at (1820, 540) moves the actual pointer to (1820, 540). The user's cursor jumps, their hover state changes, and if they happen to be mid-drag the drag ends in the wrong place. Save the position before the click and post a mouse-moved event back to the original location afterwards.

main.swift

The Y axis flip in saveCursorPosition is the macOS convention: NSEvent uses bottom-left origin, CGEvent uses top-left. The conversion has to happen exactly once. A missed or duplicated flip parks the restored cursor at the wrong end of the screen and looks identical to no restore at all.
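The save, click, restore sequence and the single flip might look like this. It is a sketch, not the code in main.swift: saveCursorPosition is named after the function the text mentions, but its body, flipY, restoreCursor, and clickPreservingCursor are assumptions, as is the choice of NSScreen.screens.first as the primary screen whose height anchors the flip.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

// Convert between bottom-left-origin (NSEvent) and top-left-origin (CGEvent) coordinates.
// Applying it twice returns the original point, which is why it must run exactly once.
func flipY(_ point: CGPoint, screenHeight: CGFloat) -> CGPoint {
    CGPoint(x: point.x, y: screenHeight - point.y)
}

#if canImport(AppKit)
// Current pointer location, already converted to CGEvent coordinates.
func saveCursorPosition() -> CGPoint? {
    guard let primary = NSScreen.screens.first else { return nil }
    return flipY(NSEvent.mouseLocation, screenHeight: primary.frame.height)
}

// Post a mouse-moved event back to a point in CGEvent coordinates.
func restoreCursor(to point: CGPoint) {
    CGEvent(mouseEventSource: nil, mouseType: .mouseMoved,
            mouseCursorPosition: point, mouseButton: .left)?.post(tap: .cghidEventTap)
}

func clickPreservingCursor(at target: CGPoint) {
    let saved = saveCursorPosition()
    for type in [CGEventType.leftMouseDown, .leftMouseUp] {
        CGEvent(mouseEventSource: nil, mouseType: type,
                mouseCursorPosition: target, mouseButton: .left)?.post(tap: .cghidEventTap)
    }
    if let saved = saved { restoreCursor(to: saved) }
}
#endif
```

Doing the conversion inside saveCursorPosition keeps every other function in one coordinate space, so there is no second place where a flip could sneak in.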

6. The send tool returns success:true for messages that never sent

The naive send is: click compose field, paste text, press Return, return {success: true}. That works most of the time and fails in subtle ways the rest of the time. The compose field can lose focus to an autocorrect popover. The chat header can change between paste and Return because a new incoming message reordered the sidebar. Catalyst sometimes drops the keystroke if the window is in the middle of a layout pass.

The only honest verification is to read back the AX tree afterwards and look for the message you just sent:

main.swift

Two design choices worth flagging. First, the comparison is hasPrefix or contains, not equality, because the rendered message has been through WhatsApp's own normalization (emoji shortcodes, link previews, trailing whitespace stripped). Second, the failed path returns success:true, verified:false, not success:false. If the model sees a hard fail it will retry, and a retry is a double-send. A soft unverified status lets the agent decide for itself.
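The read-back comparison can be sketched as a pure function over the descriptions collected from the AX tree after Return. SendResult and verifySend are hypothetical names, and the exact shape of the time suffix in the sample data is an assumption; only the 'Your message, ' prefix and the lenient comparison come from the text.

```swift
import Foundation

struct SendResult: Equatable {
    let success: Bool
    let verified: Bool
}

// `descriptions` are the AX descriptions read back after pressing Return.
func verifySend(text: String, descriptions: [String]) -> SendResult {
    let marker = "Your message, "
    let needle = text.trimmingCharacters(in: .whitespacesAndNewlines)
    let verified = !needle.isEmpty && descriptions.contains { description in
        guard description.hasPrefix(marker) else { return false }
        let rendered = String(description.dropFirst(marker.count))
        // Lenient on purpose: the rendered text carries a time suffix and
        // may have been normalized, so equality would miss real sends.
        return rendered.hasPrefix(needle) || rendered.contains(needle)
    }
    // Never a hard failure: an unverified send is reported, not retried.
    return SendResult(success: true, verified: verified)
}
```

Keeping success and verified as separate fields is what lets the agent tell "probably sent, could not confirm" apart from "confirmed" without being nudged into a retry.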

7. Lazy-loaded UI looks empty until you scroll three times

WhatsApp's search results are lazy-loaded. Traverse the AX tree right after typing a query and you might see eight results, or two, or none, depending on whether the sidebar finished rendering. The tempting fix is a longer sleep. The actual fix is to scroll the list and re-read, because lazy-load is triggered by user-shaped scroll events.

main.swift

Three is empirical. One scroll loads about five more results. Two loads ten. After three, the next scroll usually returns nothing new, and that's the signal to stop. The 300ms sleep between scrolls gives Catalyst time to finish the cell-creation pass before the next scroll fires.
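A sketch of the scroll-and-re-read loop, assuming the AX read and the scroll event are passed in as closures. collectLazyResults is a hypothetical name; in a real server, read would traverse the results list and scroll would post a scroll-wheel CGEvent over it. The three scrolls and the 300ms pause come from the text.

```swift
import Foundation

// Read once, then scroll and re-read a fixed number of times,
// accumulating results in first-seen order without duplicates.
func collectLazyResults(maxScrolls: Int = 3, pause: TimeInterval = 0.3,
                        read: () -> [String], scroll: () -> Void) -> [String] {
    var seen: [String] = []
    func merge(_ batch: [String]) {
        for item in batch where !seen.contains(item) { seen.append(item) }
    }
    merge(read())
    for _ in 0..<maxScrolls {
        scroll()
        Thread.sleep(forTimeInterval: pause)   // let the cell-creation pass finish
        merge(read())
    }
    return seen
}
```

The loop deliberately does not stop early on an empty first read, since an empty list right after typing is exactly the state the scrolls are meant to get past.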

8. AX descriptions are full of invisible bidirectional Unicode

Read an AXButton's description and you may get back something that looks like "Sara" on screen but, in bytes, is "Sara" wrapped in invisible directional formatting characters. WhatsApp uses bidirectional isolates and embedding marks aggressively to keep names, phone numbers, and timestamps rendering correctly in RTL locales. Every comparison your MCP does on AX strings is wrong if you don't strip them.

main.swift

The codepoints, in case you ever need to type them into a grep: LRM (U+200E), RLM (U+200F), ZWSP (U+200B), ZWNJ (U+200C), ZWJ (U+200D), the four isolates U+2066-2069, and the embeddings U+202A-202E. Strip them once at the boundary, then every downstream string compare and regex behaves.
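Stripping at the boundary can be as small as a filter over Unicode scalars. isInvisibleFormatting and stripInvisibles are hypothetical names; the ranges are the codepoints listed above.

```swift
import Foundation

// True for the zero-width and directional formatting characters listed in the text.
func isInvisibleFormatting(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.value {
    case 0x200B...0x200F,   // ZWSP, ZWNJ, ZWJ, LRM, RLM
         0x202A...0x202E,   // embeddings, overrides, pop directional formatting
         0x2066...0x2069:   // isolates
        return true
    default:
        return false
    }
}

// Return a copy of the string with those scalars removed.
func stripInvisibles(_ text: String) -> String {
    String(String.UnicodeScalarView(text.unicodeScalars.filter { !isInvisibleFormatting($0) }))
}
```

One caveat worth keeping in mind: ZWJ also glues multi-part emoji together, so the stripped string is best used for comparison only, with the original kept for anything shown to the user.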

9. Quit politely, then force-quit before the MCP client times out

NSRunningApplication.terminate() is a polite ask. The app can refuse, prompt the user to save, or quietly ignore the signal if it's mid-modal. Meanwhile, the MCP client on the other side is waiting for one JSON line and has its own tool-call timeout, often 10 to 30 seconds. Block too long and the client decides your tool is broken even after the kill goes through.

main.swift

Five seconds of grace, then forceTerminate. The response carries a force_quit field so the model can tell the user why their unsaved draft message disappeared. A silent force-quit looks the same to the model as a clean quit, which is worse than admitting what happened.
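The grace-then-force sequence can be sketched with the process operations injected, so the escalation logic is testable without a running app. QuitResult, quitWithGrace, and quit are hypothetical names, and forceQuit stands in for the force_quit field the text mentions. The run-loop wait in the AppKit binding is an assumption based on Apple's note that time-varying NSRunningApplication properties such as isTerminated only refresh when the main run loop runs; it assumes the call happens on the main thread.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

struct QuitResult: Equatable {
    let success: Bool
    let forceQuit: Bool
}

// Ask politely, poll for up to attempts * interval seconds, then escalate.
func quitWithGrace(attempts: Int = 10, interval: TimeInterval = 0.5,
                   isTerminated: () -> Bool,
                   terminate: () -> Void,
                   forceTerminate: () -> Void,
                   wait: (TimeInterval) -> Void = { Thread.sleep(forTimeInterval: $0) }) -> QuitResult {
    terminate()
    for _ in 0..<attempts {
        if isTerminated() { return QuitResult(success: true, forceQuit: false) }
        wait(interval)
    }
    if isTerminated() { return QuitResult(success: true, forceQuit: false) }
    forceTerminate()
    wait(interval)
    // Report the escalation instead of hiding it behind a clean-looking success.
    return QuitResult(success: isTerminated(), forceQuit: true)
}

#if canImport(AppKit)
func quit(_ app: NSRunningApplication) -> QuitResult {
    quitWithGrace(isTerminated: { app.isTerminated },
                  terminate: { _ = app.terminate() },
                  forceTerminate: { _ = app.forceTerminate() },
                  wait: { RunLoop.current.run(until: Date(timeIntervalSinceNow: $0)) })
}
#endif
```

With the defaults, the worst case is about 5.5 seconds before a response goes out, which stays under the 10-second low end of the client timeouts mentioned above.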

The checklist version, for the next local stdio MCP

If you are about to ship a local MCP that drives a real app, these are the failure modes worth pre-mortem-ing.

Don't

  • Trust the trust check. AXIsProcessTrustedWithOptions(prompt: false) can return true while every subsequent AX call returns nothing. Always probe.
  • Block forever on an AX call. AXUIElementCopyAttributeValue does not time out by default; set AXUIElementSetMessagingTimeout(elem, 5.0) on the app element.
  • Leave Cmd held down. After Cmd+V the user's next key is Cmd-prefixed unless you explicitly post key-up events for keycodes 55, 56, 58, 59.
  • Stomp the clipboard. If you Cmd+V, back the old clipboard up and restore it within the same call, or you just deleted whatever the human was about to paste.
  • Warp the cursor. CGEvent mouse events move the real pointer; save NSEvent.mouseLocation before the click and restore it after.
  • Return a bare success:true from a send tool. Walking the AX tree after Return and finding 'Your message, <text>' is the only honest verification.
  • Trust the first read of a lazy list. WhatsApp's sidebar lazy-loads; scroll three times with a sleep between each before declaring 'no results'.
  • Forget bidirectional Unicode. AX descriptions contain U+200E, U+200F, and the U+2066-2069 isolates; strip them before any prefix/contains compare.
  • Send terminate and assume the process is gone. Loop ten times at 0.5s, then forceTerminate. The MCP client is waiting for one JSON line; don't make it hang.

Do

  • Return JSON the model can act on. 'Permission not granted' is a dead end. 'Permission not granted, here is the System Settings pane to open' is a tool call.

Where this leaves the "MCP server in production" genre

The HTTP-shaped advice is fine when your MCP server is a thin remote wrapper. The moment your server forks under stdio and touches the machine, the production surface moves. You stop debugging webhook retries and start debugging permission daemons, event taps, and lazy AX trees. None of that is in the typical production checklist. All of it is in main.swift, with line numbers.

The repo is github.com/m13v/whatsapp-mcp-macos. Eleven MCP tools, one Swift file, every gotcha above wired in. If you're building a similar local MCP and want to compare notes on your own OS-binding layer, I'm happy to look at it.

Building a local stdio MCP and hitting one of these?

If you're shipping an MCP that touches the OS and any of this sounds familiar, 25 minutes is usually enough to compare your defense against mine.

FAQ

Why don't generic 'MCP server in production' guides cover any of this?

Most of them assume the MCP server is a thin HTTP wrapper around a SaaS API, so the production playbook is the SaaS playbook: secrets in a vault, retries with exponential backoff, structured logs, rate-limit handling, OpenTelemetry. None of that is wrong for a remote MCP server, but it is irrelevant to a local stdio MCP that drives a real app on the user's machine. The gotchas there live one layer down, in the OS surface the MCP touches: permission daemons, event taps, window servers, lazy-loaded UI. The WhatsApp MCP I ship is a Swift binary that gets forked by Claude Code with type stdio, and every interesting bug in its first month was at that layer, not the JSON-RPC layer.

What is the worst of the nine, in practice?

Silent permission failure. AXIsProcessTrustedWithOptions returns true, System Settings shows the host app checked in the Accessibility pane, and AXUIElementCopyAttributeValue still returns kAXErrorCannotComplete or just an empty result. This happens after macOS minor updates and after the host app is re-signed (which Claude Code does on every release). With only the trust check, whatsapp_status looks healthy while every other tool returns mysterious empty results. The defense is the functional probe at line 576 of main.swift, which actually pulls the children of the app element with a 2-second timeout and surfaces a 'TCC stale' warning. Without it the user has no idea where to look.

Why is AXUIElementSetMessagingTimeout 5.0 seconds and not 1.0?

Because the AX tree of a Catalyst app on a busy machine genuinely takes that long sometimes. 1.0 second is fine when WhatsApp Desktop is idle and frontmost, but the same call against a backgrounded WhatsApp on a Mac under heavy load (Spotlight reindex, an iCloud sync storm, a Time Machine snapshot) routinely needs 2 to 3 seconds. 5.0 is the empirical floor where the timeout becomes a real error signal, not a flaky one. The probe (line 578) uses 2.0 because it only walks one level, the children of the app element. The full traversal at line 120 uses 5.0 because it walks up to fifteen levels deep.

Is the stuck-modifier flag really a thing or are you being paranoid?

It is real. CGEvent posts a key-down with flags set, and on macOS the flags are sticky from the kernel's perspective until something explicitly releases them. Even if the next event, whether from your code or from the user's keyboard, carries no Cmd flag, the OS still reads it as a Cmd-modified keystroke. I shipped a build once that did not release modifiers after a Cmd+A in the search field, and the user's next typed character was treated as Cmd+<char>, which scrolled their browser, opened a new tab in Terminal, and did roughly nothing the user wanted. The fix is the bracketed key-up dance at the top and bottom of sendKeyEvent (lines 239 to 263), explicitly posting key-up for keycodes 55, 56, 58, and 59 with a 50ms sleep so the kernel registers them.

Why post-send verification instead of just trusting Return?

Because Return only works if the compose textarea is focused, the message text has actually reached it, no autocorrect popover has stolen focus mid-paste, and WhatsApp has not silently rejected the input because the chat header changed under us. Each of those failure modes happened in early builds. The honest answer is to walk the AX tree after pressing Return, find AXGenericElement nodes whose description starts with 'Your message, ', strip the time suffix, and compare against the input. Even then I return success:true with verified:false rather than success:false, because the message may have been sent but rendered without an AX-readable description, and a hard fail would tell the model to retry, which double-sends.

How does this interact with the MCP protocol itself?

All of these gotchas are below the protocol. The MCP layer sees clean JSON-RPC over stdio: a CallTool request comes in, a JSON string goes out. The protocol gotchas (timeouts on the host side, listChanged notifications, schema validation of tool inputs) are real but well-covered in modelcontextprotocol.io's own docs. The OS-binding gotchas in this page only show up when your MCP server actually does something to the machine. If your MCP is a fetch wrapper, none of this applies. If it's a local stdio binary that drives a real app, all of it does.

Why force-terminate after only 5 seconds in the quit tool?

Because the MCP client (Claude Code, Cursor, whatever) is waiting for one JSON line in response to the tool call, and most clients have their own tool-call timeout in the 10 to 30 second range. If whatsapp_quit hangs for 10 seconds waiting for a polite shutdown, the client times out, decides the MCP is broken, and the user gets a 'tool execution failed' error even though the kill succeeded a moment later. 5 seconds of grace, then NSRunningApplication.forceTerminate, is the right trade. The force_quit field in the response tells the model whether force-quit was necessary, so a downstream tool call can react.

What's the closest thing to all of this for non-macOS local MCP servers?

The pattern is the same wherever a local MCP drives a real binary or UI: there is always a permission layer that lies (Linux's seccomp profile, Windows' UI Automation trust), there is always a cross-process call that hangs without an explicit timeout, there is always a UI surface that returns stale state if you read it once instead of polling, and there is always a tool that needs post-action verification because the OS does not bubble up its errors synchronously. The accessibility-API specifics on this page are macOS; the failure shapes are universal.
