Why AI Agents Fail at Scheduling (And How to Fix It)
AI agents are getting better at everything — except scheduling.
The temporal reasoning gap
The OOLONG benchmark shows that even frontier models score below 50% on temporal reasoning tasks. Earlier research from the ICLR 2025 “Test of Time” paper found models scoring as low as 29% on scheduling and 13% on duration calculations.
This isn’t a prompting problem. It’s a computation problem. You can’t prompt-engineer your way to correct RRULE expansion or timezone-aware duration calculation. These require deterministic algorithms, not statistical prediction.
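To make the duration point concrete, here is a minimal Python sketch using only the standard library. The helper name `elapsed` and the example timestamps are illustrative assumptions, not part of any tool described in this post. It shows how a naive wall-clock subtraction and a UTC-normalized subtraction disagree across the 2026 US spring-forward.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def elapsed(start: datetime, end: datetime) -> timedelta:
    """Real elapsed time between two aware datetimes, computed in UTC.

    Hypothetical helper for illustration only.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("both datetimes must be timezone-aware")
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)


NY = ZoneInfo("America/New_York")

# US clocks spring forward on Sunday, March 8, 2026 at 02:00 local time.
start = datetime(2026, 3, 7, 22, 0, tzinfo=NY)  # Saturday 10pm, UTC-05:00
end = datetime(2026, 3, 8, 6, 0, tzinfo=NY)     # Sunday 6am, UTC-04:00

# Python ignores tzinfo when both operands share the same tzinfo object,
# so this is a wall-clock difference, not real elapsed time.
wall_clock = end - start

# Normalizing to UTC first accounts for the skipped hour.
real = elapsed(start, end)

print(wall_clock)  # 8:00:00
print(real)        # 7:00:00
```

The one-hour gap between the two results is exactly the kind of detail that is easy to compute and hard to predict.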
Why most calendar MCP servers make it worse
Most calendar MCP servers are thin CRUD wrappers: they list events, create events, and delete events. They delegate all temporal reasoning to the LLM — the exact component that’s bad at it.
Consider what happens when an agent tries to schedule a 30-minute meeting “next Tuesday afternoon”:
1. Resolve “next Tuesday”: The agent must determine which Tuesday relative to today. If today is Friday March 13, “next Tuesday” is March 17. An LLM might guess March 10 (last Tuesday) or March 24 (the Tuesday after next). resolve_datetime computes it:
   { "expression": "next Tuesday at 2pm" } → { "resolved_local": "2026-03-17T14:00:00-04:00", "interpretation": "Tuesday, March 17, 2026 at 2:00 PM" }
2. Determine “afternoon” in the user’s timezone: “2pm” needs to be anchored to a timezone. Without get_temporal_context, the agent might assume UTC or default to whatever timezone appeared most in its training data. With it, the agent knows it’s in America/New_York with DST active.
3. Check for recurring event conflicts: A weekly standup at 2pm might not appear in a simple event listing if the calendar provider returns only the RRULE, not expanded instances. expand_rrule deterministically expands the recurrence to check for overlap.
4. Account for DST transitions: If the date crosses a DST boundary, the UTC offset changes. A meeting that was at 14:00:00-05:00 becomes 14:00:00-04:00 after spring-forward. The wall-clock time stays the same, but the UTC representation is different. LLMs frequently get this wrong, using the pre-DST offset for post-DST dates.
5. Book without double-booking: Two agents checking the same slot at the same time will both see it as “free.” Without locking, both can book it. book_slot uses Two-Phase Commit to prevent this.
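The date resolution and offset handling in the walkthrough above can be sketched with Python's standard library. This is not how resolve_datetime is implemented. The helpers `next_weekday` and `resolve_local` are hypothetical names, and "next Tuesday" is assumed to mean the first Tuesday strictly after today, which matches the March 13 to March 17 example.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo


def next_weekday(today: date, weekday: int) -> date:
    """First date strictly after `today` that falls on `weekday` (Monday=0)."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def resolve_local(day: date, at: time, tz_name: str) -> datetime:
    """Anchor a wall-clock time to an IANA timezone; the offset follows DST rules."""
    return datetime.combine(day, at, tzinfo=ZoneInfo(tz_name))


TUESDAY = 1
today = date(2026, 3, 13)  # a Friday, as in the walkthrough

tuesday = next_weekday(today, TUESDAY)
meeting = resolve_local(tuesday, time(14, 0), "America/New_York")

# The same wall-clock time on a Tuesday before the March 8 spring-forward.
before_dst = resolve_local(date(2026, 3, 3), time(14, 0), "America/New_York")

print(meeting.isoformat())     # 2026-03-17T14:00:00-04:00
print(before_dst.isoformat())  # 2026-03-03T14:00:00-05:00
```

The offset is never typed by hand here. It comes from the timezone database, so the pre-DST and post-DST dates each get the right one.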
Steps 1, 2, and 4 require deterministic temporal computation. Step 3 requires RRULE expansion. Step 5 requires locking. None of these should be left to an LLM.
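To picture why step 5 needs locking, here is a deliberately simplified hold-then-confirm sketch in Python. It is an assumption-laden illustration, not the book_slot implementation: the `SlotBook` class and its method names are hypothetical, everything lives in one process, and a real Two-Phase Commit would coordinate separate participants such as a lock store and a calendar provider.

```python
import threading


class SlotBook:
    """In-memory stand-in for a calendar with hold-then-confirm booking."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._holds: dict[str, str] = {}   # slot -> agent currently holding it
        self._booked: dict[str, str] = {}  # slot -> agent that booked it

    def prepare(self, slot: str, agent: str) -> bool:
        """Phase 1: atomically place a hold if the slot is neither booked nor held."""
        with self._lock:
            if slot in self._booked or slot in self._holds:
                return False
            self._holds[slot] = agent
            return True

    def commit(self, slot: str, agent: str) -> bool:
        """Phase 2: convert the agent's own hold into a booking."""
        with self._lock:
            if self._holds.get(slot) != agent:
                return False
            del self._holds[slot]
            self._booked[slot] = agent
            return True

    def abort(self, slot: str, agent: str) -> None:
        """Release a hold without booking, e.g. if a later step fails."""
        with self._lock:
            if self._holds.get(slot) == agent:
                del self._holds[slot]


book = SlotBook()
slot = "2026-03-17T14:00:00-04:00"
results: dict[str, bool] = {}


def try_book(agent: str) -> None:
    # Only the agent whose prepare() succeeds goes on to commit.
    results[agent] = book.prepare(slot, agent) and book.commit(slot, agent)


threads = [threading.Thread(target=try_book, args=(name,)) for name in ("agent-a", "agent-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # exactly one of the two agents gets True
```

The key property is that the check and the hold happen as one atomic step, so two agents can no longer both observe the slot as free.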
The deterministic approach
Temporal Cortex moves temporal reasoning out of the LLM and into deterministic tools:
- resolve_datetime converts natural language to RFC 3339 using a rule-based expression parser, supporting 60+ patterns from “tomorrow morning” to “third Friday of next month”
- expand_rrule deterministically expands recurring event rules using the Truth Engine (9,000+ property-based tests). See how it compares to LLM predictions on 5 real-world RRULEs.
- find_free_slots computes actual availability by merging events across Google Calendar, Outlook, and iCloud into a single free/busy view
- book_slot uses Two-Phase Commit to prevent race conditions between concurrent agents
The LLM’s role is reduced to intent extraction: “The user wants a meeting next Tuesday afternoon.” Everything else is computed.
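As an example of what "computed" means for availability, the following Python sketch merges busy intervals from several calendars and returns the gaps. It is one plausible approach under stated assumptions, not the find_free_slots implementation: the function names, the interval representation, and the sample events are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

Interval = tuple[datetime, datetime]


def merge_busy(intervals: list[Interval]) -> list[Interval]:
    """Normalize to UTC, sort, and coalesce overlapping or touching busy intervals."""
    in_utc = sorted(
        (start.astimezone(timezone.utc), end.astimezone(timezone.utc))
        for start, end in intervals
    )
    merged: list[Interval] = []
    for start, end in in_utc:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def free_slots(
    busy: list[Interval],
    window_start: datetime,
    window_end: datetime,
    min_length: timedelta,
) -> list[Interval]:
    """Gaps of at least `min_length` inside the window, returned in UTC."""
    cursor = window_start.astimezone(timezone.utc)
    limit = window_end.astimezone(timezone.utc)
    free: list[Interval] = []
    for start, end in merge_busy(busy):
        if start >= limit:
            break
        if start - cursor >= min_length:
            free.append((cursor, start))
        cursor = max(cursor, end)
    if limit - cursor >= min_length:
        free.append((cursor, limit))
    return free


NY = ZoneInfo("America/New_York")


def at(hour: int, minute: int = 0) -> datetime:
    """Wall-clock time on Tuesday, March 17, 2026 in New York."""
    return datetime(2026, 3, 17, hour, minute, tzinfo=NY)


# Hypothetical events pulled from three different providers.
busy = [
    (at(14), at(14, 30)),  # weekly standup, already expanded from its RRULE
    (at(13), at(14, 15)),  # overlapping meeting from a second calendar
    (at(16), at(16, 30)),  # appointment from a third calendar
]

free = free_slots(busy, at(12), at(17), timedelta(minutes=30))
for start, end in free:
    print(start.astimezone(NY).isoformat(), "to", end.astimezone(NY).isoformat())
```

Normalizing everything to UTC before comparing keeps the merge correct even when the source calendars report events in different timezones.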
And for people who don’t have AI agents yet, the compose_proposal tool generates shareable booking links — so an agent-equipped user can schedule with anyone, not just other agent users. This backward-compatible path is part of treating scheduling as infrastructure, not just an AI feature.
Getting started
Install Temporal Cortex and give your AI agent deterministic scheduling:
npx @temporal-cortex/cortex-mcp
The 18 tools across 5 layers handle temporal context, calendar operations, availability computation, safe booking, and open scheduling. Your agent gets it right the first time. Follow the step-by-step tutorial to go from install to a booked meeting in under 5 minutes.