LLM vs Truth Engine: 5 RRULE Expansions Side-by-Side
LLMs are confident about recurring calendar events. They’re also frequently wrong.
We took 5 real-world RRULE strings — the kind that appear in production Google Calendar and Outlook accounts — and ran them through GPT-4o, Claude Sonnet 4, and the Truth Engine. The results speak for themselves.
What is RRULE?
RRULE is the recurrence rule format defined in RFC 5545 (the iCalendar spec). It encodes patterns like “every Monday at 9am” or “last Friday of the month” as machine-readable strings. Calendar providers store recurring events this way, and any tool that works with calendars must expand them into concrete dates.
The expansion is a deterministic algorithm: given the same rule, start date, and time zone, there is exactly one correct answer. This makes it a good benchmark, because every output can be verified against the spec.
The 5 tests
Test 1: Weekly standup across DST spring-forward
Rule: FREQ=WEEKLY;BYDAY=MO starting March 2, 2026 at 9:00 AM Eastern, 4 occurrences.
March 8, 2026 is spring-forward day in the US. Clocks jump from 2:00 AM to 3:00 AM. A 9am meeting stays at 9am wall-clock time, but its UTC representation changes.
What LLMs return: Most models produce correct dates but use the pre-DST UTC offset (-05:00) for all four occurrences. The March 9, March 16, and March 23 occurrences should use -04:00 (EDT), not -05:00 (EST).
What Truth Engine returns:
2026-03-02T14:00:00Z (Mon Mar 2, 9:00 AM EST, UTC-5)
2026-03-09T13:00:00Z (Mon Mar 9, 9:00 AM EDT, UTC-4) ← offset changed
2026-03-16T13:00:00Z (Mon Mar 16, 9:00 AM EDT, UTC-4)
2026-03-23T13:00:00Z (Mon Mar 23, 9:00 AM EDT, UTC-4)
Why it matters: An agent that schedules a cross-timezone meeting with the wrong UTC offset books it an hour off. With the stale -05:00 offset, the March 9 standup lands at 14:00 UTC, so an attendee in London sees 2pm on their calendar instead of the correct 1pm.
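The wall-clock behavior in this test can be sketched with Python's standard library. This is an illustrative sketch, not the Truth Engine's implementation: the function name weekly_wall_clock is hypothetical, and the code assumes IANA time zone data is available to zoneinfo on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def weekly_wall_clock(start_local: datetime, count: int) -> list[datetime]:
    """Hypothetical helper: repeat weekly at the same local time, return UTC instants."""
    tz = start_local.tzinfo
    naive = start_local.replace(tzinfo=None)
    result = []
    for i in range(count):
        # Step in local wall-clock time first, so 9:00 AM stays 9:00 AM.
        local = (naive + timedelta(weeks=i)).replace(tzinfo=tz)
        # The UTC offset is looked up per occurrence, not reused from the start.
        result.append(local.astimezone(timezone.utc))
    return result


eastern = ZoneInfo("America/New_York")
occurrences = weekly_wall_clock(datetime(2026, 3, 2, 9, 0, tzinfo=eastern), 4)
for dt in occurrences:
    print(dt.strftime("%Y-%m-%dT%H:%M:%SZ"))
```

The key design choice is the order of operations: add the interval in local time, then resolve the offset for each resulting date. Adding 7 x 24 hours to the first UTC instant would instead drift the meeting to 10:00 AM local time after the transition.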
Test 2: Last weekday of the month (BYSETPOS=-1)
Rule: FREQ=MONTHLY;BYDAY=MO,TU,WE,TH,FR;BYSETPOS=-1 starting January 1, 2026, 6 occurrences.
This means “the last weekday of each month.” January 2026 ends on a Saturday, so the last weekday is Friday the 30th. February ends on a Saturday, so it’s Friday the 27th. And so on — the answer depends on which day of the week the month ends on.
What LLMs return: Models frequently return the last day of the month regardless of whether it’s a weekday, or they return the last Friday specifically (ignoring that “last weekday” could be any Mon-Fri). Some models return the first weekday instead.
What Truth Engine returns:
2026-01-30 (Friday) ← last weekday of January
2026-02-27 (Friday) ← last weekday of February
2026-03-31 (Tuesday) ← last weekday of March
2026-04-30 (Thursday) ← last weekday of April
2026-05-29 (Friday) ← last weekday of May
2026-06-30 (Tuesday) ← last weekday of June
Why it matters: “Last weekday of the month” is a common payroll and reporting pattern. Getting it wrong means a financial report runs on a weekend (when systems may be in maintenance) or a payroll cycle is a day late.
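The selection logic behind this rule can be sketched in a few lines of standard-library Python. The helper name last_weekday_of_month is hypothetical and the code only illustrates how BYDAY and BYSETPOS combine; it is not the Truth Engine's implementation.

```python
import calendar
from datetime import date


def last_weekday_of_month(year: int, month: int) -> date:
    """Hypothetical helper: the last Mon-Fri date in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    # BYDAY=MO,TU,WE,TH,FR builds the candidate set: every Mon-Fri date in the month.
    candidates = [
        date(year, month, day)
        for day in range(1, last_day + 1)
        if date(year, month, day).weekday() < 5  # 0=Mon ... 4=Fri
    ]
    # BYSETPOS=-1 picks the last element of that set.
    return candidates[-1]


last_weekdays = [last_weekday_of_month(2026, month) for month in range(1, 7)]
for d in last_weekdays:
    print(d.isoformat(), d.strftime("%A"))
```

Building the full candidate set and then indexing into it mirrors how BYSETPOS is defined: it selects a position within the set produced by the other rule parts, which is why the result can be any day from Monday to Friday.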
Test 3: Leap year recurrence (Feb 29)
Rule: FREQ=YEARLY;BYMONTH=2;BYMONTHDAY=29 starting January 1, 2024, 4 occurrences.
This event only happens in leap years: 2024, 2028, 2032, 2036.
What LLMs return: Models often “helpfully” generate February 28 or March 1 in non-leap years (2025, 2026, 2027). They treat the missing date as something to approximate rather than skip.
What Truth Engine returns:
2024-02-29
2028-02-29
2032-02-29
2036-02-29
No substitutions. No approximations. If February 29 doesn’t exist that year, no instance is generated.
Why it matters: A birthday reminder or compliance deadline on Feb 29 that fires on Feb 28 in non-leap years creates phantom events — 3 out of every 4 years.
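The skip-instead-of-substitute behavior can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the Truth Engine's code; the function name yearly_on is hypothetical.

```python
from datetime import date


def yearly_on(month: int, day: int, start_year: int, count: int) -> list[date]:
    """Hypothetical helper: yearly occurrences on a fixed month and day."""
    result = []
    year = start_year
    while len(result) < count:
        try:
            result.append(date(year, month, day))
        except ValueError:
            # The date does not exist this year (e.g. Feb 29 in 2025).
            # Skip it; do not fall back to Feb 28 or Mar 1.
            pass
        year += 1
    return result


leap_days = yearly_on(2, 29, 2024, 4)
for d in leap_days:
    print(d.isoformat())
```

Note that skipped years do not consume the count: the loop keeps advancing until it has collected four valid dates.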
Test 4: Biweekly on multiple days (INTERVAL with BYDAY)
Rule: FREQ=WEEKLY;INTERVAL=2;BYDAY=TU,TH starting March 3, 2026 (a Tuesday), 8 occurrences.
INTERVAL=2 means every other week. On alternating weeks, the event occurs on both Tuesday and Thursday. This creates a pattern: week 1 gets Tue+Thu, week 2 is empty, week 3 gets Tue+Thu, and so on.
What LLMs return: Most models generate occurrences every week instead of every other week. The INTERVAL=2 modifier is applied per-day rather than per-week, or ignored entirely.
What Truth Engine returns:
2026-03-03 (Tue, week 1)
2026-03-05 (Thu, week 1)
2026-03-17 (Tue, week 3) ← week 2 skipped
2026-03-19 (Thu, week 3)
2026-03-31 (Tue, week 5) ← week 4 skipped
2026-04-02 (Thu, week 5)
2026-04-14 (Tue, week 7) ← week 6 skipped
2026-04-16 (Thu, week 7)
Why it matters: Biweekly meetings are one of the most common patterns in corporate calendars. Every-week instead of every-other-week doubles the meeting load — and if an agent books follow-ups based on the wrong schedule, it creates cascading conflicts.
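One way to see why INTERVAL applies per week rather than per day is to step week by week and emit the BYDAY days inside each selected week. The sketch below uses only the standard library; the function name biweekly is hypothetical, and it assumes weeks start on Monday (the RFC 5545 default for WKST). It is an illustration, not the Truth Engine's implementation.

```python
from datetime import date, timedelta


def biweekly(start: date, weekdays: set[int], interval: int, count: int) -> list[date]:
    """Hypothetical helper: weekly rule with INTERVAL and BYDAY (0=Mon ... 6=Sun)."""
    # Anchor on the Monday of the week containing the start date.
    week_start = start - timedelta(days=start.weekday())
    result = []
    while len(result) < count:
        # Emit every requested weekday inside the current week.
        for offset in sorted(weekdays):
            day = week_start + timedelta(days=offset)
            if day >= start and len(result) < count:
                result.append(day)
        # INTERVAL moves the whole week forward, not each weekday separately.
        week_start += timedelta(weeks=interval)
    return result


biweekly_dates = biweekly(date(2026, 3, 3), {1, 3}, interval=2, count=8)
for d in biweekly_dates:
    print(d.isoformat(), d.strftime("%a"))
```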
Test 5: Second Friday with exclusion (BYSETPOS + EXDATE)
Rule: FREQ=MONTHLY;BYDAY=FR;BYSETPOS=2 with EXDATE:20260710T140000Z starting January 1, 2026 at 10:00 AM Eastern, 8 occurrences.
The second Friday of each month, except July 10, 2026 (which is indeed the second Friday of July). The EXDATE removes that one occurrence.
What LLMs return: Two common failures: (1) the model ignores the EXDATE entirely and includes July 10, or (2) it applies the EXDATE but then adds an extra month to get back to 8 occurrences. The second failure amounts to generating 9 candidates and removing one. The rule generates 8 candidates (January through August), and the EXDATE removes one of them, leaving 7.
What Truth Engine returns:
2026-01-09 (2nd Friday of January)
2026-02-13 (2nd Friday of February)
2026-03-13 (2nd Friday of March)
2026-04-10 (2nd Friday of April)
2026-05-08 (2nd Friday of May)
2026-06-12 (2nd Friday of June)
← July 10 excluded by EXDATE
2026-08-14 (2nd Friday of August)
7 instances returned. The count of 8 covers January through August, the EXDATE then removes the July occurrence, and nothing is added to replace it.
Why it matters: EXDATEs model real-world exceptions — holidays, company offsites, cancelled sessions. An agent that ignores them suggests times that are explicitly blocked, or shows cancelled meetings as active.
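The ordering of count and exclusion can be sketched like this: generate the counted candidates first, then subtract the EXDATE instants. The code is a standard-library illustration, not the Truth Engine's implementation. The names second_friday and expand_second_fridays are hypothetical, the count is assumed to fit within a single calendar year, and zoneinfo is assumed to have IANA time zone data available.

```python
import calendar
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
FRIDAY = 4  # date.weekday() value for Friday


def second_friday(year: int, month: int) -> date:
    """Hypothetical helper: BYDAY=FR with BYSETPOS=2 for one month."""
    last_day = calendar.monthrange(year, month)[1]
    fridays = [
        date(year, month, day)
        for day in range(1, last_day + 1)
        if date(year, month, day).weekday() == FRIDAY
    ]
    return fridays[1]


def expand_second_fridays(year: int, count: int, exdates: set[datetime]) -> list[datetime]:
    """Hypothetical helper: apply the count first, then remove EXDATE instants."""
    candidates = []
    for month in range(1, count + 1):  # assumes count <= 12
        d = second_friday(year, month)
        local = datetime(d.year, d.month, d.day, 10, 0, tzinfo=EASTERN)
        candidates.append(local.astimezone(timezone.utc))
    # Exclusion happens after counting, so nothing is generated to fill the gap.
    return [c for c in candidates if c not in exdates]


exdates = {datetime(2026, 7, 10, 14, 0, tzinfo=timezone.utc)}
instances = expand_second_fridays(2026, 8, exdates)
for dt in instances:
    print(dt.strftime("%Y-%m-%dT%H:%M:%SZ"))
```

Comparing instants in UTC matters here: the EXDATE is written as 14:00Z, which matches 10:00 AM Eastern only because July falls in daylight time.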
The pattern
In every test, the failure mode is the same: LLMs treat RRULE expansion as a text prediction task. They generate dates that look plausible given the pattern. The Truth Engine treats it as what it actually is — a deterministic algorithm defined by RFC 5545.
This isn’t a criticism of LLMs. They’re extraordinary at understanding intent, generating prose, and reasoning about ambiguous problems. RRULE expansion isn’t an ambiguous problem. It has exactly one right answer, and it should be computed, not predicted. If you’re deciding whether to build this computation layer yourself or embed an existing one, see Build vs Embed: The AI Scheduling Build-or-Buy Decision.
Try it yourself
The Truth Engine powers the expand_rrule tool in Temporal Cortex. Install the MCP server and test any RRULE:
npx @temporal-cortex/cortex-mcp
The Truth Engine is also available standalone as an open-source library on crates.io, npm, and PyPI — MIT/Apache-2.0 dual licensed, 9,000+ property-based tests.