Calendo — Computer Use Tests

Browser-agent end-to-end suites run against production https://calendo.dev. The reality gate: a real booking must produce a real calendar event and a real email, confirmed by an agent opening Gmail/Calendar in the browser.

L1 UI confirmation
L2 Calendo persistence (survives reload)
L3 external reality (real GCal/Outlook event + real email)

Lane A read-only · anytime
Lane B host-writer · RUNID-isolated · parallel
Lane D account-isolated · parallel
Lane X exclusive · runs alone
21 suites · 5 P0 critical · 10 with L3 reality · 1 exclusive (CU-06) · ~11.1h single-threaded

Review surface only. Full operating manual: README.md · setup: 00-setup-preconditions.md · coverage ledger: COVERAGE.md · results: RESULTS-TEMPLATE.md.

Suite catalog — ranked by importance

Sorted by priority (P0 → P3) then ID. The high-traffic paths most users hit (CU-01, CU-05, CU-08, CU-11, CU-06, CU-02, CU-03) are the deepest runbooks. "Status" defaults to Not run — fill it from each run's results file.

| ID | Title | Pri | Accounts | Lane | Excl. | Est. | L3 | Status |
|---|---|---|---|---|---|---|---|---|
| CU-01 | Core booking lifecycle (book → reschedule → cancel) | P0 | P1 host + anon | B | | 30m | Yes | Not run |
| CU-02 | Auth lifecycle (register, verify, login, reset, delete) | P0 | fresh +auth | D | | 22m | Yes | Not run |
| CU-03 | Google Calendar: conflict, buffers, two-way sync | P0 | P1 + GCal | D | | 40m | Yes | Not run |
| CU-05 | Event-type config → booking-page enforcement | P0 | P1 host | B | | 85m | Yes | Not run |
| CU-06 | Availability engine: weekly, overrides, holidays, slot-debug | P0 | P1 host | X | YES | 35m | | Not run |
| CU-07 | Host-side booking management (no-show, notes, guests, on-behalf) | P1 | P1 host | B | | 40m | Yes | Not run |
| CU-08 | AI booking chatbot (the differentiator) | P1 | anon | B | | 30m | Yes | Not run |
| CU-09 | AI dashboard assistant (feature parity) | P1 | P1 host | B | | 45m | Partial | Not run |
| CU-10 | Landing + marketing + static pages + mobile | P1 | none (anon) | A | | 15m | | Not run |
| CU-11 | Public booking UX: timezones, nav, empty-state, QR, mobile | P1 | anon | A/B | | 22m | | Not run |
| CU-04 | Microsoft / Outlook calendar integration | P2 | Outlook acct | D | | 35m | Yes | Not run |
| CU-12 | Routing forms: build → submit → route → analytics | P2 | P1 + anon | B | | 18m | | Not run |
| CU-13 | Meeting polls: create → vote → tally → finalize | P2 | P1 + anon | B | | 18m | | Not run |
| CU-14 | Team / org scheduling: roles, round-robin, collective | P2 | P1 + P3 | D | | 40m | Yes | Not run |
| CU-15 | Contacts, analytics dashboard, CSV export | P2 | P1 host | A | | 25m | | Not run |
| CU-16 | Settings & customization: branding, blocklist, BYOK, pixels | P2 | P1 host | B | | 30m | | Not run |
| CU-17 | Slack notifications & outbound webhooks | P2 | P1 host | B | | 30m | Yes | Not run |
| CU-18 | New-user onboarding wizard (4-step) | P2 | fresh +onb | D | | 22m | | Not run |
| CU-19 | Embeddable booking widget (inline/popup/badge) | P3 | P1 + ext page | B | | 25m | | Not run |
| CU-20 | Email sequences, reminders, reconfirmation (time-gated) | P3 | P1 host | B | | 40m | Partial | Not run |
| CU-22 | Chrome extension for Gmail (manual-led) | P3 | P1 (manual) | manual | | 20m | | Not run |
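
As a cross-check on the headline figures, the catalog can be held as plain data and summarized. This is a minimal sketch: the tuple layout and the `summarize` helper are illustrative assumptions, not part of the Calendo tooling. Only the IDs, priorities, lanes, and estimates are copied from the catalog.

```python
# Illustrative only: (id, priority, lane, estimate in minutes), copied from the catalog.
CATALOG = [
    ("CU-01", "P0", "B", 30), ("CU-02", "P0", "D", 22), ("CU-03", "P0", "D", 40),
    ("CU-05", "P0", "B", 85), ("CU-06", "P0", "X", 35), ("CU-07", "P1", "B", 40),
    ("CU-08", "P1", "B", 30), ("CU-09", "P1", "B", 45), ("CU-10", "P1", "A", 15),
    ("CU-11", "P1", "A/B", 22), ("CU-04", "P2", "D", 35), ("CU-12", "P2", "B", 18),
    ("CU-13", "P2", "B", 18), ("CU-14", "P2", "D", 40), ("CU-15", "P2", "A", 25),
    ("CU-16", "P2", "B", 30), ("CU-17", "P2", "B", 30), ("CU-18", "P2", "D", 22),
    ("CU-19", "P3", "B", 25), ("CU-20", "P3", "B", 40), ("CU-22", "P3", "manual", 20),
]


def summarize(catalog):
    """Recompute the headline counts from the catalog rows."""
    total_minutes = sum(minutes for _, _, _, minutes in catalog)
    return {
        "suites": len(catalog),
        "p0": sum(1 for _, pri, _, _ in catalog if pri == "P0"),
        "exclusive": [sid for sid, _, lane, _ in catalog if lane == "X"],
        "total_minutes": total_minutes,
        "total_hours": round(total_minutes / 60, 1),
    }


print(summarize(CATALOG))
```

The per-suite estimates sum to 667 minutes, which rounds to the ~11.1h single-threaded figure quoted at the top.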

Parallelization plan

Each suite is internally sequential. Across suites, run Wave 1 fully, then Wave 2 alone.

Wave 1 — run concurrently

CU-01, CU-02, CU-03, CU-04, CU-05, CU-07, CU-08, CU-09, CU-10, CU-11, CU-12, CU-13, CU-14, CU-15, CU-16, CU-17, CU-18, CU-19, CU-20
All Lane A / B / D suites. Safe in parallel: read-only, RUNID-isolated event types, or different accounts. (CU-22 runs whenever a human is available.)
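
The wave split follows from the lane column alone, so it can be derived rather than maintained by hand. The sketch below assumes a hypothetical `plan_waves` helper and a suite-to-lane mapping copied from the catalog; it is not the project's actual scheduler.

```python
# Illustrative only: suite -> lane, copied from the catalog.
LANES = {
    "CU-01": "B", "CU-02": "D", "CU-03": "D", "CU-04": "D", "CU-05": "B",
    "CU-06": "X", "CU-07": "B", "CU-08": "B", "CU-09": "B", "CU-10": "A",
    "CU-11": "A/B", "CU-12": "B", "CU-13": "B", "CU-14": "D", "CU-15": "A",
    "CU-16": "B", "CU-17": "B", "CU-18": "D", "CU-19": "B", "CU-20": "B",
    "CU-22": "manual",
}


def plan_waves(lanes):
    """Split suites into (wave1, wave2, manual) using only their lane."""
    # Lane A / B / D suites (including the mixed A/B suite) may run concurrently.
    wave1 = sorted(s for s, lane in lanes.items() if lane not in ("X", "manual"))
    # Lane X suites run alone, after Wave 1 has drained.
    wave2 = sorted(s for s, lane in lanes.items() if lane == "X")
    # Manual-led suites are scheduled whenever a human is available.
    manual = sorted(s for s, lane in lanes.items() if lane == "manual")
    return wave1, wave2, manual


wave1, wave2, manual = plan_waves(LANES)
print(len(wave1), wave2, manual)
```

Keying the plan on lane means a new suite lands in the right wave as soon as its lane is recorded.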

Wave 2 — run alone

CU-06
The availability engine rewrites P1's global weekly schedule, which every booking suite's slot math depends on. It runs after Wave 1 fully drains, captures a baseline, and restores it at the end.
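
The capture-then-restore step described above is a natural fit for a `try`/`finally` guard, so the baseline is written back even when the suite fails midway. This is a hedged sketch only: `read_schedule` and `write_schedule` are hypothetical stand-ins for however the agent reads and writes P1's weekly schedule, and the in-memory `store` replaces the real account.

```python
import copy
from contextlib import contextmanager


@contextmanager
def restored_schedule(read_schedule, write_schedule):
    """Capture the schedule on entry and write it back on exit, even on failure."""
    baseline = copy.deepcopy(read_schedule())
    try:
        yield baseline
    finally:
        write_schedule(baseline)


# In-memory stand-in for the host's global weekly schedule (hypothetical shape).
store = {"mon": ["09:00-17:00"], "tue": ["09:00-17:00"]}


def read_schedule():
    return store


def write_schedule(schedule):
    store.clear()
    store.update(copy.deepcopy(schedule))


with restored_schedule(read_schedule, write_schedule):
    store["mon"] = []                 # the suite rewrites the global schedule
    store["sat"] = ["10:00-12:00"]

print(store)  # back to the captured baseline
```

Taking a deep copy at capture time matters here: a shallow reference would be mutated along with the live schedule and there would be nothing to restore.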

Coverage map

What area each suite owns. Full capability-by-tier traceability is in COVERAGE.md.

Booking & lifecycle

CU-01, CU-11, CU-07

Calendar reality (L3)

CU-03, CU-04

Availability engine

CU-06

Event-type configuration

CU-05

AI (the differentiator)

CU-08, CU-09

Auth & onboarding

CU-02, CU-18

Public & marketing

CU-10, CU-11

Teams & org

CU-14

Routing & polls

CU-12, CU-13

Analytics & contacts

CU-15

Settings & customization

CU-16

Integrations (Slack/webhooks)

CU-17

Embed widget

CU-19

Email & reminders

CU-20

Browser extension

CU-22

Manual residue — test yourself

Things a browser agent cannot fully verify. After a run these roll up into the results file; this is the standing list to keep in mind.

Out of scope / TBD

These are deliberately catalogued, not silently dropped. See COVERAGE.md §2 for the complete ledger.