Computer Use Test — Run Results

Copy this file to RESULTS-<SCOPE>-<RUNID>.md (e.g. RESULTS-full-20260601a.md or RESULTS-CU-01-20260601a.md) and fill it in. This is the artifact Ravi reviews after a run: it must make clear what was tested, what passed, what failed, and what is left to test by hand.
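The naming convention above can be generated mechanically. This is a minimal sketch, assuming a RUNID is an eight-digit date followed by one lowercase letter; that shape is inferred from the two examples and is not stated as a rule. `results_filename` is a hypothetical helper, not part of the harness.

```python
import re

# Assumed RUNID shape, inferred from the examples (e.g. 20260601a).
RUNID_PATTERN = re.compile(r"\d{8}[a-z]")


def results_filename(scope: str, runid: str) -> str:
    """Build RESULTS-<SCOPE>-<RUNID>.md for a scope such as 'full' or 'CU-01'."""
    if not RUNID_PATTERN.fullmatch(runid):
        raise ValueError(f"unexpected RUNID format: {runid!r}")
    if not scope or "/" in scope:
        raise ValueError(f"unexpected scope: {scope!r}")
    return f"RESULTS-{scope}-{runid}.md"


print(results_filename("full", "20260601a"))   # RESULTS-full-20260601a.md
print(results_filename("CU-01", "20260601a"))  # RESULTS-CU-01-20260601a.md
```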

Run header

Field	Value
RUNID	`<RUNID>`
Date / time (UTC)	`<YYYY-MM-DD HH:MM>`
Environment	`https://calendo.dev` (production)
Agent / harness	`<which computer-use agent / model>`
Scope	`<single suite ID OR "full graph (Wave 1 + Wave 2)">`
Preconditions (§B checklist)	☐ All passed ☐ Failures (list below)
Overall result	`<PASS / PASS-with-residue / FAIL / BLOCKED>`

Precondition failures (if any): <none / list each failed item from 00-setup §B and what was skipped because of it>

Per-suite results

Status legend: PASS (all pass/fail criteria met) · PARTIAL (some items pass, some blocked/failed) · FAIL (a pass/fail criterion failed) · BLOCKED (precondition/session missing) · SKIPPED (not attempted this run) · N/R (not run).

Fill L1 (UI) / L2 (persistence) / L3 (external reality) with ✅ / ❌ / — (n/a) / 🚫 (blocked). Link evidence to screenshot names.

ID	Title	Pri	Status	L3
CU-01	Core booking lifecycle (book → reschedule → cancel)	P0	N/R
CU-02	Auth lifecycle (register, verify, login, reset, delete)	P0	N/R
CU-03	Google Calendar (conflict, buffers, two-way)	P0	N/R
CU-05	Event-type config → booking-page enforcement	P0	N/R
CU-06	Availability engine (weekly, overrides, holidays, slot-debug)	P0	N/R
CU-07	Host-side booking management	P1	N/R
CU-08	AI booking chatbot (public page)	P1	N/R
CU-09	AI dashboard assistant (feature parity)	P1	N/R
CU-10	Landing + marketing + static pages + mobile	P1	N/R	—
CU-11	Public booking page UX (timezones, nav, QR, mobile)	P1	N/R
CU-04	Microsoft / Outlook calendar integration	P2	N/R
CU-12	Routing forms (build → submit → route → analytics)	P2	N/R	—
CU-13	Meeting polls (create → vote → tally → finalize)	P2	N/R	—
CU-14	Team / org scheduling (roles, round-robin, collective)	P2	N/R
CU-15	Contacts, analytics dashboard, CSV export	P2	N/R	—
CU-16	Settings & customization (branding, blocklist, BYOK, pixels)	P2	N/R	—
CU-17	Slack notifications & outbound webhooks	P2	N/R
CU-18	New-user onboarding wizard (4-step)	P2	N/R	—
CU-19	Embeddable booking widget (inline/popup/badge)	P3	N/R
CU-20	Email sequences, reminders, reconfirmation (time-gated)	P3	N/R
CU-22	Chrome extension for Gmail (manual-led)	P3	N/R	—

Tally: PASS __ · PARTIAL __ · FAIL __ · BLOCKED __ · SKIPPED __ · N/R __ (of 21)
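The tally line can be derived from the Status column instead of being counted by hand. A minimal sketch, assuming statuses are recorded as a plain mapping from suite ID to one of the legend values; `tally_line` is a hypothetical helper. It also counts N/R so that the numbers always sum to the total.

```python
from collections import Counter

# Status values from the legend, in the order they are reported.
STATUSES = ["PASS", "PARTIAL", "FAIL", "BLOCKED", "SKIPPED", "N/R"]


def tally_line(status_by_suite: dict[str, str]) -> str:
    """Summarise per-suite statuses as a single 'Tally: ...' line."""
    unknown = set(status_by_suite.values()) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status value(s): {sorted(unknown)}")
    counts = Counter(status_by_suite.values())
    # Counter returns 0 for statuses that never occur.
    parts = " · ".join(f"{status} {counts[status]}" for status in STATUSES)
    return f"Tally: {parts} (of {len(status_by_suite)})"


# Illustrative input only, not real run data.
example = {"CU-01": "PASS", "CU-02": "FAIL", "CU-03": "N/R"}
print(tally_line(example))
# Tally: PASS 1 · PARTIAL 0 · FAIL 1 · BLOCKED 0 · SKIPPED 0 · N/R 1 (of 3)
```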

Per-suite detail

Duplicate this block for each suite attempted. Keep failures specific and reproducible.

CU-__ — <title>

Status: <PASS/PARTIAL/FAIL/BLOCKED/SKIPPED>
RUNID artifacts created: <event-type names, invitee emails, forms/polls, etc.>
Pass/Fail criteria: <which criteria from the runbook passed; quote any that failed>
Steps passed: <e.g. 1–28>
Steps failed/blocked: <step #s + what happened>
L1 (UI): <observed>
L2 (persistence): <observed after reload / API>
L3 (external reality):
- Email: search query used <inv-RUNID>; subject observed <...>; inbox(es) checked <...>
- Calendar: event date/time <...>; title <...>; attendee present <y/n>; state <created/moved/deleted>
- Webhook/other: <webhook.site request body / Slack message / etc.>
Screenshots: <names captured>
Deviations / transient retries: <any reloads/waits needed, 429/529 overloads, propagation lag>
Cleanup done: <bookings cancelled, event types deleted, calendar test events removed, account deleted>

Manual residue — REMAINING FOR RAVI TO TEST

Things the agent could not fully verify in-browser this run. (Pre-filled from the suites' manual-residue sections; the runner ticks/annotates what actually fell through.)

☐ Payments end-to-end — Stripe is in live mode (no test card), so paid booking, refunds, and Pro upgrade are untested. (CU-05/CU-16/out-of-scope)
☐ Pro-gated features — SMS/Twilio reminders (also need a real phone), custom domains, remove-branding. (out-of-scope)
☐ Reminder / reconfirmation timing — 24h/1h/72h cron sends are time-gated; only the immediate/near-term path is checkable. (CU-20)
☐ True separate-invitee RSVP — plus-aliases land in the host inbox, so calendar-invite delivery to a genuinely external invitee is ambiguous except where P3 is used. (CU-01/CU-05/CU-07)
☐ Real Slack channel render — if no real Slack workspace is available, only the outbound POST (via webhook.site) is verified, not the in-Slack message. (CU-17)
☐ Webhook HMAC signature recomputation — receipt + header presence verified; cryptographic signature re-validation is manual. (CU-17)
☐ Chrome extension in-browser — loading the unpacked MV3 extension + Gmail compose insertion is manual. (CU-22)
☐ OAuth cold consent flows — connecting a calendar from scratch (Google/Microsoft consent) is human-only; the agent assumes the calendars are already connected. (CU-03/CU-04)
☐ Deep cross-timezone correctness & ICS parsing — spot-checked, not exhaustively audited. (CU-11/CU-01)
☐ <add any new residue the agent hit this run>
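For the webhook HMAC item above, the manual re-validation could look roughly like the sketch below. Everything about the signing scheme here is an assumption to confirm against the product's webhook settings: HMAC-SHA256 computed over the raw request body, a hex-encoded digest, and an optional `sha256=` style prefix in the header value. `signature_matches` is a hypothetical helper.

```python
import hashlib
import hmac


def signature_matches(secret: str, raw_body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare it with the received header value.

    Assumes HMAC-SHA256 over the raw body with a hex digest (not confirmed).
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Some senders prefix the digest (e.g. "sha256=<hex>"); drop such a prefix if present.
    candidate = received_signature.split("=", 1)[-1].strip().lower()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, candidate)


# Illustrative values only; use the real secret and the exact body captured on webhook.site.
secret = "example-secret"
body = b'{"example": true}'
good = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(signature_matches(secret, body, good))              # True
print(signature_matches(secret, body, "sha256=" + good))  # True
print(signature_matches(secret, body + b" ", good))       # False
```

The body must be the exact bytes received. Re-serialising the JSON before hashing will usually change whitespace or key order and make a valid signature appear to fail.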

Out of scope / TBD this run

Stripe / PayPal / coupons / Pro-upgrade flow (no test mode).
Admin panel (gated to a different hardcoded email — not in the dedicated test accounts).
<anything else deliberately skipped>

See ../COVERAGE.md for the full per-suite manual-residue and TBD ledger.

Cleanup confirmation

☐ All RUNID-scoped event types deleted.
☐ All RUNID-scoped bookings cancelled (and no stray calendar events left behind — verified in Google/Outlook Calendar).
☐ All RUNID-scoped routing forms / polls / webhooks / orgs removed or noted as harmless residue.
☐ Throwaway accounts (CU-02 +auth, CU-18 +onb) deleted via Settings → Danger Zone.
☐ CU-06 baseline restored — global Mon–Fri 09:00–17:00 availability put back exactly.
☐ Test busy events created in Google/Outlook Calendar (for conflict tests) deleted.
☐ No session left in a destructive/half-changed state.

Cleanup notes: <anything left intentionally, or that needs manual/D1 cleanup>