Computer Use Test — Run Results
Copy this file to RESULTS-<SCOPE>-<RUNID>.md (e.g. RESULTS-full-20260601a.md or RESULTS-CU-01-20260601a.md) and fill it in. This is the artifact Ravi reviews after a run: it must make clear what was tested, what passed, what failed, and what is left to test by hand.
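The naming convention above can be sketched as a small helper. This is a minimal illustration, not part of the runbook: the function name `results_filename` is hypothetical, and the RUNID and scope patterns are assumptions inferred only from the two example filenames.

```python
import re

# Assumed RUNID shape, inferred from the example 20260601a:
# eight digits (YYYYMMDD) followed by one lowercase letter.
RUNID_RE = re.compile(r"\d{8}[a-z]")

# Assumed scope shape: either "full" or a single suite ID such as CU-01.
SUITE_RE = re.compile(r"CU-\d{2}")


def results_filename(scope: str, run_id: str) -> str:
    """Build RESULTS-<SCOPE>-<RUNID>.md for a full run or a single suite."""
    if not RUNID_RE.fullmatch(run_id):
        raise ValueError(f"unexpected RUNID: {run_id!r}")
    if scope != "full" and not SUITE_RE.fullmatch(scope):
        raise ValueError(f"unexpected scope: {scope!r}")
    return f"RESULTS-{scope}-{run_id}.md"


print(results_filename("full", "20260601a"))   # RESULTS-full-20260601a.md
print(results_filename("CU-01", "20260601a"))  # RESULTS-CU-01-20260601a.md
```

If the real RUNID format differs from the example, loosen `RUNID_RE` rather than bypassing the check.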
Run header
| Field | Value |
|---|---|
| RUNID | <RUNID> |
| Date / time (UTC) | <YYYY-MM-DD HH:MM> |
| Environment | https://calendo.dev (production) |
| Agent / harness | <which computer-use agent / model> |
| Scope | <single suite ID OR "full graph (Wave 1 + Wave 2)"> |
| Preconditions (§B checklist) | ☐ All passed ☐ Failures (list below) |
| Overall result | <PASS / PASS-with-residue / FAIL / BLOCKED> |
Precondition failures (if any): <none / list each failed item from 00-setup §B and what was skipped because of it>
Per-suite results
Status legend: PASS (all pass/fail criteria met) · PARTIAL (some items pass, some blocked/failed) · FAIL (a pass/fail criterion failed) · BLOCKED (precondition/session missing) · SKIPPED (not attempted this run) · N/R (not run).
Fill L1 (UI) / L2 (persistence) / L3 (external reality) with ✅ / ❌ / — (n/a) / 🚫 (blocked). Link evidence to screenshot names.
| ID | Title | Pri | Status | L1 | L2 | L3 | Evidence | Notes |
|---|---|---|---|---|---|---|---|---|
| CU-01 | Core booking lifecycle (book → reschedule → cancel) | P0 | N/R | |||||
| CU-02 | Auth lifecycle (register, verify, login, reset, delete) | P0 | N/R | |||||
| CU-03 | Google Calendar (conflict, buffers, two-way) | P0 | N/R | |||||
| CU-05 | Event-type config → booking-page enforcement | P0 | N/R | |||||
| CU-06 | Availability engine (weekly, overrides, holidays, slot-debug) | P0 | N/R | |||||
| CU-07 | Host-side booking management | P1 | N/R | |||||
| CU-08 | AI booking chatbot (public page) | P1 | N/R | |||||
| CU-09 | AI dashboard assistant (feature parity) | P1 | N/R | |||||
| CU-10 | Landing + marketing + static pages + mobile | P1 | N/R | — | ||||
| CU-11 | Public booking page UX (timezones, nav, QR, mobile) | P1 | N/R | |||||
| CU-04 | Microsoft / Outlook calendar integration | P2 | N/R | |||||
| CU-12 | Routing forms (build → submit → route → analytics) | P2 | N/R | — | ||||
| CU-13 | Meeting polls (create → vote → tally → finalize) | P2 | N/R | — | ||||
| CU-14 | Team / org scheduling (roles, round-robin, collective) | P2 | N/R | |||||
| CU-15 | Contacts, analytics dashboard, CSV export | P2 | N/R | — | ||||
| CU-16 | Settings & customization (branding, blocklist, BYOK, pixels) | P2 | N/R | — | ||||
| CU-17 | Slack notifications & outbound webhooks | P2 | N/R | |||||
| CU-18 | New-user onboarding wizard (4-step) | P2 | N/R | — | ||||
| CU-19 | Embeddable booking widget (inline/popup/badge) | P3 | N/R | |||||
| CU-20 | Email sequences, reminders, reconfirmation (time-gated) | P3 | N/R | |||||
| CU-22 | Chrome extension for Gmail (manual-led) | P3 | N/R | — |
Tally: PASS __ · PARTIAL __ · FAIL __ · BLOCKED __ · SKIPPED __ (of 21)
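The tally line can be derived from the table rather than counted by hand. The following is a rough sketch, assuming the Status value is always the fourth cell of a suite row and that suite IDs start with `CU-`; the function name `tally` and the sample rows (including the screenshot name) are hypothetical.

```python
from collections import Counter

# Statuses reported on the tally line, in the order the template lists them.
STATUSES = ("PASS", "PARTIAL", "FAIL", "BLOCKED", "SKIPPED")


def tally(table_rows: list[str]) -> str:
    """Count the Status column (4th cell) of each suite row and format the tally."""
    counts = Counter()
    for row in table_rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        # Skip the header and separator rows: only suite rows start with "CU-".
        if len(cells) >= 4 and cells[0].startswith("CU-"):
            counts[cells[3]] += 1
    total = sum(counts.values())  # includes rows still marked N/R
    parts = " · ".join(f"{s} {counts[s]}" for s in STATUSES)
    return f"Tally: {parts} (of {total})"


# Hypothetical sample rows for illustration only.
rows = [
    "| ID | Title | Pri | Status | L1 | L2 | L3 | Evidence | Notes |",
    "|---|---|---|---|---|---|---|---|---|",
    "| CU-01 | Core booking lifecycle | P0 | PASS | ✅ | ✅ | ✅ | cu01-final.png | |",
    "| CU-02 | Auth lifecycle | P0 | FAIL | ✅ | ❌ | 🚫 | | |",
    "| CU-03 | Google Calendar | P0 | N/R | |||||",
]
print(tally(rows))  # Tally: PASS 1 · PARTIAL 0 · FAIL 1 · BLOCKED 0 · SKIPPED 0 (of 3)
```

Rows left at N/R count toward the total but not toward any status, so the five status counts only sum to the total once every row has been given a final status.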
Per-suite detail
Duplicate this block for each suite attempted. Keep failures specific and reproducible.
CU-__ — <title>
- Status: <PASS/PARTIAL/FAIL/BLOCKED/SKIPPED>
- RUNID artifacts created: <event-type names, invitee emails, forms/polls, etc.>
- Pass/Fail criteria: <which criteria from the runbook passed; quote any that failed>
- Steps passed: <e.g. 1–28> · Steps failed/blocked: <step #s + what happened>
- L1 (UI): <observed>
- L2 (persistence): <observed after reload / API>
- L3 (external reality):
  - Email: search query used <inv-RUNID>; subject observed <...>; inbox(es) checked <...>
  - Calendar: event date/time <...>; title <...>; attendee present <y/n>; state <created/moved/deleted>
  - Webhook/other: <webhook.site request body / Slack message / etc.>
- Screenshots: <names captured>
- Deviations / transient retries: <any reloads/waits needed, 429/529 overloads, propagation lag>
- Cleanup done: <bookings cancelled, event types deleted, calendar test events removed, account deleted>
Manual residue — REMAINING FOR RAVI TO TEST
Things the agent could not fully verify in-browser this run. (Pre-filled from the suites' manual-residue sections; the runner ticks/annotates what actually fell through.)
- ☐ Payments end-to-end — Stripe is live-mode (no test card); paid booking, refunds, Pro upgrade are untested. (CU-05/CU-16/out-of-scope)
- ☐ Pro-gated features — SMS/Twilio reminders (also need a real phone), custom domains, remove-branding. (out-of-scope)
- ☐ Reminder / reconfirmation timing — 24h/1h/72h cron sends are time-gated; only the immediate/near-term path is checkable. (CU-20)
- ☐ True separate-invitee RSVP — plus-aliases land in the host inbox, so genuinely-external invitee calendar-invite delivery is ambiguous except where P3 is used. (CU-01/CU-05/CU-07)
- ☐ Real Slack channel render — if no real Slack workspace, only the outbound POST (via webhook.site) is verified, not the in-Slack message. (CU-17)
- ☐ Webhook HMAC signature recomputation — receipt + header presence verified; cryptographic signature re-validation is manual. (CU-17)
- ☐ Chrome extension in-browser — loading the unpacked MV3 extension + Gmail compose insertion is manual. (CU-22)
- ☐ OAuth cold consent flows — connecting a calendar from scratch (Google/Microsoft consent) is human-only; the agent assumes pre-connected. (CU-03/CU-04)
- ☐ Deep cross-timezone correctness & ICS parsing — spot-checked, not exhaustively audited. (CU-11/CU-01)
- ☐ <add any new residue the agent hit this run>
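For the manual webhook HMAC re-validation item in the list above, a check along these lines may help. It is only a sketch under stated assumptions: this template does not say which algorithm, encoding, or header the webhooks use, so HMAC-SHA256 with a lowercase hex digest is assumed here, and the secret, payload, and function name `signature_matches` are hypothetical. Confirm the real scheme before relying on the result.

```python
import hashlib
import hmac


def signature_matches(secret: str, raw_body: bytes, received_signature: str) -> bool:
    """Recompute the HMAC over the raw request body and compare in constant time.

    Assumption: HMAC-SHA256, hex-encoded, no prefix. Adjust to the real scheme.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_signature.strip().lower())


# Hypothetical values: in practice, paste the raw body and the signature
# header value exactly as captured on webhook.site.
body = b'{"event":"booking.created"}'
secret = "example-signing-secret"
sig = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(signature_matches(secret, body, sig))         # True
print(signature_matches(secret, body + b" ", sig))  # False
```

The comparison must run over the exact bytes that were sent; re-serialized or pretty-printed JSON will generally produce a different digest.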
Out of scope / TBD this run
- Stripe / PayPal / coupons / Pro-upgrade flow (no test mode).
- Admin panel (gated to a different hardcoded email — not in the dedicated test accounts).
- <anything else deliberately skipped>
See ../COVERAGE.md for the full per-suite manual-residue and TBD ledger.
Cleanup confirmation
- ☐ All RUNID-scoped event types deleted.
- ☐ All RUNID-scoped bookings cancelled (and no stray calendar events left behind — verified in Google/Outlook Calendar).
- ☐ All RUNID-scoped routing forms / polls / webhooks / orgs removed or noted as harmless residue.
- ☐ Throwaway accounts (CU-02 +auth, CU-18 +onb) deleted via Settings → Danger Zone.
- ☐ CU-06 baseline restored: global Mon–Fri 09:00–17:00 availability put back exactly.
- ☐ Test busy events created in Google/Outlook Calendar (for conflict tests) deleted.
- ☐ No session left in a destructive/half-changed state.
Cleanup notes: <anything left intentionally, or that needs manual/D1 cleanup>
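Before handing the file over, it can be useful to list any checklist items that are still open. The sketch below assumes a ticked item replaces ☐ with another mark (for example ☑), which this template does not specify; the function name `unticked_items` is hypothetical.

```python
def unticked_items(markdown_text: str) -> list[str]:
    """Return the text of every list item that still starts with an empty ☐ box."""
    marker = "- ☐"
    items = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            items.append(stripped[len(marker):].strip())
    return items


# Hypothetical excerpt of a filled-in cleanup section.
sample = """\
- ☑ All RUNID-scoped event types deleted.
- ☐ Test busy events created in Google/Outlook Calendar (for conflict tests) deleted.
- ☑ No session left in a destructive/half-changed state.
"""
print(unticked_items(sample))
```

Note that this also picks up open boxes in the manual-residue list, so run it on the cleanup section alone if only cleanup status is wanted.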