Computer-use threat model
Companion to docs/guides/computer-use.md. This document captures the assets, attack surfaces, mitigations, and the penetration-test runbook for the computer_use server tool.
Assets
- End-user identity and credentials that the agent may be asked to enter on a third-party site.
- Host application state (cookies, local storage, the DOM of the chat host page).
- Vendor session integrity — the hosted Chromium that runs the actual browsing.
- Audit trail — must be tamper-evident so a post-incident review can reconstruct what the agent did.
- The kill switch — must be reachable even when the rest of the system is degraded.
Trust boundaries
| Boundary | Direction | Notes |
|---|---|---|
| Browser → customer proxy | outbound | TLS + chat token (short-lived). Idempotency-Key required. |
| Customer proxy → vendor | outbound | Vendor API key, scoped to the configured project. Never reaches the browser. |
| Vendor Chromium → arbitrary internet | outbound | Restricted by vendor allowlist + proxy's pre-flight allowlist check. |
| Vendor screenshots → proxy → browser | inbound | Re-hosted as signed URLs; never directly served by the vendor to the browser. |
| Iframe → host page | inbound | Blocked by sandbox="" (no scripts, no same-origin, no top-nav). |
Attack surfaces & mitigations
A1. Agent navigates to an off-allowlist host
- Risk: Data exfiltration, phishing, privilege escalation on a target unrelated to the user's goal.
- Mitigation: The proxy's `ComputerUseSession.dispatch` re-checks every URL against `policy.allowlist` before forwarding to the provider. The SDK's `assertComputerUseAllowed` fast-fails before the HTTP round-trip. Browserbase additionally honors the `browserSettings.allowedUrls` we pass. Three layers of defense.
- Test: `packages/gecx-chat/test/computerUseGovernance.test.ts` includes a fuzz pass against `javascript:`, `data:`, `file:`, IDN homoglyphs, IPv6, and path-traversal-style inputs. The proxy-side `apps/proxy-reference/test/computerUseSession.test.ts` confirms an off-allowlist URL throws `COMPUTER_USE_ALLOWLIST_VIOLATION` and audits the violation.
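The kind of check each layer performs can be sketched as follows. This is a minimal illustration, not the SDK's actual `assertComputerUseAllowed`: the function name `isUrlAllowed`, the exact-hostname matching rule, and the https-only policy are all assumptions made for the example.

```typescript
// Hypothetical helper for illustration; the real allowlist logic may differ.
export function isUrlAllowed(raw: string, allowlist: readonly string[]): boolean {
  // The WHATWG parser silently strips surrounding whitespace and control
  // characters, so reject them explicitly instead of normalizing them away.
  if (raw !== raw.trim() || /[\u0000-\u001f\u007f]/.test(raw)) return false;

  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // bare hostnames, fragment-only strings, malformed input
  }

  // Assumption: only https targets are ever allowed.
  if (url.protocol !== "https:") return false;
  // Reject userinfo tricks such as https://example.com@evil.test/.
  if (url.username !== "" || url.password !== "") return false;

  // URL.hostname is lowercased and IDN labels are converted to punycode,
  // so a homoglyph host no longer compares equal to its ASCII lookalike.
  const host = url.hostname;
  return allowlist.some((entry) => host === entry.toLowerCase());
}
```

Matching on the parsed hostname rather than on a string prefix is the design choice that matters here: prefix checks are what let `https://example.com.evil.test/` slip through.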
A2. Compromised screenshot payload reaches the host page
- Risk: A malicious page coerces the vendor into serving HTML that, if rendered in the host page, could exfiltrate auth cookies.
- Mitigation: `<ComputerUseSurface>` renders the screenshot inside an iframe whose `sandbox` attribute is the empty string: no scripts, no same-origin, no top-navigation, no popups. The iframe `srcDoc` references only an `<img>` with the proxy-supplied URL; HTML special characters in the URL are entity-escaped. The action log is plain text built from the structured event stream, not from remote HTML.
- Test: `packages/gecx-chat/test/react/ComputerUseSurface.test.ts` asserts the iframe is emitted with `sandbox=""`.
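The escaping step can be sketched as below. The helper names `escapeHtml` and `buildScreenshotSrcDoc` are hypothetical, and the markup is a guess at the minimal shape described above rather than the component's actual output.

```typescript
// Hypothetical helpers for illustration; not the component's real internals.
function escapeHtml(value: string): string {
  // Escape "&" first so entities produced by later steps are not re-escaped.
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Builds a srcDoc that contains nothing but a single <img> element.
export function buildScreenshotSrcDoc(screenshotUrl: string): string {
  return `<!doctype html><img alt="" src="${escapeHtml(screenshotUrl)}">`;
}
```

The result would then be passed as the `srcDoc` of an iframe rendered with `sandbox=""`, so that even if escaping were bypassed, injected script would still not run.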
A3. Stream URL leak
- Risk: A leaked `streamUrl` (in browser history, logs, etc.) lets an attacker reconnect to a live or past session and replay screenshots.
- Mitigation: The proxy signs every `streamUrl` with HMAC and a 60-second TTL. Tokens are verified before any session lookup so callers cannot probe for valid session ids by toggling tokens. The verification uses a constant-time compare.
- Test: `apps/proxy-reference/test/computerUseSession.test.ts` asserts an invalid token returns 401 even when the session id is valid.
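A minimal sketch of such a token scheme, using Node's `node:crypto`. The token layout (`<expiresAt>.<mac>`), the SHA-256 choice, and the function names are assumptions for illustration; the proxy's real format is not documented here.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TTL_MS = 60_000; // 60-second TTL, as described above

// Assumed layout: the MAC covers "<sessionId>.<expiresAt>".
function mac(key: string, sessionId: string, expiresAt: number): Buffer {
  return createHmac("sha256", key).update(`${sessionId}.${expiresAt}`).digest();
}

export function signStreamToken(key: string, sessionId: string, now: number = Date.now()): string {
  const expiresAt = now + TTL_MS;
  return `${expiresAt}.${mac(key, sessionId, expiresAt).toString("base64url")}`;
}

export function verifyStreamToken(
  key: string,
  sessionId: string,
  token: string,
  now: number = Date.now(),
): boolean {
  const dot = token.indexOf(".");
  if (dot < 0) return false;
  const expiresAt = Number(token.slice(0, dot));
  if (!Number.isSafeInteger(expiresAt) || expiresAt <= now) return false;

  const presented = Buffer.from(token.slice(dot + 1), "base64url");
  const expected = mac(key, sessionId, expiresAt);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return presented.length === expected.length && timingSafeEqual(presented, expected);
}
```

Because verification needs only the key, the session id from the URL, and the token, it can run before any session store is consulted, which is what keeps a bad token from revealing whether a session id exists.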
A4. Action-loop exhaustion / resource exhaustion
- Risk: An agent that loops indefinitely could exhaust action budgets and keep a billed vendor session pinned open at significant cost.
- Mitigation: `maxActionsPerSession` (default 30) and `maxDurationMs` (default 5 minutes) are hard caps enforced by `ComputerUseSession`. A duration timer fires `terminate('duration_exceeded', ...)` from a `setTimeout`. Crossing the action cap raises `ComputerUseActionLimitExceeded` and tears the session down.
- Test: `apps/proxy-reference/test/computerUseSession.test.ts` exercises the action-limit path.
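The two caps can be sketched with a small class. `BudgetedSession` and its method names are hypothetical stand-ins, not the real `ComputerUseSession`; only the defaults (30 actions, 5 minutes) and the reason strings come from the text above.

```typescript
type EndReason = "action_limit_exceeded" | "duration_exceeded";

// Hypothetical stand-in for the budget enforcement inside ComputerUseSession.
export class BudgetedSession {
  private actions = 0;
  private ended: EndReason | null = null;
  private readonly maxActions: number;
  private readonly onEnd: (reason: EndReason) => void;
  private readonly timer: ReturnType<typeof setTimeout>;

  constructor(
    maxActionsPerSession: number = 30,
    maxDurationMs: number = 5 * 60_000,
    onEnd: (reason: EndReason) => void = () => {},
  ) {
    this.maxActions = maxActionsPerSession;
    this.onEnd = onEnd;
    // Hard wall-clock cap: fires even if the agent never acts again.
    this.timer = setTimeout(() => this.terminate("duration_exceeded"), maxDurationMs);
  }

  // Call before dispatching each action to the provider.
  recordAction(): void {
    if (this.ended !== null) throw new Error(`session ended: ${this.ended}`);
    if (this.actions >= this.maxActions) {
      this.terminate("action_limit_exceeded");
      throw new Error("ComputerUseActionLimitExceeded");
    }
    this.actions += 1;
  }

  // Idempotent teardown: the first reason wins and the timer is cleared.
  terminate(reason: EndReason): void {
    if (this.ended !== null) return;
    this.ended = reason;
    clearTimeout(this.timer);
    this.onEnd(reason);
  }
}
```

Making `terminate` idempotent avoids a double teardown when the action cap and the duration timer trip at nearly the same moment.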
A5. Kill-switch failure under load
- Risk: Operator flips the kill switch during an incident but in-flight sessions don't tear down.
- Mitigation: `setKillSwitch(true)` calls `session.kill(reason)` on every active session synchronously. Every session has an `AbortController` whose signal is wired into the provider, so any in-flight `act()` aborts. The kill switch also blocks all new sessions with `COMPUTER_USE_PROVIDER_UNAVAILABLE`.
- Test: `apps/proxy-reference/test/computerUseSession.test.ts` "stops the in-flight session when the kill switch flips".
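The wiring described above can be sketched as a registry of `AbortController`s. `SessionRegistry`, `open`, and `activeCount` are hypothetical names for illustration; only `setKillSwitch` and the error code appear in the source.

```typescript
// Hypothetical registry; the real proxy tracks richer session objects.
export class SessionRegistry {
  private killSwitch = false;
  private readonly active = new Map<string, AbortController>();

  // Returns the signal a provider call would be given for this session.
  open(sessionId: string): AbortSignal {
    if (this.killSwitch) throw new Error("COMPUTER_USE_PROVIDER_UNAVAILABLE");
    const controller = new AbortController();
    this.active.set(sessionId, controller);
    return controller.signal;
  }

  // Synchronously aborts every active session, then forgets them.
  setKillSwitch(on: boolean): void {
    this.killSwitch = on;
    if (!on) return;
    this.active.forEach((controller) => controller.abort());
    this.active.clear();
  }

  get activeCount(): number {
    return this.active.size;
  }
}
```

Aborting inside the same synchronous call that flips the flag means there is no window in which the switch reads as on while an old session is still allowed to act.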
A6. Audit-tampering or audit-loss
- Risk: An audit sink that crashes or backpressures could silently drop events, undermining post-incident review.
- Mitigation: `ChatGovernance.emit` wraps sink invocations in try/catch: a failing sink cannot break the SDK, but the SDK will not stop calling it. Hosts should plumb the sink into a durable queue (Pub/Sub, Kinesis, Kafka). The event includes `requestId`, `computerUseSessionId`, and `details.actionId` for full correlation.
- Test: `packages/gecx-chat/test/computerUseGovernance.test.ts` asserts the audit sink fires for allowlist violations, action emits, and kill-switch activations.
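The isolation behavior can be sketched as follows. The `AuditEvent` shape is reduced to the three correlation fields named above plus an assumed `type` field, and this free-standing `emit` is an illustration, not the actual `ChatGovernance.emit`. It covers only synchronous throws; asynchronous sinks would need their rejections handled as well.

```typescript
// Assumed event shape: only the correlation fields named in this document.
export interface AuditEvent {
  type: string;
  requestId: string;
  computerUseSessionId: string;
  details: { actionId?: string };
}

export type AuditSink = (event: AuditEvent) => void;

// Calls every sink; a throwing sink is counted but never stops the others
// and is still called again on the next emit.
export function emit(sinks: readonly AuditSink[], event: AuditEvent): number {
  let failures = 0;
  for (const sink of sinks) {
    try {
      sink(event);
    } catch {
      failures += 1;
    }
  }
  return failures;
}
```

Returning the failure count is a choice made for this sketch so that a host could alert on dropped events instead of losing them silently.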
A7. Vendor account compromise
- Risk: Stolen Browserbase API key would let an attacker create vendor sessions on our account.
- Mitigation: The key never reaches the browser bundle (server-only `gecx-chat/server` subpath). The proxy validates env at boot. Rotate per deployment. Vendor sessions are short-lived; even a stolen key gives an attacker minutes per session, not hours.
A8. Replay of completed tool calls
- Risk: An attacker re-submits a completed `computer_use` tool call to redo the actions on the user's behalf.
- Mitigation: `defineServerTool` requires `Idempotency-Key`; replays return the cached result rather than dispatching a new session. The TTL is 60 minutes; after that the request is treated as new but still subject to allowlist + consent.
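The replay behavior can be sketched with an in-memory cache. `IdempotencyCache` is a hypothetical illustration of the semantics, not how `defineServerTool` stores results; a real deployment would likely need a shared store so that replays hitting a different proxy instance are also caught.

```typescript
// Hypothetical in-memory cache keyed by the Idempotency-Key header value.
export class IdempotencyCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 60 * 60_000) {
    this.ttlMs = ttlMs; // 60-minute TTL, as described above
  }

  // Returns the cached result for a live key; otherwise dispatches once
  // and caches the outcome.
  run(key: string, dispatch: () => T, now: number = Date.now()): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) return hit.value;
    const value = dispatch();
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}
```

After expiry the call is dispatched again, which is why the allowlist and consent checks still have to run on every dispatch rather than only on the first one.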
Threat-model-aware deployment checklist
- `governance.computerUse.allowlist` is set and reviewed.
- Proxy env `COMPUTER_USE_ALLOWLIST` matches the SDK config.
- `COMPUTER_USE_STREAM_KEY` is set to a value rotated per deployment.
- Browserbase API key + project id are stored in a secret manager, not on disk.
- Audit sink writes to a durable destination (Cloud Logging, SIEM).
- Kill switch is reachable via at least one out-of-band channel (admin REST endpoint, env reload).
- An end-to-end test runs against the mock provider in CI.
- A monthly drill verifies the kill switch tears down a live session within 1 second.
Penetration-test runbook
- Allowlist fuzz. Issue `computer_use` calls with URLs from the fuzz set: `javascript:`, `data:`, `file:`, `ftp:`, IDN homoglyphs (`еxample.com`), bare hostnames with trailing whitespace, mixed-case schemes, double-encoded paths, IPv6 literals, link-local addresses, fragment-only URLs. Every input must produce `COMPUTER_USE_ALLOWLIST_VIOLATION` and emit `governance.computer_use.allowlist_violation`.
- Stream-token replay. Capture a `streamUrl` from a live session, wait for the session to end, then replay the URL. Expect 401 / `COMPUTER_USE_TOKEN_INVALID`.
- Iframe escape. Inject a screenshot URL that is in fact an `<html>` payload with inline scripts. Confirm the iframe's `sandbox=""` policy prevents execution by inspecting the host page's console for any injected events.
- Kill-switch under load. Spin up 50 concurrent mock sessions, then `POST /admin/computer-use/kill-switch { "on": true }`. Within 1 second every session must emit `governance.computer_use.killed` and `governance.computer_use.session_ended` with `status: 'aborted'`. No orphaned Browserbase sessions remain (verify via the vendor dashboard).
- Action-budget loop. Use the deterministic mock provider and a planner that always returns the same action. Confirm the session terminates with `action_limit_exceeded` after `maxActionsPerSession` iterations, with one audit event per action plus a terminal `session_ended`.
- Audit-sink failure mode. Wire an audit sink that throws synchronously. Confirm the SDK and proxy continue to function; no exceptions bubble out; subsequent sink invocations are still attempted.
- Bundle inspection. Run `pnpm build:packages && grep -r "BROWSERBASE_API_KEY\|playwright\|node:http" packages/gecx-chat/dist/index.js`. Expect zero hits: server-only modules must not appear in the browser bundle.
Reporting
If you discover a vulnerability in the computer-use subsystem, follow SECURITY.md. Tag the report with computer_use so it is routed to the matching owner.