Computer-use integration guide

This guide explains how to opt into the computer_use server tool — the SDK feature that lets an agent drive a sandboxed browser on the user's behalf with an explicit consent UX, allowlist enforcement, audit logging, and a kill switch.

The feature is default-off. Nothing in this guide takes effect until you populate governance.computerUse in your ChatClient config and configure a provider on your proxy. Until then, every computer_use invocation fails fast with COMPUTER_USE_NOT_ENABLED.

When to use this

  • A user asks the agent to perform a task on a site that is not exposed via your APIs (e.g. "look up my warranty status on the manufacturer's portal").
  • The task is bounded: one or two hostnames, a small number of clicks/forms, a clear success criterion.
  • The host has reviewed the threat model in docs/reference/computer-use-threat-model.md and accepted it.

When not to use this

  • Any flow where you can call an API directly. Computer-use is a last-resort capability.
  • Any task that would require visiting more than a handful of hostnames.
  • Any task that crosses payments, authentication, or PII boundaries without a separate human review step.

Architecture, at a glance

ChatClient (browser)
  ├── governance: { computerUse: { enabled, allowlist, ... } }
  └── tools: [ computerUseTool({ endpoint: '/chat/tool-call' }) ]
            │
            │  HTTPS POST /chat/tool-call (Idempotency-Key, chat token)
            ▼
Customer proxy
  ├── createComputerUseHandler({ provider, policy, audit, ... })
  │     ├── ComputerUseSession (per-call state machine + audit)
  │     └── ComputerUseProvider (Mock | Browserbase | ...)
  │              │
  │              │ vendor REST/CDP
  │              ▼
  │         Hosted Chromium (Browserbase) — the security boundary
  ├── GET /chat/computer-use/:sessionId/stream (signed SSE)
  └── POST /chat/computer-use/:sessionId/control (decisions + abort)
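
The Idempotency-Key on POST /chat/tool-call lets a retried request map back to the session it already started, instead of opening a second hosted browser. Below is a minimal in-memory sketch of that deduplication. IdempotentSessionStarter is hypothetical and not part of gecx-chat/server, and createComputerUseHandler may already handle this for you. It assumes sessions are identified by a string id.

```typescript
// Hypothetical helper illustrating Idempotency-Key deduplication on the proxy.
// In-memory only: a multi-instance proxy would need a shared store with expiry.
export class IdempotentSessionStarter {
  private readonly sessions = new Map<string, Promise<string>>();

  start(idempotencyKey: string, create: () => Promise<string>): Promise<string> {
    const existing = this.sessions.get(idempotencyKey);
    // Retry with the same key: hand back the session already being created.
    if (existing) return existing;
    const created = create();
    this.sessions.set(idempotencyKey, created);
    // Forget failed attempts so a later retry can start fresh.
    created.catch(() => this.sessions.delete(idempotencyKey));
    return created;
  }
}
```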

Three things to keep in mind:

  1. The iframe is a display sandbox, not a vendor sandbox. Browserbase runs Chromium in its own isolated environment; the iframe only shows PNG screenshots and a structured action log. No remote HTML or JS reaches the host page.
  2. The proxy is the policy enforcement point. The SDK-side check is fast-fail UX; the proxy re-applies every check independently. Never widen the allowlist at the SDK to "fix" an error.
  3. Approval is two-tier. Session-level approval reuses the existing ToolRegistry tool.awaiting_approval flow. Per-action approval for high-risk actions (e.g. submit_form) is a second channel inside the live session and is shown through the ConfirmDialog in ComputerUseSurface.
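
The two approval tiers, together with the per-session action cap, can be sketched as a single gating function. This is an illustration under stated assumptions: gateAction, SessionState, ActionPolicy, and the Gate values are hypothetical names, not SDK exports. The sketch also assumes the cap is checked before high-risk approval, so an over-budget session never prompts the user.

```typescript
// Hypothetical shapes; the SDK's real session and action types may differ.
interface SessionState {
  sessionApproved: boolean; // tier 1: the tool.awaiting_approval flow was approved
  actionsTaken: number;
}

interface ActionPolicy {
  highRiskActions: readonly string[];
  maxActionsPerSession: number;
}

export type Gate =
  | 'await_session_approval'
  | 'reject_action_limit'
  | 'await_action_approval'
  | 'execute';

export function gateAction(
  session: SessionState,
  actionType: string,
  policy: ActionPolicy,
  actionApproved = false,
): Gate {
  // Tier 1: nothing runs until the user approves the session as a whole.
  if (!session.sessionApproved) return 'await_session_approval';
  // Hard cap, independent of any approval.
  if (session.actionsTaken >= policy.maxActionsPerSession) return 'reject_action_limit';
  // Tier 2: each high-risk action needs its own confirmation in the live session.
  if (policy.highRiskActions.includes(actionType) && !actionApproved) return 'await_action_approval';
  return 'execute';
}
```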

SDK config

import { createChatClient, computerUseTool } from 'gecx-chat';

// Share the same values between governance and the tool so the agent-facing
// tool description can never drift from what the SDK enforces.
const allowlist = ['acme-orders.example.com'];
const maxDurationMs = 5 * 60_000; // 300_000 ms
const maxActionsPerSession = 30;

const client = createChatClient({
  // ... your existing transport, auth, etc.
  governance: {
    computerUse: {
      enabled: true,
      allowlist,
      maxDurationMs,
      maxActionsPerSession,
      highRiskActions: ['submit_form', 'download', 'navigate_external'],
      killSwitch: false,
    },
  },
  tools: [
    computerUseTool({
      endpoint: '/chat/tool-call',
      // Optional: hand the resolved policy to the tool so the generated
      // tool description tells the agent which hostnames are allowed.
      policy: { allowlist, maxDurationMs, maxActionsPerSession },
    }),
  ],
});

Allowlist matching

The allowlist matches hostnames exactly (case-insensitive, IDN-decoded). Subdomains are NOT implicit: acme.com does NOT match orders.acme.com. List each hostname explicitly. Schemes other than http: and https: are always rejected.
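
Below is a minimal sketch of these matching rules; isUrlAllowed is a hypothetical helper, not the SDK's implementation. The guide describes the comparison as IDN-decoded. This sketch instead canonicalizes both sides through the WHATWG URL parser, which lowercases hosts and emits punycode. That should give the same equality result for well-formed names, but the SDK may normalize differently.

```typescript
// Parses a URL, returning null instead of throwing on malformed input.
function tryParse(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// Canonical host form: the URL parser lowercases the host and converts IDN
// labels to punycode, so differently written spellings compare equal.
function canonicalHost(host: string): string | null {
  return tryParse(`http://${host}`)?.hostname ?? null;
}

export function isUrlAllowed(rawUrl: string, allowlist: readonly string[]): boolean {
  const url = tryParse(rawUrl);
  if (url === null) return false;
  // Only http: and https: ever pass (javascript:, file:, data:, ... are rejected).
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  // Exact hostname equality: 'acme.com' does not cover 'orders.acme.com'.
  return allowlist.some((entry) => canonicalHost(entry) === url.hostname);
}
```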

Default values

Field                  Default
enabled                false
allowlist              []
maxDurationMs          300_000 (5 minutes)
maxActionsPerSession   30
highRiskActions        ['submit_form', 'download', 'navigate_external']
killSwitch             false
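
Below is a sketch of how the defaults above combine with a partial config. ComputerUsePolicy and resolvePolicy are hypothetical names, not SDK exports. Each field falls back individually, so an explicit undefined does not erase a default. Because matching is exact, setting enabled: true while keeping the default empty allowlist still permits no hostnames.

```typescript
// Hypothetical shape mirroring governance.computerUse; not the SDK's exported type.
export interface ComputerUsePolicy {
  enabled: boolean;
  allowlist: string[];
  maxDurationMs: number;
  maxActionsPerSession: number;
  highRiskActions: string[];
  killSwitch: boolean;
}

const DEFAULTS: ComputerUsePolicy = {
  enabled: false,
  allowlist: [],
  maxDurationMs: 300_000, // 5 minutes
  maxActionsPerSession: 30,
  highRiskActions: ['submit_form', 'download', 'navigate_external'],
  killSwitch: false,
};

export function resolvePolicy(partial: Partial<ComputerUsePolicy> = {}): ComputerUsePolicy {
  return {
    enabled: partial.enabled ?? DEFAULTS.enabled,
    // Copy arrays so callers can never mutate the shared defaults.
    allowlist: [...(partial.allowlist ?? DEFAULTS.allowlist)],
    maxDurationMs: partial.maxDurationMs ?? DEFAULTS.maxDurationMs,
    maxActionsPerSession: partial.maxActionsPerSession ?? DEFAULTS.maxActionsPerSession,
    highRiskActions: [...(partial.highRiskActions ?? DEFAULTS.highRiskActions)],
    killSwitch: partial.killSwitch ?? DEFAULTS.killSwitch,
  };
}
```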

Proxy config

Set these env vars on the proxy:

COMPUTER_USE_PROVIDER=mock | browserbase
COMPUTER_USE_ALLOWLIST=acme-orders.example.com,...
COMPUTER_USE_MAX_DURATION_MS=300000
COMPUTER_USE_MAX_ACTIONS=30
COMPUTER_USE_KILL_SWITCH=0
COMPUTER_USE_STREAM_KEY=<rotate per deployment>

# Only when COMPUTER_USE_PROVIDER=browserbase:
BROWSERBASE_API_KEY=...
BROWSERBASE_PROJECT_ID=...
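
Below is a minimal sketch of loading these variables with fail-fast validation. loadProxyConfig and ProxyComputerUseConfig are hypothetical names, and createComputerUseHandler may read the environment itself. One design choice is an assumption: the kill switch is parsed fail-closed, so any value other than unset, empty, or "0" engages it. A typo such as "true" therefore cannot silently leave it off.

```typescript
// Hypothetical loader for the env vars above; field names are assumptions.
export interface ProxyComputerUseConfig {
  provider: 'mock' | 'browserbase';
  allowlist: string[];
  maxDurationMs: number;
  maxActions: number;
  killSwitch: boolean;
  streamKey: string;
}

function positiveInt(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value === '') return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${name} must be a positive integer, got "${value}"`);
  return n;
}

export function loadProxyConfig(env: Record<string, string | undefined>): ProxyComputerUseConfig {
  const provider = env.COMPUTER_USE_PROVIDER;
  if (provider !== 'mock' && provider !== 'browserbase') {
    throw new Error(`COMPUTER_USE_PROVIDER must be mock or browserbase, got "${provider}"`);
  }
  if (provider === 'browserbase' && (!env.BROWSERBASE_API_KEY || !env.BROWSERBASE_PROJECT_ID)) {
    throw new Error('browserbase provider requires BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID');
  }
  const streamKey = env.COMPUTER_USE_STREAM_KEY;
  if (!streamKey) throw new Error('COMPUTER_USE_STREAM_KEY is required to sign stream URLs');
  const kill = env.COMPUTER_USE_KILL_SWITCH;
  return {
    provider,
    allowlist: (env.COMPUTER_USE_ALLOWLIST ?? '')
      .split(',')
      .map((host) => host.trim().toLowerCase())
      .filter((host) => host.length > 0),
    maxDurationMs: positiveInt(env.COMPUTER_USE_MAX_DURATION_MS, 300_000, 'COMPUTER_USE_MAX_DURATION_MS'),
    maxActions: positiveInt(env.COMPUTER_USE_MAX_ACTIONS, 30, 'COMPUTER_USE_MAX_ACTIONS'),
    // Fail closed: anything except unset, empty, or "0" engages the kill switch.
    killSwitch: kill !== undefined && kill !== '' && kill !== '0',
    streamKey,
  };
}
```

On the proxy this would be called once at boot, e.g. loadProxyConfig(process.env), so a misconfiguration stops the deploy rather than surfacing mid-session.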

The proxy's createComputerUseHandler is the authoritative enforcement point. Even if the SDK config is mis-set, the proxy refuses anything outside its own allowlist.

Wiring a host-supplied automation hook

BrowserbaseProvider does not depend on Playwright. Instead, the host wires in its preferred automation library through the act and screenshot hooks, so the SDK doesn't impose a platform-binary toolchain on every customer:

import { chromium } from 'playwright';
import { BrowserbaseProvider } from 'gecx-chat/server';

const provider = new BrowserbaseProvider({
  apiKey: process.env.BROWSERBASE_API_KEY!,
  projectId: process.env.BROWSERBASE_PROJECT_ID!,
  act: async ({ connectUrl, action, signal }) => {
    const browser = await chromium.connectOverCDP(connectUrl);
    try {
      const context = browser.contexts()[0]!;
      const page = context.pages()[0] ?? await context.newPage();
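      // Playwright's page.goto() does not accept an AbortSignal, so honor
      // cancellation before starting the action.
      signal?.throwIfAborted();
      switch (action.actionType) {
        case 'navigate':
          await page.goto(action.url!);
          break;
        case 'click':
          await page.click(action.target!);
          break;
        // ... etc.
        default:
          // Fail loudly instead of reporting an unimplemented action as executed.
          throw new Error(`unsupported action: ${action.actionType}`);
      }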
      return { summary: `executed ${action.actionType}`, currentUrl: page.url() };
    } finally {
      await browser.close();
    }
  },
  screenshot: async ({ connectUrl }) => {
    const browser = await chromium.connectOverCDP(connectUrl);
    try {
      const page = browser.contexts()[0]!.pages()[0]!;
      const buffer = await page.screenshot({ type: 'png' });
      const url = `data:image/png;base64,${buffer.toString('base64')}`;
      return { url };
    } finally {
      await browser.close();
    }
  },
});

In production, host the screenshot bytes from a signed proxy URL rather than embedding base64 — large PNGs in data: URLs balloon the SSE frames and break some intermediaries.
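
Base64 also inflates each image by about a third before it even reaches the SSE frame. Below is a sketch of a short-lived signed screenshot URL built with Node's crypto HMAC. The route shape, the function names, and the idea of signing with a secret such as COMPUTER_USE_STREAM_KEY are all assumptions, not the proxy's documented API. The signature covers both the path and the expiry, so neither can be altered without the key.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Hypothetical route; the proxy may expose screenshots differently.
function screenshotPath(sessionId: string, shotId: string): string {
  return `/chat/computer-use/${encodeURIComponent(sessionId)}/screenshots/${encodeURIComponent(shotId)}`;
}

// HMAC over path + expiry.
function sign(key: string, path: string, expires: number): string {
  return createHmac('sha256', key).update(`${path}\n${expires}`).digest('base64url');
}

export function signScreenshotUrl(
  key: string,
  sessionId: string,
  shotId: string,
  ttlMs: number,
  now: number = Date.now(),
): string {
  const path = screenshotPath(sessionId, shotId);
  const expires = now + ttlMs;
  return `${path}?expires=${expires}&sig=${sign(key, path, expires)}`;
}

export function verifyScreenshotUrl(key: string, url: string, now: number = Date.now()): boolean {
  // A dummy base lets relative proxy URLs parse.
  const parsed = new URL(url, 'http://proxy.invalid');
  const expires = Number(parsed.searchParams.get('expires'));
  if (!Number.isFinite(expires) || expires < now) return false;
  const expected = Buffer.from(sign(key, parsed.pathname, expires));
  const actual = Buffer.from(parsed.searchParams.get('sig') ?? '');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```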

Rendering the surface

Mount <ComputerUseSurface> in your transcript or as a sibling of your <ChatPanel> when a computer_use tool call enters awaiting_approval or executing:

import { ComputerUseSurface } from 'gecx-chat/react';

<ComputerUseSurface
  computerUseSessionId={session.computerUseSessionId}
  goal={session.goal}
  allowlist={session.allowlist}
  streamUrl={session.streamUrl}
  controlUrl={session.controlUrl}
  maxDurationMs={300_000}
  maxActionsPerSession={30}
  onEvent={(event) => yourAuditSink(event)}
  onComplete={(result) => console.log('done', result)}
/>

You can pass renderConsent and renderApproval overrides to swap in your design system.
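
The mount condition can be expressed as a small predicate over your transcript's tool calls. Below is a sketch that assumes each call exposes a tool name and a state string. ToolCallView, shouldMountComputerUseSurface, and activeComputerUseCall are hypothetical names, not gecx-chat/react exports.

```typescript
// Hypothetical transcript shape; the SDK's real tool-call type may differ.
interface ToolCallView {
  id: string;
  toolName: string;
  state: string; // e.g. 'awaiting_approval', 'executing', 'completed'
}

const MOUNT_STATES = new Set(['awaiting_approval', 'executing']);

export function shouldMountComputerUseSurface(call: ToolCallView): boolean {
  return call.toolName === 'computer_use' && MOUNT_STATES.has(call.state);
}

// Returns the most recent computer_use call that still needs the surface, if any.
export function activeComputerUseCall(calls: readonly ToolCallView[]): ToolCallView | undefined {
  for (let i = calls.length - 1; i >= 0; i--) {
    const call = calls[i];
    if (call && shouldMountComputerUseSurface(call)) return call;
  }
  return undefined;
}
```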

Audit and kill switch

Every step emits a governance.computer_use.* event through ChatGovernance's audit pipeline. Wire it to your SIEM the same way you wire voice_session_started:

governance: {
  audit: (event) => yourSiemClient.send(event),
  computerUse: { /* ... */ },
}
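
If computer-use events should also reach a dedicated channel (for example, a high-priority alert stream), the audit callback can route on the event name prefix. Below is a sketch that assumes each event carries a string type field holding names like governance.computer_use.killed. GovernanceEvent and createAuditSink are hypothetical names.

```typescript
// Hypothetical event shape; the only assumption is a string `type` field.
interface GovernanceEvent {
  type: string;
  [key: string]: unknown;
}

type Sink = (event: GovernanceEvent) => void;

const COMPUTER_USE_PREFIX = 'governance.computer_use.';

// Sends every event to the general SIEM sink and additionally routes
// computer-use events to a dedicated channel.
export function createAuditSink(siem: Sink, computerUseChannel: Sink): Sink {
  return (event) => {
    siem(event);
    if (event.type.startsWith(COMPUTER_USE_PREFIX)) computerUseChannel(event);
  };
}
```

It would plug in as governance.audit, e.g. audit: createAuditSink(yourSiemClient.send, yourAlertClient.send), where both clients are your own.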

To engage the kill switch:

  • SDK side (instant, in-process): chatClient.governance.triggerComputerUseKill(true, { reason: 'incident-2026-03-14' })
  • Proxy side (instant, server-wide): POST /admin/computer-use/kill-switch { "on": true, "reason": "..." }
  • Config: set COMPUTER_USE_KILL_SWITCH=1 and restart, or flip governance.computerUse.killSwitch in the SDK config (effective at next session boot).

While the kill switch is on, every active session emits governance.computer_use.killed and tears down, and every new computer_use call fails fast with COMPUTER_USE_PROVIDER_UNAVAILABLE.
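
The semantics above are sketched below as an in-process registry: engaging the switch aborts every tracked session and emits a killed event for each, and later starts fail fast. The default-off NOT_ENABLED check is included as well. ComputerUseKillSwitch and its methods are hypothetical; the SDK's real entry point is chatClient.governance.triggerComputerUseKill.

```typescript
// Hypothetical registry illustrating kill-switch semantics; not an SDK export.
type StartResult =
  | { ok: true; sessionId: string; signal: AbortSignal }
  | { ok: false; code: 'COMPUTER_USE_NOT_ENABLED' | 'COMPUTER_USE_PROVIDER_UNAVAILABLE' };

type KillEvent = { type: string; sessionId: string; reason: string };

export class ComputerUseKillSwitch {
  private engaged = false;
  private readonly active = new Map<string, AbortController>();
  private readonly enabled: boolean;
  private readonly emit: (event: KillEvent) => void;

  constructor(enabled: boolean, emit: (event: KillEvent) => void) {
    this.enabled = enabled;
    this.emit = emit;
  }

  start(sessionId: string): StartResult {
    // Default-off: nothing runs until governance.computerUse is enabled.
    if (!this.enabled) return { ok: false, code: 'COMPUTER_USE_NOT_ENABLED' };
    // While engaged, new calls fail fast.
    if (this.engaged) return { ok: false, code: 'COMPUTER_USE_PROVIDER_UNAVAILABLE' };
    const controller = new AbortController();
    this.active.set(sessionId, controller);
    return { ok: true, sessionId, signal: controller.signal };
  }

  finish(sessionId: string): void {
    this.active.delete(sessionId);
  }

  engage(reason: string): void {
    this.engaged = true;
    // Tear down every live session; providers should stop on signal abort.
    for (const [sessionId, controller] of this.active) {
      this.emit({ type: 'governance.computer_use.killed', sessionId, reason });
      controller.abort(new Error(`kill switch: ${reason}`));
    }
    this.active.clear();
  }

  release(): void {
    this.engaged = false;
  }
}
```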

Error codes

See docs/reference/error-codes.md#computer-use for the full list. Each code has a user message, developer hint, and remediation step.

Verification runbook

  1. pnpm install && pnpm typecheck && pnpm test — fast inner-loop.
  2. COMPUTER_USE_PROVIDER=mock pnpm dev — visit /computer-use in the showcase to drive the deterministic demo.
  3. BROWSERBASE_API_KEY=... BROWSERBASE_PROJECT_ID=... COMPUTER_USE_PROVIDER=browserbase pnpm dev — smoke against a real vendor session.
  4. pnpm e2e / pnpm e2e:applied — Playwright covers consent → screenshot stream → high-risk approval → completion.

Threat model and penetration test

See docs/reference/computer-use-threat-model.md.

Source: docs/guides/computer-use.md