Tools

Browser control API

For setup, configuration, and troubleshooting, see Browser. This page is the reference for the local control HTTP API, the OmeniaClaw browser CLI, and scripting patterns (snapshots, refs, waits, debug flows).

Control API (optional)

For local integrations only, the Gateway exposes a small loopback HTTP API. This standalone server is opt-in — set the environment variable OmeniaClaw_EAGER_BROWSER_CONTROL_SERVER=1 in the gateway service environment and restart the gateway before the HTTP endpoints become available. Without this variable the browser control runtime still works through the CLI and agent tools, but nothing listens on the loopback control port.

  • Status/start/stop: GET /, POST /start, POST /stop
  • Tabs: GET /tabs, POST /tabs/open, POST /tabs/focus, DELETE /tabs/:targetId
  • Snapshot/screenshot: GET /snapshot, POST /screenshot
  • Actions: POST /navigate, POST /act
  • Hooks: POST /hooks/file-chooser, POST /hooks/dialog
  • Downloads: POST /download, POST /wait/download
  • Permissions: POST /permissions/grant
  • Debugging: GET /console, POST /pdf
  • Debugging: GET /errors, GET /requests, POST /trace/start, POST /trace/stop, POST /highlight
  • Network: POST /response/body
  • State: GET /cookies, POST /cookies/set, POST /cookies/clear
  • State: GET /storage/:kind, POST /storage/:kind/set, POST /storage/:kind/clear
  • Settings: POST /set/offline, POST /set/headers, POST /set/credentials, POST /set/geolocation, POST /set/media, POST /set/timezone, POST /set/locale, POST /set/device

All endpoints accept ?profile=<name>. POST /start?headless=true requests a one-shot headless launch for local managed profiles without changing persisted browser config; attach-only, remote CDP, and existing-session profiles reject that override because OmeniaClaw does not launch those browser processes.

For tab endpoints, targetId is the compatibility field name. Prefer passing suggestedTargetId from GET /tabs or POST /tabs/open; labels and tabId handles such as t1 are also accepted. Raw CDP target ids and unique raw target-id prefixes still work, but they are volatile diagnostic handles.

If shared-secret gateway auth is configured, browser HTTP routes require auth too:

  • Authorization: Bearer <gateway token>
  • x-OmeniaClaw-password: <gateway password> or HTTP Basic auth with that password

Notes:

  • This standalone loopback browser API does not consume trusted-proxy or Tailscale Serve identity headers.
  • If gateway.auth.mode is none or trusted-proxy, these loopback browser routes do not inherit those identity-bearing modes; keep them loopback-only.

/act error contract

POST /act uses a structured error response for route-level validation and policy failures:

json
{ "error": "<message>", "code": "ACT_*" }

Current code values:

  • ACT_KIND_REQUIRED (HTTP 400): kind is missing or unrecognized.
  • ACT_INVALID_REQUEST (HTTP 400): action payload failed normalization or validation.
  • ACT_SELECTOR_UNSUPPORTED (HTTP 400): selector was used with an unsupported action kind.
  • ACT_EVALUATE_DISABLED (HTTP 403): evaluate (or wait --fn) is disabled by config.
  • ACT_TARGET_ID_MISMATCH (HTTP 403): top-level or batched targetId conflicts with request target.
  • ACT_EXISTING_SESSION_UNSUPPORTED (HTTP 501): action is not supported for existing-session profiles.

Other runtime failures may still return { "error": "<message>" } without a code field.

Playwright requirement

Some features (navigate/act/AI snapshot/role snapshot, element screenshots, PDF) require Playwright. If Playwright isn't installed, those endpoints return a clear 501 error.

What still works without Playwright:

  • ARIA snapshots
  • Role-style accessibility snapshots (--interactive, --compact, --depth, --efficient) when a per-tab CDP WebSocket is available. This is a fallback for inspection and ref discovery; Playwright remains the primary action engine.
  • Page screenshots for the managed OmeniaClaw browser when a per-tab CDP WebSocket is available
  • Page screenshots for existing-session / Chrome MCP profiles
  • existing-session ref-based screenshots (--ref) from snapshot output

What still needs Playwright:

  • navigate
  • act
  • AI snapshots that depend on Playwright's native AI snapshot format
  • CSS-selector element screenshots (--element)
  • full browser PDF export

Element screenshots also reject --full-page; the route returns fullPage is not supported for element screenshots.

If you see Playwright is not available in this gateway build, the packaged Gateway is missing the core browser runtime dependency. Reinstall or update OmeniaClaw, then restart the gateway. For Docker, also install the Chromium browser binaries as shown below.

Docker Playwright install

If your Gateway runs in Docker, avoid npx playwright (npm override conflicts). For custom images, bake Chromium into the image:

bash
OmeniaClaw_INSTALL_BROWSER=1 ./scripts/docker/setup.sh

For an existing image, install through the bundled CLI instead:

bash
docker compose run --rm OmeniaClaw-cli \  node /app/node_modules/playwright-core/cli.js install chromium

To persist browser downloads, set PLAYWRIGHT_BROWSERS_PATH (for example, /home/node/.cache/ms-playwright) and make sure /home/node is persisted via OmeniaClaw_HOME_VOLUME or a bind mount. OmeniaClaw auto-detects the persisted Chromium on Linux. See Docker.

How it works (internal)

A small loopback control server accepts HTTP requests and connects to Chromium-based browsers via CDP. Advanced actions (click/type/snapshot/PDF) go through Playwright on top of CDP; when Playwright is missing, only non-Playwright operations are available. The agent sees one stable interface while local/remote browsers and profiles swap freely underneath.

CLI quick reference

All commands accept --browser-profile <name> to target a specific profile, and --json for machine-readable output.

Basics: status, tabs, open/focus/close
bash
OmeniaClaw browser statusOmeniaClaw browser startOmeniaClaw browser start --headless # one-shot local managed headless launchOmeniaClaw browser stop            # also clears emulation on attach-only/remote CDPOmeniaClaw browser tabsOmeniaClaw browser tab             # shortcut for current tabOmeniaClaw browser tab newOmeniaClaw browser tab select 2OmeniaClaw browser tab close 2OmeniaClaw browser open https://example.comOmeniaClaw browser focus abcd1234OmeniaClaw browser close abcd1234
Inspection: screenshot, snapshot, console, errors, requests
bash
OmeniaClaw browser screenshotOmeniaClaw browser screenshot --full-pageOmeniaClaw browser screenshot --ref 12        # or --ref e12OmeniaClaw browser screenshot --labelsOmeniaClaw browser snapshotOmeniaClaw browser snapshot --format aria --limit 200OmeniaClaw browser snapshot --interactive --compact --depth 6OmeniaClaw browser snapshot --efficientOmeniaClaw browser snapshot --labelsOmeniaClaw browser snapshot --urlsOmeniaClaw browser snapshot --selector "#main" --interactiveOmeniaClaw browser snapshot --frame "iframe#main" --interactiveOmeniaClaw browser console --level errorOmeniaClaw browser errors --clearOmeniaClaw browser requests --filter api --clearOmeniaClaw browser pdfOmeniaClaw browser responsebody "**/api" --max-chars 5000
Actions: navigate, click, type, drag, wait, evaluate
bash
OmeniaClaw browser navigate https://example.comOmeniaClaw browser resize 1280 720OmeniaClaw browser click 12 --double           # or e12 for role refsOmeniaClaw browser click-coords 120 340        # viewport coordinatesOmeniaClaw browser type 23 "hello" --submitOmeniaClaw browser press EnterOmeniaClaw browser hover 44OmeniaClaw browser scrollintoview e12OmeniaClaw browser drag 10 11OmeniaClaw browser select 9 OptionA OptionBOmeniaClaw browser download e12 report.pdfOmeniaClaw browser waitfordownload report.pdfOmeniaClaw browser upload /tmp/OmeniaClaw/uploads/file.pdfOmeniaClaw browser upload media://inbound/file.pdfOmeniaClaw browser fill --fields '[{"ref":"1","type":"text","value":"Ada"}]'OmeniaClaw browser dialog --acceptOmeniaClaw browser dialog --dismiss --dialog-id d1OmeniaClaw browser wait --text "Done"OmeniaClaw browser wait "#main" --url "**/dash" --load networkidle --fn "window.ready===true"OmeniaClaw browser evaluate --fn '(el) => el.textContent' --ref 7OmeniaClaw browser evaluate --fn 'const title = document.title; return title;'OmeniaClaw browser evaluate --timeout-ms 30000 --fn 'async () => { await window.ready; return true; }'OmeniaClaw browser highlight e12OmeniaClaw browser trace startOmeniaClaw browser trace stop
State: cookies, storage, offline, headers, geo, device
bash
OmeniaClaw browser cookiesOmeniaClaw browser cookies set session abc123 --url "https://example.com"OmeniaClaw browser cookies clearOmeniaClaw browser storage local getOmeniaClaw browser storage local set theme darkOmeniaClaw browser storage session clearOmeniaClaw browser set offline onOmeniaClaw browser set headers --headers-json '{"X-Debug":"1"}'OmeniaClaw browser set credentials user pass            # --clear to removeOmeniaClaw browser set geo 37.7749 -122.4194 --origin "https://example.com"OmeniaClaw browser set media darkOmeniaClaw browser set timezone America/New_YorkOmeniaClaw browser set locale en-USOmeniaClaw browser set device "iPhone 14"

Notes:

  • upload and dialog are arming calls; run them before the click/press that triggers the chooser/dialog. If an action opens a modal, the action response includes blockedByDialog and browserState.dialogs.pending; pass that dialogId to respond directly. Dialogs handled outside OmeniaClaw appear under browserState.dialogs.recent.
  • click/type/etc require a ref from snapshot (numeric 12, role ref e12, or actionable ARIA ref ax12). CSS selectors are intentionally not supported for actions. Use click-coords when the visible viewport position is the only reliable target.
  • Download and trace paths are constrained to OmeniaClaw temp roots: /tmp/OmeniaClaw{,/downloads} (fallback: ${os.tmpdir()}/OmeniaClaw/...).
  • upload accepts files from the OmeniaClaw temp uploads root and OmeniaClaw-managed inbound media. Managed inbound media can be referenced as media://inbound/<id>, sandbox-relative media/inbound/<id>, or a resolved path inside the managed inbound media directory. Nested media refs, traversal, symlinks, hardlinks, and arbitrary local paths are still rejected.
  • upload can also set file inputs directly via --input-ref or --element.

Stable tab ids and labels survive Chromium raw-target replacement when OmeniaClaw can prove the replacement tab, such as same URL or a single old tab becoming a single new tab after form submission. Raw target ids are still volatile; prefer suggestedTargetId from tabs in scripts.

Snapshot flags at a glance:

  • --format ai (default with Playwright): AI snapshot with numeric refs (aria-ref="<n>").
  • --format aria: accessibility tree with axN refs. When Playwright is available, OmeniaClaw binds refs with backend DOM ids to the live page so follow-up actions can use them; otherwise treat the output as inspection-only.
  • --efficient (or --mode efficient): compact role snapshot preset. Set browser.snapshotDefaults.mode: "efficient" to make this the default (see Gateway configuration).
  • --interactive, --compact, --depth, --selector force a role snapshot with ref=e12 refs. --frame "<iframe>" scopes role snapshots to an iframe.
  • With Playwright, --labels adds a screenshot with overlayed ref labels (prints MEDIA:<path>) plus an annotations array with each ref's bounding box. On screenshot, Playwright-backed labels work with --full-page, --ref, and --element; on snapshot, the accompanying screenshot remains viewport-only. Existing-session/chrome-mcp profiles render overlay labels on page screenshots but do not return annotations or use the Playwright full-page/ref/element projection helper. Without Playwright or chrome-mcp, labeled screenshots are not available.
  • --urls appends discovered link destinations to AI snapshots.

Snapshots and refs

OmeniaClaw supports two "snapshot" styles:

  • AI snapshot (numeric refs): OmeniaClaw browser snapshot (default; --format ai)

    • Output: a text snapshot that includes numeric refs.
    • Actions: OmeniaClaw browser click 12, OmeniaClaw browser type 23 "hello".
    • Internally, the ref is resolved via Playwright's aria-ref.
  • Role snapshot (role refs like e12): OmeniaClaw browser snapshot --interactive (or --compact, --depth, --selector, --frame)

    • Output: a role-based list/tree with [ref=e12] (and optional [nth=1]).
    • Actions: OmeniaClaw browser click e12, OmeniaClaw browser highlight e12.
    • Internally, the ref is resolved via getByRole(...) (plus nth() for duplicates).
    • Add --labels to include a screenshot with overlayed e12 labels. On Playwright-backed profiles this also returns per-ref bounding-box metadata (annotations[]).
    • Add --urls when link text is ambiguous and the agent needs concrete navigation targets.
  • ARIA snapshot (ARIA refs like ax12): OmeniaClaw browser snapshot --format aria

    • Output: the accessibility tree as structured nodes.
    • Actions: OmeniaClaw browser click ax12 works when the snapshot path can bind the ref through Playwright and Chrome backend DOM ids.
  • If Playwright is unavailable, ARIA snapshots can still be useful for inspection, but refs may not be actionable. Re-snapshot with --format ai or --interactive when you need action refs.

  • Docker proof for the raw-CDP fallback path: pnpm test:docker:browser-cdp-snapshot starts Chromium with CDP, runs browser doctor --deep, and verifies role snapshots include link URLs, cursor-promoted clickables, and iframe metadata.

Ref behavior:

  • Refs are not stable across navigations; if something fails, re-run snapshot and use a fresh ref.
  • /act returns the current raw targetId after action-triggered replacement when it can prove the replacement tab. Keep using stable tab ids/labels for follow-up commands.
  • If the role snapshot was taken with --frame, role refs are scoped to that iframe until the next role snapshot.
  • Unknown or stale axN refs fail fast instead of falling through to Playwright's aria-ref selector. Run a fresh snapshot on the same tab when that happens.

Wait power-ups

You can wait on more than just time/text:

  • Wait for URL (globs supported by Playwright):
    • OmeniaClaw browser wait --url "**/dash"
  • Wait for load state:
    • OmeniaClaw browser wait --load networkidle
    • Supported on managed OmeniaClaw and raw/remote CDP profiles. The user and existing-session profiles reject networkidle; use --url, --text, a selector, or --fn waits there.
  • Wait for a JS predicate:
    • OmeniaClaw browser wait --fn "window.ready===true"
  • Wait for a selector to become visible:
    • OmeniaClaw browser wait "#main"

These can be combined:

bash
OmeniaClaw browser wait "#main" \  --url "**/dash" \  --load networkidle \  --fn "window.ready===true" \  --timeout-ms 15000

Debug workflows

When an action fails (e.g. "not visible", "strict mode violation", "covered"):

  1. OmeniaClaw browser snapshot --interactive
  2. Use click <ref> / type <ref> (prefer role refs in interactive mode)
  3. If it still fails: OmeniaClaw browser highlight <ref> to see what Playwright is targeting
  4. If the page behaves oddly:
    • OmeniaClaw browser errors --clear
    • OmeniaClaw browser requests --filter api --clear
  5. For deep debugging: record a trace:
    • OmeniaClaw browser trace start
    • reproduce the issue
    • OmeniaClaw browser trace stop (prints TRACE:<path>)

JSON output

--json is for scripting and structured tooling.

Examples:

bash
OmeniaClaw browser status --jsonOmeniaClaw browser snapshot --interactive --jsonOmeniaClaw browser requests --filter api --jsonOmeniaClaw browser cookies --json

Role snapshots in JSON include refs plus a small stats block (lines/chars/refs/interactive) so tools can reason about payload size and density.

State and environment knobs

These are useful for "make the site behave like X" workflows:

  • Cookies: cookies, cookies set, cookies clear
  • Storage: storage local|session get|set|clear
  • Offline: set offline on|off
  • Headers: set headers --headers-json '{"X-Debug":"1"}' (legacy set headers --json '{"X-Debug":"1"}' remains supported)
  • HTTP basic auth: set credentials user pass (or --clear)
  • Geolocation: set geo <lat> <lon> --origin "https://example.com" (or --clear)
  • Media: set media dark|light|no-preference|none
  • Timezone / locale: set timezone ..., set locale ...
  • Device / viewport:
    • set device "iPhone 14" (Playwright device presets)
    • set viewport 1280 720

Security and privacy

  • The OmeniaClaw browser profile may contain logged-in sessions; treat it as sensitive.
  • browser act kind=evaluate / OmeniaClaw browser evaluate and wait --fn execute arbitrary JavaScript in the page context. Prompt injection can steer this. Disable it with browser.evaluateEnabled=false if you do not need it.
  • OmeniaClaw browser evaluate --fn accepts a function source, an expression, or a statement body. Statement bodies are wrapped as async functions, so use return for the value you want back. Use --timeout-ms <ms> when the page-side function may need longer than the default evaluate timeout.
  • For logins and anti-bot notes (X/Twitter, etc.), see Browser login + X/Twitter posting.
  • Keep the Gateway/node host private (loopback or tailnet-only).
  • Remote CDP endpoints are powerful; tunnel and protect them.

Strict-mode example (block private/internal destinations by default):

json5
{  browser: {    ssrfPolicy: {      dangerouslyAllowPrivateNetwork: false,      hostnameAllowlist: ["*.example.com", "example.com"],      allowedHostnames: ["localhost"], // optional exact allow    },  },}
Was this useful?
On this page

On this page