Pinchtab: Browser Control via HTTP for AI Agents

Pinchtab is a 12MB Go binary that gives any AI agent browser control over a plain HTTP API using accessibility trees. Zero config, framework-agnostic, and far cheaper than screenshots.

If you’ve tried giving an AI agent browser access, you already know the problem. Playwright MCP ties you to Node. Browser Use needs Python. OpenClaw’s browser backend only works inside its own ecosystem. Switch agents, or try to fire off a quick curl request to inspect a page, and you’re rewriting the integration from scratch.

Pinchtab takes a different approach: it’s just an HTTP server. A 12MB Go binary with no Node, no Python, no dependencies. It launches its own Chrome and exposes everything—navigation, clicks, form fills, screenshots, accessibility snapshots—through a plain REST API. Whatever agent you’re using speaks HTTP, and that’s the whole integration story.

Why accessibility trees over screenshots

Most people reach for screenshots first because it’s obvious: take a picture, send it to a vision model, done. The problem is cost. A 10-step task using screenshots runs about $0.06. The same task with accessibility trees costs around $0.015.

The real difference shows up at scale. Run that same task 1,000 times and screenshots cost $60; accessibility trees cost $15. Run a 50-page monitoring job:

| Method | ~Tokens | Est. cost |
| --- | --- | --- |
| Screenshots (vision) | ~100,000 | $0.30 |
| Full a11y snapshot | ~525,000 | $0.16 |
| Pinchtab ?filter=interactive | ~180,000 | $0.05 |
| Pinchtab /text | ~40,000 | $0.01 |

Pinchtab’s /text endpoint pulls readable content at around 800 tokens per page using Mozilla’s Readability library (the same thing behind Firefox Reader View). That’s 5x cheaper than a full accessibility snapshot and 13x cheaper than screenshots. For read-heavy work, the difference compounds quickly.
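
To make the scaling arithmetic concrete, here is a small sketch that reproduces the per-1,000-run totals and the per-page token averages from the figures quoted above. The dictionary keys and function names are illustrative only; the numbers are the article's, not fresh measurements.

```python
from decimal import Decimal

# Per-task costs quoted above for a 10-step task (USD).
COST_PER_TASK = {
    "screenshots": Decimal("0.06"),
    "accessibility_tree": Decimal("0.015"),
}

# Token totals from the 50-page monitoring table.
TOKENS_50_PAGES = {
    "screenshots": 100_000,
    "full_a11y_snapshot": 525_000,
    "filter_interactive": 180_000,
    "text": 40_000,
}


def cost_at_scale(cost_per_task: Decimal, runs: int) -> Decimal:
    """Total cost of repeating one task `runs` times."""
    return cost_per_task * runs


def tokens_per_page(total_tokens: int, pages: int = 50) -> int:
    """Average tokens per page for a multi-page job."""
    return total_tokens // pages


for method, cost in COST_PER_TASK.items():
    print(f"{method}: ${cost_at_scale(cost, 1000):.2f} per 1,000 runs")

for method, total in TOKENS_50_PAGES.items():
    print(f"{method}: ~{tokens_per_page(total)} tokens per page")
```

Decimal is used instead of float so that small per-task costs multiply out without rounding noise.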

There’s also a reliability argument. Vision models guess coordinates from pixels. Accessibility trees give you stable node refs (e0, e1, e2…) tied to the actual DOM elements. Click e5 and you hit the right button, regardless of how the page renders.

The full API

Pinchtab covers more than basic navigation and clicking:

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Check if the server and Chrome are responsive |
| GET | /tabs | List all open tabs with their IDs |
| GET | /snapshot | Accessibility tree as JSON or text |
| GET | /screenshot | JPEG screenshot with quality control |
| GET | /text | Readable page text (Readability or raw innerText) |
| POST | /navigate | Go to a URL in a tab |
| POST | /action | Click, type, fill, press, focus, hover, select, scroll |
| POST | /evaluate | Run arbitrary JavaScript |
| POST | /tab | Open or close tabs |
| POST | /tab/lock | Lock a tab for exclusive agent access |
| POST | /tab/unlock | Release a tab lock |
| POST | /cookies | Inject session cookies programmatically |

The /tab/lock endpoint is worth noting if you run multiple agents at once. One agent locks a tab, does its work, unlocks it. No competing writes to the same browser context.

Snapshot query parameters

The snapshot endpoint has several options that cut token usage significantly:

# Only interactive elements — ~75% fewer nodes
curl "localhost:9867/snapshot?filter=interactive"

# Compact one-line-per-node format — 56-64% fewer tokens than JSON
curl "localhost:9867/snapshot?format=compact"

# Only changes since the last snapshot
curl "localhost:9867/snapshot?diff=true"

# Limit to a specific section of the page
curl "localhost:9867/snapshot?selector=main"

# Cap output at roughly N tokens
curl "localhost:9867/snapshot?maxTokens=2000"

Combining filter=interactive with format=compact gives you the smallest possible payload for action-oriented tasks. Use diff=true for pages that update incrementally—polling a live dashboard, for example—so you’re only sending what actually changed.
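
If your agent builds these URLs in code, a small helper keeps the options in one place. The sketch below uses only the parameter names from the curl examples above; the function name and keyword arguments are made up for illustration, and it assumes the parameters can be combined freely, which the text only confirms for filter plus format.

```python
from typing import Optional
from urllib.parse import urlencode


def snapshot_url(
    base: str,
    *,
    interactive_only: bool = False,
    compact: bool = False,
    diff: bool = False,
    selector: Optional[str] = None,
    max_tokens: Optional[int] = None,
) -> str:
    """Build a /snapshot URL from the query parameters described above."""
    params = {}
    if interactive_only:
        params["filter"] = "interactive"
    if compact:
        params["format"] = "compact"
    if diff:
        params["diff"] = "true"
    if selector is not None:
        params["selector"] = selector  # urlencode escapes characters like '#'
    if max_tokens is not None:
        params["maxTokens"] = str(max_tokens)
    query = urlencode(params)
    return f"{base}/snapshot?{query}" if query else f"{base}/snapshot"


# Smallest payload for action-oriented tasks
print(snapshot_url("http://localhost:9867", interactive_only=True, compact=True))
```

Because the selector goes through urlencode, a CSS selector such as #content is escaped rather than being cut off as a URL fragment.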

Human-like actions

Beyond standard click and type, Pinchtab has humanClick and humanType actions that add realistic delays and movement patterns. Useful when you’re hitting sites with behavioral bot detection that watches interaction timing.

curl -X POST localhost:9867/action \
  -d '{"kind":"humanType","ref":"e12","text":"hello world"}'
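
One way to use this from an agent is to build ordinary click and type actions and swap in the human-like variants only for sites that need them. This is a rough sketch: the helper name is hypothetical, and it assumes humanClick accepts the same fields as click, which the example above only shows for humanType.

```python
import json

# Standard action kinds that have a human-like counterpart, per the text above.
HUMAN_VARIANTS = {"click": "humanClick", "type": "humanType"}


def humanize(action: dict) -> dict:
    """Return a copy of the action using the human-like kind when one exists."""
    kind = action["kind"]
    return {**action, "kind": HUMAN_VARIANTS.get(kind, kind)}


action = {"kind": "type", "ref": "e12", "text": "hello world"}
# JSON body to POST to /action
print(json.dumps(humanize(action)))
```

Kinds without a human-like variant, such as scroll, pass through unchanged.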

Configuration

All configuration comes through environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| BRIDGE_PORT | 9867 | HTTP port |
| BRIDGE_TOKEN | (none) | Bearer token for auth |
| BRIDGE_HEADLESS | false | Run Chrome without a window |
| BRIDGE_STEALTH | light | light (webdriver patch) or full (canvas/WebGL/font spoofing) |
| BRIDGE_PROFILE | ~/.pinchtab/chrome-profile | Chrome profile directory |
| BRIDGE_STATE_DIR | ~/.pinchtab | State and session storage |
| BRIDGE_BLOCK_IMAGES | false | Skip image downloads |
| BRIDGE_BLOCK_MEDIA | false | Block images, fonts, CSS, video |
| BRIDGE_NO_ANIMATIONS | false | Freeze CSS animations globally |
| BRIDGE_TIMEOUT | 15 | Action timeout in seconds |
| BRIDGE_NAV_TIMEOUT | 30 | Navigation timeout in seconds |
| BRIDGE_TIMEZONE | (system) | Chrome timezone (e.g. America/New_York) |
| CDP_URL | (none) | Connect to an existing Chrome instead of launching one |
| CHROME_BINARY | (auto) | Path to Chrome or Chromium |
| CHROME_FLAGS | (none) | Extra Chrome launch flags |

BRIDGE_BLOCK_MEDIA is the aggressive version—it skips everything except HTML and JavaScript. Useful for bulk scraping where page fidelity doesn’t matter. BRIDGE_NO_ANIMATIONS helps if you’re snapshotting pages mid-animation and getting inconsistent results.

You can also generate a config file if you prefer JSON over environment variables:

pinchtab config init   # creates ~/.pinchtab/config.json
pinchtab config show   # shows current effective config

Environment variables override the config file, so it works fine alongside Docker secrets or .env files.
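
The precedence can be pictured as three layers merged in order: defaults, then the config file, then the environment. The sketch below only illustrates that ordering and is not Pinchtab's actual loader. For simplicity it assumes the file uses the same names as the environment variables, which the real config.json may not.

```python
# A few defaults from the table above, kept as strings like environment values.
DEFAULTS = {
    "BRIDGE_PORT": "9867",
    "BRIDGE_HEADLESS": "false",
    "BRIDGE_STEALTH": "light",
    "BRIDGE_TIMEOUT": "15",
    "BRIDGE_NAV_TIMEOUT": "30",
}


def effective_config(file_config: dict, env: dict) -> dict:
    """Merge defaults < config file < environment, later layers winning."""
    merged = dict(DEFAULTS)
    # Assumption: file keys match the variable names; unknown keys are ignored.
    merged.update({k: str(v) for k, v in file_config.items() if k in DEFAULTS})
    merged.update({k: v for k, v in env.items() if k in DEFAULTS})
    return merged


file_config = {"BRIDGE_PORT": 9000, "BRIDGE_HEADLESS": "true"}
env = {"BRIDGE_PORT": "9999"}
print(effective_config(file_config, env))
```

Here the port comes from the environment, headless mode from the file, and everything else from the defaults.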

Docker deployment

Docker is the simplest way to run Pinchtab on a server. Chrome needs seccomp=unconfined in a container, which is the main reason you’d want to isolate it from the rest of your stack.

Basic docker-compose setup

# docker-compose.yml
services:
  pinchtab:
    image: pinchtab/pinchtab:latest
    container_name: pinchtab
    restart: unless-stopped
    security_opt:
      - seccomp:unconfined
    mem_limit: 2g
    cpus: "2.0"
    environment:
      - BRIDGE_PORT=9867
      - BRIDGE_HEADLESS=true
      - BRIDGE_STEALTH=full
      - BRIDGE_TOKEN=${PINCHTAB_TOKEN}
      - BRIDGE_BLOCK_IMAGES=true
      - BRIDGE_NO_ANIMATIONS=true
    volumes:
      - pinchtab-data:/data
    ports:
      - "127.0.0.1:9867:9867"

volumes:
  pinchtab-data:

A few things to note here. The port binding 127.0.0.1:9867:9867 only exposes the service on localhost, not to the network. The memory limit matters: Chrome with a few tabs open will easily use 1-1.5GB. Setting BRIDGE_BLOCK_IMAGES=true helps keep memory usage lower if you’re doing content-only tasks.

Behind a Caddy reverse proxy

If you want to expose Pinchtab over HTTPS with a domain:

# docker-compose.yml
services:
  pinchtab:
    image: pinchtab/pinchtab:latest
    container_name: pinchtab
    restart: unless-stopped
    security_opt:
      - seccomp:unconfined
    mem_limit: 2g
    cpus: "2.0"
    environment:
      - BRIDGE_PORT=9867
      - BRIDGE_HEADLESS=true
      - BRIDGE_STEALTH=full
      - BRIDGE_TOKEN=${PINCHTAB_TOKEN}
      - BRIDGE_BLOCK_IMAGES=true
    volumes:
      - pinchtab-data:/data
    networks:
      - proxy

  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy-data:/data
      - caddy-config:/config
    networks:
      - proxy

networks:
  proxy:
    driver: bridge

volumes:
  pinchtab-data:
  caddy-data:
  caddy-config:

# Caddyfile
pinchtab.yourdomain.com {
    reverse_proxy pinchtab:9867
}

Caddy handles TLS automatically. With BRIDGE_TOKEN set, requests need an Authorization: Bearer <token> header, so the API isn’t open to the public even with HTTPS.

curl -H "Authorization: Bearer $PINCHTAB_TOKEN" \
  https://pinchtab.yourdomain.com/health

Build from source vs prebuilt image

The official docker-compose.yml in the repo uses build: . and compiles from source. To use the prebuilt image instead, replace that line with image: pinchtab/pinchtab:latest. Check the releases page for available tags.

Security concerns and mitigations

Pinchtab gives an AI agent full control of a real Chrome browser, including any accounts you’ve logged into through that browser. The README is direct about this: “Think of Pinchtab like giving someone your unlocked laptop.” Here’s what that means in practice and how to handle it.

No auth by default

Out of the box, Pinchtab accepts requests from anyone who can reach port 9867. On a shared network or a server with a public IP, that means anyone.

Mitigation: Always set BRIDGE_TOKEN. Once set, every request needs Authorization: Bearer <token> or it gets a 401. Treat this token like a password—generate something long and random, store it in an environment variable or secret manager, never hardcode it.

# Generate a token
openssl rand -hex 32
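
If you provision servers from a script rather than a shell, Python's standard secrets module produces the same kind of value: 32 random bytes rendered as 64 hex characters. The function name here is just for illustration.

```python
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a random hex token from the OS's cryptographic random source."""
    return secrets.token_hex(num_bytes)


# Write the result to your secret manager or .env file as BRIDGE_TOKEN
token = generate_token()
print(len(token))
```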

Pinchtab binds to all interfaces

By default, the server listens on 0.0.0.0, not just localhost. On a cloud server this means port 9867 is reachable from anywhere if your firewall allows it.

Mitigation: Either bind the port to localhost only (as shown in the docker-compose above with 127.0.0.1:9867:9867), or set a firewall rule that blocks external access to port 9867. Using a reverse proxy like Caddy or Nginx adds another layer and lets you handle TLS properly.

The Chrome profile holds live sessions

When you log into a site through Pinchtab’s Chrome window, that session persists in ~/.pinchtab/chrome-profile/. Cookies, saved passwords, auth tokens. An agent with API access can use those sessions to act as you on any site you’re logged into.

Mitigation:

  • Use a dedicated Chrome profile with only the accounts your agents actually need
  • Don’t log personal accounts (email, banking, social) into the Pinchtab profile unless you specifically need them automated
  • Treat ~/.pinchtab/ as sensitive and restrict file permissions: chmod 700 ~/.pinchtab
  • In Docker, keep the profile in a named volume that no other container mounts. The volume has to stay writable because Pinchtab writes state to it
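
For setup scripts, the chmod step above can be done and verified from Python. This is a minimal sketch for Linux or macOS; the helper name is made up, and you would pass it your own state directory.

```python
import os
import stat
from pathlib import Path


def lock_down(path: Path) -> int:
    """Create the directory if needed, restrict it to the owner, return its mode bits."""
    path.mkdir(parents=True, exist_ok=True)
    os.chmod(path, 0o700)  # owner: read/write/execute; group and others: nothing
    return stat.S_IMODE(path.stat().st_mode)


# Example call (not run here): lock_down(Path.home() / ".pinchtab")
```

Returning the resulting mode lets a provisioning script confirm the permissions actually took effect.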

The seccomp=unconfined requirement

Chrome needs a relaxed seccomp profile to run in a container. This is a real privilege: it removes a layer of kernel syscall filtering.

Mitigation: This is harder to fully mitigate without building a custom seccomp profile for Chromium. The practical approach is to isolate the Pinchtab container—don’t run it on the same network as containers with database access or other sensitive services. Keep it in its own network segment that only your agent service can reach.

Agent trust model

An agent with Pinchtab access can do anything a human with that browser can do. If your agent takes arbitrary instructions from users or external inputs, prompt injection is a real concern: a malicious website could include instructions in its content that trick the agent into taking unintended actions.

Mitigation:

  • Scope what your agent is allowed to do. If it only needs to read pages, don’t give it action capabilities
  • Log all /action and /navigate calls so you can audit what happened
  • Consider running agents against a sandboxed profile with no real accounts for untrusted inputs
  • Rate-limit your Pinchtab endpoint—add rate limiting in Caddy or Nginx if you’re exposing it to external agents
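
The first two points can be combined in a thin gate that sits between your agent and Pinchtab. The sketch below is one possible shape, not a Pinchtab feature. The endpoint allowlist comes from the API table above, while the logger name, the function name, and the decision to let read-only agents navigate are all assumptions.

```python
import logging

audit_log = logging.getLogger("pinchtab.audit")  # hypothetical logger name

# Endpoints a read-only agent may call. /navigate is included so it can open pages.
READ_ONLY_ENDPOINTS = {
    ("GET", "/health"),
    ("GET", "/tabs"),
    ("GET", "/snapshot"),
    ("GET", "/screenshot"),
    ("GET", "/text"),
    ("POST", "/navigate"),
}

# Calls worth recording for later audits.
AUDITED_PATHS = {"/action", "/navigate"}


def authorize(method: str, path: str, read_only: bool) -> bool:
    """Decide whether to forward a request; `path` must exclude the query string."""
    allowed = (not read_only) or (method.upper(), path) in READ_ONLY_ENDPOINTS
    if path in AUDITED_PATHS:
        audit_log.info("%s %s allowed=%s", method.upper(), path, allowed)
    return allowed
```

A read-only agent can still call /snapshot and /text but is refused on /action, /evaluate, and /cookies, and every /action or /navigate attempt leaves a log line either way.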

Don't skip the token

Running Pinchtab without BRIDGE_TOKEN on any internet-connected server is a serious risk. Anyone who finds the port has full browser control. Set the token before anything else.

A typical agent workflow

import httpx

BASE = "http://localhost:9867"
HEADERS = {"Authorization": "Bearer your-token"}

# Navigate to a page (allow up to 30s, in line with BRIDGE_NAV_TIMEOUT)
httpx.post(f"{BASE}/navigate", json={"url": "https://example.com"}, headers=HEADERS, timeout=30.0)

# Get only interactive elements to keep tokens low
snapshot = httpx.get(f"{BASE}/snapshot?filter=interactive&format=compact", headers=HEADERS)
# format=compact is one line per node, not JSON, so read it as text
tree = snapshot.text

# Click a button by its ref
httpx.post(f"{BASE}/action", json={"kind": "click", "ref": "e5"}, headers=HEADERS)

# Read the result
text = httpx.get(f"{BASE}/text", headers=HEADERS)
print(text.text)

Node refs are stable within a snapshot. After a click that loads a new page, take a fresh snapshot—refs on the new page are independent. For pages that update without a full load (React, Vue SPAs), use ?diff=true to see what changed rather than re-fetching the whole tree.

For multi-agent setups where several agents share a browser instance, use tab locking:

# Lock tab 1 for exclusive use
curl -X POST localhost:9867/tab/lock -d '{"tabId": 1}'

# ... do your work ...

# Release it
curl -X POST localhost:9867/tab/unlock -d '{"tabId": 1}'
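
In agent code, the risk with lock and unlock as separate calls is that an exception in between leaves the tab locked. A context manager avoids that. This sketch assumes you supply a post function that sends a JSON POST to Pinchtab (for example a thin wrapper around httpx); that callable and the helper name are hypothetical, and the tabId payload simply mirrors the curl example above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical transport: sends a JSON POST to the given Pinchtab path.
Post = Callable[[str, dict], object]


@contextmanager
def locked_tab(post: Post, tab_id: int) -> Iterator[int]:
    """Lock a tab for the duration of the with-block and always release it."""
    post("/tab/lock", {"tabId": tab_id})
    try:
        yield tab_id
    finally:
        # Runs even if the work inside the block raises.
        post("/tab/unlock", {"tabId": tab_id})
```

Usage looks like with locked_tab(post, 1): followed by the agent's navigate and action calls; the unlock request is sent whether or not those calls succeed.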

When it fits well

Pinchtab is a good fit for agents that need to browse the web but don’t need to be tightly coupled to a browser testing framework. Specifically:

  • Web scraping and content monitoring at any scale
  • Form automation (sign-ups, data entry, multi-step workflows)
  • Authenticated scraping after a one-time login setup
  • Agents running in bash scripts, Go programs, or any language with HTTP support
  • Setups where you want to switch between different agent frameworks without changing browser integration

It’s not the right tool for pixel-accurate visual testing (Playwright is better there), or for sites where the accessibility tree is sparse or unreliable. Some single-page apps built with custom components don’t expose much useful ARIA data, and a full screenshot becomes the more practical option.

Getting started

# Docker (easiest, no Chrome install needed)
docker run -d \
  -p 127.0.0.1:9867:9867 \
  --security-opt seccomp=unconfined \
  -e BRIDGE_TOKEN=your-secret-token \
  -e BRIDGE_HEADLESS=true \
  pinchtab/pinchtab:latest

curl -H "Authorization: Bearer your-secret-token" http://localhost:9867/health

# Build from source (requires Go 1.25+ and Chrome installed)
git clone https://github.com/pinchtab/pinchtab.git
cd pinchtab
go build -o pinchtab .
BRIDGE_HEADLESS=true BRIDGE_TOKEN=your-secret-token ./pinchtab
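
Chrome can take a moment to come up after either start method, so scripts that launch Pinchtab and immediately drive it may want to poll /health first. Here is a minimal standard-library sketch. It assumes /health answers with HTTP 200 once the server is ready, which the article does not spell out, and the function names are illustrative.

```python
import time
import urllib.request
from typing import Callable


def probe_health(base_url: str, token: str, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200."""
    request = urllib.request.Request(
        f"{base_url}/health",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A caller would pass something like lambda: probe_health("http://localhost:9867", "your-secret-token") as the probe. Keeping the probe separate from the retry loop also makes the loop easy to test without a running server.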

The GitHub repo has an OpenClaw skill that can install and configure Pinchtab automatically if you’re using an agent that supports skills.


Pinchtab is MIT licensed. Source on GitHub.