Browser Automation
PocketPaw includes a browser automation tool powered by Playwright. Unlike screenshot-based approaches, it uses accessibility tree snapshots for reliable, fast interaction.
How It Works
The BrowserDriver provides:
- Navigation — Open URLs and navigate between pages
- Accessibility tree — Get a structured representation of the page
- Ref map — Maps reference numbers to CSS selectors for clicking/typing
- Interaction — Click elements, type text, scroll, using ref numbers
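The relationship between the accessibility tree and the ref map can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PocketPaw's actual implementation: `Element`, `build_snapshot`, and `selector_for` are hypothetical names, the CSS selectors are made up, and the line format follows the usage example on this page.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one interactive node on a page."""
    role: str      # e.g. "link", "button", "textbox"
    name: str      # accessible name of the node
    selector: str  # CSS selector that locates the node (made up here)


def build_snapshot(elements: list[Element]) -> tuple[str, dict[int, str]]:
    """Number each element and return (tree text, ref -> selector map)."""
    lines = []
    refmap: dict[int, str] = {}
    for ref, el in enumerate(elements, start=1):
        lines.append(f'[{ref}] "{el.name}" {el.role}')
        refmap[ref] = el.selector
    return "\n".join(lines), refmap


def selector_for(refmap: dict[int, str], ref: int) -> str:
    """Resolve a ref number to its selector, failing loudly on unknown refs."""
    try:
        return refmap[ref]
    except KeyError:
        raise ValueError(f"unknown ref {ref}; take a fresh snapshot") from None


elements = [
    Element("link", "Story Title", "tr.athing a.titlelink"),
    Element("link", "comments", "td.subtext a:last-child"),
]
tree, refmap = build_snapshot(elements)
print(tree)                     # numbered lines the agent reads
print(selector_for(refmap, 1))  # selector a click on ref 1 would target
```

The point of the indirection is that the agent only ever sees small integers, while the driver keeps the selectors needed to act on them.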
Why Accessibility Tree?
Instead of taking screenshots and using vision models, PocketPaw reads the page’s accessibility tree. This is:
- Faster — No image processing or vision API calls
- More reliable — Structured data instead of visual interpretation
- Cheaper — No vision model tokens
- Accessible — Works with any page, including dynamic SPAs
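Because the tree is plain text, selecting an element becomes ordinary string processing rather than image analysis. The sketch below assumes each line has the shape `[ref] "name" role`, as in the usage example on this page; the real tree format may differ, and `parse_tree` and `find_refs` are hypothetical helpers, as is the third sample line.

```python
import re

# Assumed line shape: [ref] "accessible name" role
LINE_RE = re.compile(r'^\s*\[(\d+)\]\s+"(.*)"\s+(\w+)\s*$')


def parse_tree(tree: str) -> list[tuple[int, str, str]]:
    """Return (ref, name, role) for each line that matches the assumed shape."""
    entries = []
    for line in tree.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries


def find_refs(tree: str, role: str) -> list[int]:
    """Ref numbers of every element with the given role."""
    return [ref for ref, _name, r in parse_tree(tree) if r == role]


sample = '[1] "Story Title" link\n[2] "comments" link\n[3] "Search" textbox'
print(find_refs(sample, "link"))  # prints [1, 2]
```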
Usage
User: Go to news.ycombinator.com and find the top story
Agent: [uses browser tool]
  → navigate to "https://news.ycombinator.com"
  → accessibility tree shows:
      [1] "Story Title" link
      [2] "comments" link
      ...
  → click ref 1
  → read page content
Configuration
The browser tool uses Playwright. Install it with:
curl -fsSL https://pocketpaw.xyz/install.sh | sh
# Or add the browser extra manually
pip install pocketpaw[browser]
playwright install chromium
NavigationResult
Each browser action returns a NavigationResult:
from dataclasses import dataclass

@dataclass
class NavigationResult:
    url: str                 # Current URL
    title: str               # Page title
    accessibility_tree: str  # Structured page content
    refmap: dict[int, str]   # Ref number → CSS selector mapping
Policy Group
The browser tool belongs to group:browser. It’s included in the coding profile.
Installation
curl -fsSL https://pocketpaw.xyz/install.sh | sh
# Or add the browser extra manually
pip install pocketpaw[browser]
This installs playwright as an optional dependency. You also need to install browser binaries:
playwright install chromium
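After installing, a quick check like the following can confirm that both the Python package and a Chromium binary are usable. This is a sketch and not part of PocketPaw: `check_browser_setup` is a hypothetical helper, and it relies only on Playwright's public `sync_playwright` entry point and `chromium.launch()`.

```python
import importlib.util


def check_browser_setup() -> str:
    """Return "missing-package", "launch-failed", or "ok"."""
    # Step 1: is the optional playwright dependency importable at all?
    if importlib.util.find_spec("playwright") is None:
        return "missing-package"
    # Step 2: can Chromium actually start? This typically fails when
    # `playwright install chromium` has not been run yet.
    try:
        from playwright.sync_api import sync_playwright

        with sync_playwright() as p:
            browser = p.chromium.launch()  # headless by default
            browser.close()
    except Exception:
        return "launch-failed"
    return "ok"


print(check_browser_setup())
```

A result of "missing-package" points at the pip step, while "launch-failed" usually points at the browser binaries.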