How I gave my OpenClaw agent a browser on a headless VPS with Browserless and CapRover

I wanted one of my OpenClaw agents to have a real browser.

Not web_fetch. Not a fake browser abstraction. A real browser that could open pages, click things, fill forms, inspect JS-heavy sites, and work like a proper web operator.

On paper, this sounds simple.

In practice, the agent was running inside a VPS-hosted container. That changes the problem.

A lot of browser automation advice quietly assumes a desktop machine with a local Chrome install and a usable GUI environment. That was not the setup here.

The working answer ended up being:

stop trying to make the agent launch a local desktop browser,
run a dedicated Browserless service on CapRover,
attach OpenClaw to it over remote CDP,
and treat the browser as infrastructure instead of a local app.

That approach worked.

The original mistake: thinking like a laptop

The first instinct was the obvious one:

enable OpenClaw’s built-in browser support,
install Chromium,
run it headless,
point OpenClaw at the binary,
and expect everything to come up.

That got surprisingly far.

We fixed the first layer of issues:

browser plugin enabled,
Chromium installed,
headless mode configured,
browser endpoint reachable.

But the important part still did not work reliably.

The gateway moved from one failure mode to another:

first, the browser method was not loaded,
later, the browser runtime was loaded but still could not cleanly start a supported local browser.

That was the clue.

The problem was not “one more launch flag.” The problem was the mental model.

A VPS-hosted agent does not need a pretend desktop. It needs a browser service.

Why local browser launch was the wrong abstraction

There are cases where local browser launch is perfect:

developer workstation,
logged-in personal machine,
one-off manual debugging,
existing browser session reuse.

A headless agent container on a server is not one of those cases.

In that environment, local launch creates the wrong kind of complexity:

browser binary detection problems,
sandbox/container quirks,
display assumptions,
lifecycle mismatch between the agent and the browser process,
harder debugging when the browser is treated as an implementation detail inside the same runtime.

None of those are impossible.

They are just the wrong place to spend operational energy if what you actually want is reliable browser capability for an agent.

The better model: browser as infrastructure

Once I reframed the problem, the right shape became obvious.

Instead of asking OpenClaw to spawn and own a local browser, I gave it a remote browser endpoint to attach to.

That means three layers:

1) Browserless runs the actual browser

Browserless becomes the browser runtime boundary.

It owns Chromium and exposes a CDP/WebSocket endpoint designed for automation.
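
To make "a CDP/WebSocket endpoint" concrete, here is a minimal sketch of the URL an attaching client would be handed. The hostname, port 3000, and the token query parameter are my assumptions for illustration (CapRover's internal srv-captain--<app> naming and common Browserless defaults), so check them against your own deployment.

```python
from typing import Optional
from urllib.parse import urlencode, urlunsplit


def cdp_ws_url(host: str, port: int = 3000, token: Optional[str] = None,
               secure: bool = False) -> str:
    """Build the WebSocket URL a CDP client would attach to.

    Assumptions (not from the article): port 3000 and a `token` query
    parameter, which match common Browserless defaults.
    """
    scheme = "wss" if secure else "ws"
    query = urlencode({"token": token}) if token else ""
    return urlunsplit((scheme, f"{host}:{port}", "/", query, ""))


# Hypothetical internal hostname for a CapRover app named "browserless".
print(cdp_ws_url("srv-captain--browserless", token="change-me"))
# prints: ws://srv-captain--browserless:3000/?token=change-me
```

The point is only that the agent needs an address, not a binary path.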

2) OpenClaw stays the control plane

OpenClaw still does the useful agent-facing work:

snapshots,
clicks,
form interactions,
navigation,
profile routing,
browser tool integration.

So I did not replace OpenClaw’s browser tool. I gave it a better backend.

3) CapRover manages Browserless like any other service

That matters because it keeps the deployment boring.

Browser infrastructure should be deployed, restarted, observed, and upgraded like a normal service. Not smuggled in as a side effect of one app container trying to pretend it is a workstation.

Why Browserless was the right fit here

I looked at a few options.

Raw Chrome with remote debugging

This can work.

But it leaves you doing more of the rough work yourself:

launch args,
process management,
lifecycle,
auth,
queueing,
and everything else that comes with sharing one browser between clients.

Good for experiments. Not my first choice for durable agent infrastructure.

Playwright server

Also workable, but it was not the best fit for this OpenClaw setup.

The main reason is protocol fit. OpenClaw’s browser integration is happiest attaching to a remote CDP-style browser target. Browserless lines up naturally with that.

Playwright server is more attractive when Playwright-native remote execution is the center of the system. That was not the goal here.

Browserless

Browserless was the cleanest fit because it is already shaped like a browser service.

That gave me:

a self-hosted Chromium service,
remote CDP attach,
clean separation from the agent runtime,
a container-first deployment model,
and a path OpenClaw already understands.

That is exactly what I wanted.

The deployment shape that worked

I deployed Browserless as its own CapRover app.

Not exposed to the public internet first. Not mixed into the OpenClaw container. Not as a sidecar hack.

A separate app.

That let me keep the rollout disciplined:

create the Browserless service,
keep it internal,
verify the endpoint works,
switch OpenClaw to use it as the default browser profile,
test real browser actions.

That order matters.

When you separate infrastructure bring-up from client migration, debugging gets much easier.

If Browserless fails, that is one problem. If OpenClaw fails to attach, that is a different problem. If page automation fails after attach, that is a third problem.

Those are better questions than “something about browser automation is broken.”
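
The three-way split above can be sketched as an ordered list of probes that stops at the first failure. This is a sketch, not how OpenClaw or Browserless report health: the stage names are mine, the /json/version probe assumes the service exposes that CDP-style route the way Chrome's remote debugging interface does, and the attach and page-action probes are placeholders because they depend on how you drive OpenClaw.

```python
import json
from typing import Callable, Iterable, Optional, Tuple
from urllib.request import urlopen

Stage = Tuple[str, Callable[[], bool]]


def first_failing_stage(stages: Iterable[Stage]) -> Optional[str]:
    """Run probes in order and return the name of the first one that fails.

    A probe that raises counts as a failure. Returns None if all pass.
    """
    for name, probe in stages:
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        if not ok:
            return name
    return None


def browser_service_answers(version_url: str, timeout: float = 5.0) -> bool:
    """Stage 1 probe (assumed route): fetch <base>/json/version and look
    for the WebSocket debugger URL in the JSON reply."""
    with urlopen(version_url, timeout=timeout) as resp:
        return "webSocketDebuggerUrl" in json.load(resp)


# Placeholder probes; swap in real checks for your setup, for example
# lambda: browser_service_answers("http://<host>:3000/json/version").
stages = [
    ("browserless", lambda: True),
    ("openclaw-attach", lambda: False),
    ("page-automation", lambda: True),
]
print(first_failing_stage(stages))  # prints: openclaw-attach
```

Because the probes run in deployment order, the answer names the layer to debug rather than just reporting that something is broken.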

The OpenClaw config shape that mattered

The key shift in OpenClaw was to stop using the default local openclaw browser profile and instead point the browser stack at a remote Browserless profile.

The practical behavior I wanted was:

browser enabled,
default profile set to browserless,
remote CDP target configured,
attachOnly: true,
no dependency on local executable detection.

That attachOnly bit is important.

Without it, you are back in the ambiguous territory where the agent may still treat the endpoint like something it should launch or own locally.

With it, the contract becomes cleaner:

Browserless owns the browser,
OpenClaw attaches and controls it.

That is the right separation of responsibility.
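
To pin that contract down, here is a small sketch that checks a browser config for the five properties listed above. Only attachOnly and the profile name browserless come from my setup; every other key name (enabled, defaultProfile, profiles, cdpUrl, executablePath) and the nesting are placeholders I am assuming for illustration, so map them onto whatever schema your OpenClaw version actually uses.

```python
from typing import Any, Dict, List


def config_problems(browser: Dict[str, Any]) -> List[str]:
    """List the ways a browser config deviates from the attach-only shape.

    Key names other than `attachOnly` are hypothetical, not a documented schema.
    """
    problems = []
    profile = browser.get("profiles", {}).get("browserless", {})
    if browser.get("enabled") is not True:
        problems.append("browser is not enabled")
    if browser.get("defaultProfile") != "browserless":
        problems.append("default profile is not 'browserless'")
    cdp_url = str(profile.get("cdpUrl", ""))
    if not cdp_url.startswith(("ws://", "wss://", "http://", "https://")):
        problems.append("no remote CDP target configured")
    # Accept the flag at either level, since the exact placement is assumed.
    if browser.get("attachOnly") is not True and profile.get("attachOnly") is not True:
        problems.append("attachOnly is not true")
    if browser.get("executablePath"):
        problems.append("still depends on a local browser executable")
    return problems


example = {
    "enabled": True,
    "defaultProfile": "browserless",
    "attachOnly": True,
    "profiles": {"browserless": {"cdpUrl": "ws://srv-captain--browserless:3000/"}},
}
print(config_problems(example))  # prints: []
```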

The gotcha that wasted the most time

The biggest practical gotcha was not Browserless. It was the difference between config file state and live process state.

The OpenClaw config could be correct on disk while the running process still exposed the old browser profile.

That created misleading intermediate states where:

the file said browserless,
but the live browser status still showed the local openclaw profile.

If you hit this kind of problem, do not stop at “the config file looks right.”

Verify the live runtime.

Specifically:

check browser status,
check doctor output,
verify which profile is actually active,
then perform a real browser action.

Config that is only true on disk is not finished work.
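
A tiny sketch of that comparison, assuming you have already pulled the relevant settings out of the config file and out of the live status output into two plain dictionaries. How you obtain the live values is up to your tooling; the keys shown are illustrative.

```python
from typing import Any, Dict, Tuple


def config_drift(on_disk: Dict[str, Any],
                 live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (disk_value, live_value)} for every setting that differs."""
    keys = sorted(set(on_disk) | set(live))
    return {k: (on_disk.get(k), live.get(k)) for k in keys
            if on_disk.get(k) != live.get(k)}


# The misleading state described above: file updated, process not yet.
disk = {"defaultProfile": "browserless", "attachOnly": True}
live = {"defaultProfile": "openclaw", "attachOnly": True}
print(config_drift(disk, live))
# prints: {'defaultProfile': ('browserless', 'openclaw')}
```

An empty result is the only state in which "the config looks right" means anything.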

The validation that mattered

I did not want a fake green check.

So the final test standard was simple:

browser doctor passes,
live profile is browserless,
a page can actually open,
and a snapshot can be captured from that page.

That last part matters.

It is easy to declare success when a service responds to health checks. It is more useful to prove the exact capability you wanted in the first place.

In this case, the real test was not “is Browserless up?” It was “can the OpenClaw agent actually browse?”

Once the agent opened https://example.com and returned a valid snapshot through the Browserless-backed profile, the answer was yes.
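
If you want that standard written down rather than remembered, a sketch like this works. The field names are mine and simply mirror the four criteria above; you fill them in from whatever checks you actually ran.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class BrowserAcceptance:
    doctor_passes: bool
    live_profile_is_browserless: bool
    page_opened: bool
    snapshot_captured: bool

    def unmet(self) -> List[str]:
        """Names of the criteria that are still false."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    @property
    def done(self) -> bool:
        """True only when every criterion holds."""
        return not self.unmet()


# A "fake green check": the service side looks fine, but nothing was browsed.
health_only = BrowserAcceptance(True, True, False, False)
print(health_only.done, health_only.unmet())
# prints: False ['page_opened', 'snapshot_captured']
```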

If your agent runs on a remote Linux host or in a containerized platform like CapRover, I would default to the following rule of thumb.

Use a remote browser service when:

the agent lives on a VPS,
there is no real desktop session,
you care about reliability more than clever local hacks,
and browser automation is becoming part of the system, not just a one-off test.

Use local browser control when:

the agent runs on your laptop or workstation,
you want to reuse your logged-in browser,
or you explicitly need a human-driven session boundary.

The mistake is trying to make those two worlds share the same default setup.

They are different environments. They should have different browser strategies.
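
Written as code, one reading of that rule looks like this. It is a deliberate simplification with field names of my own choosing: any of the local-control conditions selects local, and everything else falls through to a remote browser service.

```python
from dataclasses import dataclass


@dataclass
class AgentEnvironment:
    runs_on_workstation: bool
    reuse_logged_in_browser: bool = False
    needs_human_driven_session: bool = False


def browser_strategy(env: AgentEnvironment) -> str:
    """Return "local" or "remote" following the two lists above."""
    if (env.runs_on_workstation
            or env.reuse_logged_in_browser
            or env.needs_human_driven_session):
        return "local"
    # Headless VPS or container with no desktop session: attach to a service.
    return "remote"


print(browser_strategy(AgentEnvironment(runs_on_workstation=False)))  # prints: remote
```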

The operator takeaway

The most useful lesson here was not a Browserless trick. It was this:

if your agent is running on a headless VPS, stop treating browser automation like a desktop app problem. Treat it like infrastructure.

Once I did that, the architecture got simpler:

Browserless handled the browser runtime,
CapRover handled the service lifecycle,
OpenClaw handled the agent-facing browser control,
and validation became straightforward.

That is the kind of shape I trust more over time.

Not because it is more clever. Because it is less confused.

If you self-host agents seriously, that distinction matters a lot.