I wanted a Hermes Agent running like a normal service.

Not a local terminal experiment. Not a container I had to babysit. Not something that worked once and then broke the next time CapRover recreated the app.

The target shape was simple:

  • Hermes runs on CapRover,
  • all state survives image upgrades,
  • Codex is available inside the container,
  • ChatGPT/Codex auth survives restarts,
  • the dashboard is reachable behind basic auth,
  • and redeploying the image does not require redoing setup by hand.

That shape is achievable, but there are a few gotchas worth avoiding.


The deployment model

Hermes’ Docker model is clean: the image is mostly stateless, and user data lives in /opt/data.

That directory is the important part. It contains things like:

  • config.yaml,
  • .env,
  • auth files,
  • sessions,
  • skills,
  • logs,
  • generated Slack manifests,
  • and any tool state that needs to persist.

So the CapRover app should have persistent storage mounted at:

/opt/data

In my case the host path was:

/captain/data/hermes/data
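One way to confirm the mount really persists is to drop a marker file on the volume before a redeploy and look for it afterwards. This is a minimal sketch for a POSIX shell inside the container; the function names and the marker filename are hypothetical and not part of Hermes or CapRover.

```shell
# Hypothetical helpers: write a marker before a redeploy, check for it after.
# The directory defaults to the mount point used in this post.
write_marker() {
  wm_dir="${1:-/opt/data}"
  date -u +%Y-%m-%dT%H:%M:%SZ > "$wm_dir/.persistence-marker"
}

check_marker() {
  cm_dir="${1:-/opt/data}"
  if [ -s "$cm_dir/.persistence-marker" ]; then
    echo "persistent: marker written $(cat "$cm_dir/.persistence-marker")"
  else
    echo "not persistent: no marker in $cm_dir" >&2
    return 1
  fi
}
```

Run write_marker once, redeploy the app, then run check_marker in the new container. If the marker is gone, the volume is not mounted where Hermes writes its data.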

The first version used the upstream image directly:

nousresearch/hermes-agent:latest

That works for a basic deployment, but I quickly wanted two small additions:

  1. the hermes command should be available directly on PATH,
  2. the Codex CLI should be present after every restart and redeploy.

That meant a thin wrapper image was the right abstraction.


The wrapper image

The wrapper image keeps Hermes itself upstream, while adding the operational conveniences I need.

FROM nousresearch/hermes-agent:latest

USER root

RUN npm install -g @openai/codex \
    && ln -sf /opt/hermes/.venv/bin/hermes /usr/local/bin/hermes \
    && npm cache clean --force

COPY docker/hermes-entrypoint.sh /usr/local/bin/hermes-entrypoint.sh
RUN chmod +x /usr/local/bin/hermes-entrypoint.sh

ENV PATH="/opt/hermes/.venv/bin:/usr/local/bin:${PATH}" \
    HOME="/opt/data/home"

ENTRYPOINT ["/usr/local/bin/hermes-entrypoint.sh"]

The important pieces:

  • @openai/codex is installed at build time, not manually inside a live container.
  • hermes is symlinked into /usr/local/bin.
  • HOME=/opt/data/home, so tool auth like ~/.codex persists on the mounted volume.
  • a wrapper entrypoint handles permission normalization before delegating to Hermes’ official entrypoint.

This lets upgrades stay boring. Rebuild the image from the latest upstream base, redeploy, keep /opt/data untouched.
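A quick smoke test after each rebuild is to confirm both commands resolve on PATH. The sketch below uses only the shell built-in command -v; the function name check_tools is hypothetical.

```shell
# Report which of the given commands resolve on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
  ct_missing=0
  for ct_tool in "$@"; do
    if command -v "$ct_tool" >/dev/null 2>&1; then
      echo "ok: $ct_tool -> $(command -v "$ct_tool")"
    else
      echo "missing: $ct_tool" >&2
      ct_missing=1
    fi
  done
  return "$ct_missing"
}
# Inside the wrapper container: check_tools hermes codex
```

This only proves the binaries are found, not that they are authenticated or configured.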


The entrypoint permission fix

This was the gotcha that actually mattered.

A docker exec session into a CapRover container usually lands you in a root shell. If you run setup commands there, files like these can become root-owned:

/opt/data/auth.json
/opt/data/config.yaml

But the official Hermes entrypoint drops privileges before running the gateway. That is good. Running the gateway as root would create more problems later.

The failure mode is subtle:

  • hermes -z works when you test it as root,
  • Slack connects,
  • the gateway starts,
  • but Slack requests fail with provider/auth errors.

The logs tell the truth:

Permission denied: /opt/data/auth.json
No Codex credentials stored
No inference provider configured

The fix is to normalize the persistent volume before the official entrypoint drops privileges:

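#!/usr/bin/env bash
set -euo pipefail

# Only root can fix ownership; skip when already running unprivileged.
if [ "$(id -u)" = "0" ]; then
  mkdir -p /opt/data /opt/data/home

  # chown only if the image actually defines a hermes user.
  if getent passwd hermes >/dev/null 2>&1; then
    chown -R hermes:hermes /opt/data || true
  fi

  # Fallback for mounts that refuse chown: make everything readable.
  chmod -R u+rwX,go+rX /opt/data || true
fi

# Hand off to the official entrypoint, which drops privileges itself.
exec /opt/hermes/docker/entrypoint.sh "$@"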

That extra chmod is a deliberate fallback. If the mount does not allow ownership changes, go+rX still lets the non-root gateway user read config and auth. The tradeoff is that every user inside the container can read those files too.
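To check whether the normalization actually took effect, you can list anything under the volume that the gateway user does not own. This is a sketch using find's -user test; the function name is hypothetical, and the default user name hermes is taken from the entrypoint script.

```shell
# Print every path under a directory that is NOT owned by the given user.
# An empty result means the gateway user owns everything on the volume.
foreign_owned() {
  fo_dir="${1:-/opt/data}"
  fo_user="${2:-hermes}"   # a user name, or a numeric uid
  find "$fo_dir" ! -user "$fo_user" -print
}
# Example, as root inside the container: foreign_owned /opt/data hermes
```

If this prints auth.json or config.yaml after you ran setup in a root shell, a restart should fix it, because the entrypoint runs again.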


The command CapRover should run

Another easy trap: if the container starts the default interactive CLI, it exits immediately because there is no TTY.

The logs look like this:

Warning: Input is not a terminal (fd=0).
Goodbye!

Then CapRover keeps restarting it.

For gateway mode, run:

gateway run

If you are setting a CapRover service override manually, make sure it still goes through the entrypoint. The safe shape is:

/opt/hermes/docker/entrypoint.sh gateway run

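With the wrapper image above, the entrypoint is already /usr/local/bin/hermes-entrypoint.sh, which runs the permission fix and then hands off to the official script. Pointing an override at the upstream path would skip that fix, so pass only the arguments: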

gateway run

The key is: do not accidentally boot the interactive CLI in a non-interactive container.
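One possible guard, which is not part of the setup described here, is to let the wrapper choose gateway mode itself when no arguments are given and stdin is not a terminal. The function below is a hypothetical sketch of that decision using the standard [ -t 0 ] test.

```shell
# Decide which arguments to pass on to the Hermes entrypoint.
# Explicit arguments always win. With none, fall back to gateway mode
# unless stdin is a terminal (for example an interactive docker run -it).
pick_args() {
  if [ "$#" -gt 0 ]; then
    echo "$*"
  elif [ -t 0 ]; then
    echo ""              # interactive: let the default CLI start
  else
    echo "gateway run"
  fi
}
```

In a wrapper entrypoint the same test could set the positional parameters before the final exec. Whether that default is desirable depends on how often you run the image interactively.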


CapRover environment variables

For the dashboard and API server, I used:

API_SERVER_ENABLED=true
API_SERVER_HOST=0.0.0.0
API_SERVER_KEY=...
API_SERVER_CORS_ORIGINS=https://hermes.example.com

HERMES_DASHBOARD=1
HERMES_DASHBOARD_HOST=0.0.0.0
HERMES_DASHBOARD_PORT=9119
HERMES_HEADLESS=1

The app exposed port 9119 because the dashboard was the public surface.

The dashboard itself was also behind CapRover nginx basic auth. That is separate from Hermes’ own configuration. If the browser prompts for credentials before the dashboard loads, that is nginx basic auth, not Hermes auth.
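Since a missing variable tends to show up only as a confusing runtime error, a small preflight check can help. The sketch below is plain POSIX shell; require_env is a hypothetical helper, and the variable names are the ones listed above.

```shell
# Name each variable that is unset or empty; return non-zero if any is.
require_env() {
  re_status=0
  for re_name in "$@"; do
    # Refuse anything that is not a plain variable name before using eval.
    case "$re_name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid name: $re_name" >&2; return 2 ;;
    esac
    eval "re_value=\${$re_name:-}"
    if [ -z "$re_value" ]; then
      echo "missing: $re_name" >&2
      re_status=1
    fi
  done
  return "$re_status"
}
# Example:
# require_env API_SERVER_ENABLED API_SERVER_HOST API_SERVER_KEY \
#             HERMES_DASHBOARD HERMES_DASHBOARD_HOST HERMES_DASHBOARD_PORT
```

It checks presence only. It does not validate that the values are ones Hermes accepts.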


Configuring Codex

Once the container is running with persistent /opt/data, exec into it and authenticate Hermes with Codex.

hermes auth add codex-oauth

Then choose Codex as the inference provider:

hermes model

Pick OpenAI Codex and the model your subscription exposes.

Then test from inside the container:

hermes -z "Reply with exactly: codex-ok"

If that returns:

codex-ok

then Codex works for the user that shell runs as.

But do not stop there. The gateway runs as the non-root Hermes user, so verify that Slack or gateway-triggered requests can read the same auth/config files. If Slack says “No Codex credentials stored” while hermes -z works in your root shell, you almost certainly have a file ownership problem under /opt/data.
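A more direct check is to repeat the same test as the non-root user. The helper below is a hypothetical sketch that runs any command and compares its output to an expected string. The commented usage assumes runuser from util-linux is available in the image, which I have not verified for the Hermes base image.

```shell
# Run a command and succeed only if its output is exactly the expected text.
expect_reply() {
  er_expected="$1"; shift
  er_actual="$("$@" 2>/dev/null)" || { echo "command failed: $*" >&2; return 1; }
  if [ "$er_actual" = "$er_expected" ]; then
    echo "ok: $er_expected"
  else
    echo "unexpected reply: $er_actual" >&2
    return 1
  fi
}
# As root inside the container (assumes runuser exists there):
# expect_reply codex-ok runuser -u hermes -- hermes -z "Reply with exactly: codex-ok"
```

The exact-match comparison is strict on purpose. If the CLI prints anything besides the reply, loosen it to a substring match.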


Build and deploy automation

I put the wrapper image in the agent control repo and added a GitHub Actions workflow that:

  1. builds the wrapper image,
  2. pushes it to GHCR,
  3. deploys the immutable SHA-tagged image to CapRover.

The image tags look like:

ghcr.io/gregagi/hermes-agent:latest
ghcr.io/gregagi/hermes-agent:sha-<commit-sha>

Deploying the SHA tag matters. If something breaks, you know exactly what is running.
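The tagging scheme is simple enough to sketch as two shell functions. The function names are hypothetical, and whether the real workflow uses the full or the abbreviated commit SHA is not something this sketch decides.

```shell
# Print both tags for one commit: the moving tag and the immutable one.
image_tags() {
  it_repo="$1"
  it_sha="$2"
  echo "${it_repo}:latest"
  echo "${it_repo}:sha-${it_sha}"
}

# The tag to hand to CapRover is the immutable one (the last line).
deploy_tag() {
  image_tags "$1" "$2" | tail -n 1
}
# Example: deploy_tag ghcr.io/gregagi/hermes-agent "$GITHUB_SHA"
```

Pushing both tags but deploying only the sha- tag keeps latest convenient for humans while the running app stays pinned to one commit.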

The workflow needs these repo variables/secrets:

CAPROVER_HERMES_URL
CAPROVER_HERMES_APP_NAME
CAPROVER_HERMES_APP_TOKEN

I also allowed a CapRover password secret as a fallback for when no per-app token exists yet. App tokens are better long-term.

One small workflow gotcha: include every file that affects the image in the path trigger. I initially triggered only on Dockerfile.hermes, then changed the entrypoint script and wondered why the fix had not deployed. The trigger should list the Dockerfile, the entrypoint script, and the workflow file itself:

paths:
  - Dockerfile.hermes
  - docker/hermes-entrypoint.sh
  - .github/workflows/deploy-hermes-image.yml
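To catch this kind of omission early, a small check can compare the files the image depends on against the workflow file. The sketch below is a plain substring search with grep, not a YAML parser, and the function name is hypothetical.

```shell
# Print each given file that the workflow file does not mention anywhere.
untracked_inputs() {
  ui_workflow="$1"; shift
  for ui_file in "$@"; do
    grep -qF -- "$ui_file" "$ui_workflow" || echo "$ui_file"
  done
}
# Example:
# untracked_inputs .github/workflows/deploy-hermes-image.yml \
#   Dockerfile.hermes docker/hermes-entrypoint.sh
```

Empty output means every listed file appears somewhere in the workflow. It cannot tell whether the mention is actually under paths, so treat it as a reminder rather than a guarantee.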


How I verified the deployment

The checks that mattered were:

  1. the CapRover app deployed the new image,
  2. the container did not restart-loop,
  3. logs showed gateway mode, not interactive mode,
  4. the dashboard started,
  5. Hermes could answer through Codex,
  6. Slack requests used the same credentials the CLI test used.

Good logs look like this:

Dropping root privileges
Starting hermes dashboard on 0.0.0.0:9119
Hermes Gateway Starting...

Bad logs look like this:

Warning: Input is not a terminal
Goodbye!

or:

Permission denied: /opt/data/auth.json
No Codex credentials stored

Those two failures point to different fixes:

  • interactive CLI failure → fix the container command,
  • auth/config permission failure → fix /opt/data ownership/permissions.
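That mapping from log line to fix can be sketched as a small triage function. It matches only the log strings quoted in this post, so other Hermes versions may word them differently; the function name is hypothetical.

```shell
# Read container logs on stdin and name the likely fix.
# Failure patterns are checked before the healthy one.
classify_logs() {
  cl_logs="$(cat)"
  case "$cl_logs" in
    *"Input is not a terminal"*)
      echo "fix the container command (use: gateway run)" ;;
    *"Permission denied: /opt/data/"*|*"No Codex credentials stored"*)
      echo "check /opt/data ownership and permissions" ;;
    *"Hermes Gateway Starting"*)
      echo "looks healthy" ;;
    *)
      echo "unknown" ;;
  esac
}
# Example: docker logs <container> 2>&1 | classify_logs
```

The container name in the example is a placeholder. CapRover's own log view shows the same output if you prefer to paste it in.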


The final shape

The deployment I would repeat is:

  • official Hermes image as the base,
  • tiny wrapper image for Codex and PATH ergonomics,
  • persistent /opt/data,
  • HOME=/opt/data/home,
  • gateway mode as the container command,
  • entrypoint-level volume permission normalization,
  • GitHub Actions rebuild and deploy to CapRover.

That keeps the setup easy to upgrade.

When a new Hermes version lands, rebuild the wrapper image. The state stays in /opt/data. The image stays disposable. The agent keeps working.