Infrastructure & Planned Changes

Created: 2026-05-16 | Tags: hermes, infrastructure, devops, self-hosting

Current Infrastructure

All services run on a single Linux VPS (kernel 6.8.0-generic) with root access.

  • Hermes Agent -- Running directly at ~/.hermes/, connected via Telegram gateway
  • Caddy -- Docker container, HTTPS reverse proxy via Let's Encrypt TLS-ALPN-01
  • Wiki (mdbook) -- Docker container, static HTML behind Caddy, built from /var/www/wiki/book/
  • Domain routing -- wiki.hermy.pathcomponent.net with HTTP basic auth

Wiki Hosting Details

  • Domain: wiki.hermy.pathcomponent.net
  • Stack: mdbook -> static HTML -> Caddy
  • Auth: HTTP basic auth (user: cloud, password hash in Caddyfile)
  • Build: mdbook build /var/www/wiki/book (static output, so no Caddy restart needed)
  • Ports: 80 (HTTP -> HTTPS redirect) + 443
  • Configs: /root/docker/compose.yml, /root/docker/Caddyfile
  • Content: /var/www/wiki/book/src/ with SUMMARY.md

Data & State

  • Databases: SQLite only (no Postgres, no MySQL)
  • Session store: SQLite at ~/.hermes/sessions/
  • Infrastructure state: ad-hoc (no IaC yet)
  • Secrets: ~/.hermes/.env for Hermes API keys; wiki basic auth in Caddyfile

Current Security Posture

  • Root access on VPS
  • No container isolation for Hermes
  • Caddy runs in Docker (ports 80/443)
  • SSH: no documented lockdown
  • No secrets management
  • No audit logging

Planned Migration

Full architecture overhaul tracked in ~/.hermes/migration-plan-reference.md.

Target Architecture

  ┌──────────────────────────────────────────────────────┐
  │                       VPS Host                       │
  │                                                      │
  │  ┌──────────────────────┐  ┌──────────────────────┐  │
  │  │  Docker (rootful)    │  │  Podman (rootless)   │  │
  │  │                      │  │                      │  │
  │  │  ┌─────┐ ┌────────┐  │  │   ┌──────────────┐   │  │
  │  │  │Caddy│ │ Forgejo│  │  │   │ Hermes Agent │   │  │
  │  │  │HTTPS│ │ Git+CI │  │  │   │ (selective   │   │  │
  │  │  └─────┘ └────────┘  │  │   │  mounts)     │   │  │
  │  │          ┌────────┐  │  │   └──────────────┘   │  │
  │  │          │ skills │  │  │                      │  │
  │  │          │  repo  │  │  │                      │  │
  │  │          └────────┘  │  │                      │  │
  │  └──────────────────────┘  └──────────────────────┘  │
  │                                                      │
  │  GitOps: Hermes proposes -> git push -> CI applies   │
  │  Tofu state: Forgejo Actions artifacts (unencrypted) │
  └──────────────────────────────────────────────────────┘

Repos (4, hosted on local Forgejo)

  1. infra -- OpenTofu, Caddy config, compose files, DNS configs
  2. wiki -- mdbook source markdown (versioned)
  3. cron-tasks -- Hermes cron scripts
  4. skills -- Agent-created skills, versioned in git

Key Design Decisions

Container runtime

Docker (rootful) for Caddy + Forgejo, since Caddy binds the privileged ports 80/443. Podman (rootless) for Hermes, which does not need root.

IaC tool

OpenTofu -- open-source Terraform fork (MPL-2.0, so none of the licensing concerns of Terraform's BSL), used with the local backend.

State storage

Forgejo Actions artifacts. State stays out of the git tree and can be restored from any run's artifact. No encryption -- artifacts are already private to the instance, and age would be over-engineering for a single-user setup.

Databases

SQLite only. No Postgres complexity. Forgejo uses SQLite by default.

Caddy TLS

TLS-ALPN-01. No DNS API token needed -- Caddy handles the ACME challenge over port 443 directly.

Hermes secrets

~/.hermes/.env as a read-only bind mount. Simple, no extra infrastructure. Same attack surface as alternatives for a single-user setup.

Forgejo secrets

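Admin account created once at install time (web installer or forgejo admin user create --admin); the admin password is not stored in app.ini. CI secrets managed through Forgejo's built-in UI, scoped per-repo.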

Hermes container isolation

Selective mount isolation, not a monolithic read-only rootfs:

  • Read-only: /usr/, /etc/, ~/.hermes/config.yaml, ~/.hermes/.env
  • Writeable: ~/.hermes/skills/, ~/.hermes/sessions/, ~/.hermes/logs/, ~/.hermes/memory/
  • Volatile (tmpfs): /tmp/ (size-limited), /home/
  • No Docker socket, no sudo, no production access, no host filesystem access outside data dirs
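A minimal sketch of how the mount layout above could be turned into a podman run argument list. The image name and the container-side home directory are hypothetical, and /usr/ and /etc/ are left out because how they become read-only depends on the image. Only the --volume and --tmpfs options of podman run are used.

```python
from pathlib import PurePosixPath

# Hypothetical values: neither the image name nor the container-side
# home directory comes from the migration plan.
IMAGE = "localhost/hermes-agent:latest"
CONTAINER_HOME = PurePosixPath("/home/hermes")

READ_ONLY = ["config.yaml", ".env"]
WRITEABLE = ["skills", "sessions", "logs", "memory"]


def podman_run_args(host_hermes_dir: str, tmp_size: str = "64m") -> list[str]:
    """Build a `podman run` argument list for the selective-mount layout."""
    host = PurePosixPath(host_hermes_dir)
    target = CONTAINER_HOME / ".hermes"
    args = ["podman", "run", "--rm", "--name", "hermes"]
    # Volatile paths: tmpfs mounts, with /tmp capped in size.
    args += ["--tmpfs", f"/tmp:rw,size={tmp_size}", "--tmpfs", "/home:rw"]
    # Config and secrets are visible to the agent but not modifiable.
    for name in READ_ONLY:
        args += ["--volume", f"{host / name}:{target / name}:ro"]
    # Only the data directories are writeable.
    for name in WRITEABLE:
        args += ["--volume", f"{host / name}:{target / name}:rw"]
    # Deliberately absent: the Docker socket and any other host path.
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    print(" ".join(podman_run_args("/root/.hermes")))
```

Building the list in code keeps the read-only and writeable sets in one reviewable place, which suits the GitOps flow: a change to the mount list is a diff in the infra repo.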

Skills versioning

After skill_manage writes/patches a skill, Hermes pushes to Forgejo/skills. Full history, rollback via git revert, CI validates frontmatter + broken links.
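One possible shape for the push step, as a sketch only. The remote name, branch name, and commit message format are assumptions, and the real trigger (hook, cron, or inotify) is still an open question.

```python
import subprocess
from pathlib import Path


def sync_commands(skill_name: str, remote: str = "origin",
                  branch: str = "main") -> list[list[str]]:
    """Git commands that record and publish one skill change."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", f"skill_manage: update {skill_name}"],
        ["git", "push", remote, branch],
    ]


def sync_skills(repo_dir: Path, skill_name: str) -> bool:
    """Commit and push pending changes; return False if the tree is clean."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    if not status.stdout.strip():
        return False  # nothing to commit, so skip the empty commit
    for cmd in sync_commands(skill_name):
        subprocess.run(cmd, cwd=repo_dir, check=True)
    return True
```

Because every change lands as its own commit, rollback stays a plain git revert of that commit.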

GitOps constraint

Hermes cannot run tofu, access DNS API, or modify production directly. Changes only through git push + Forgejo CI.


7-Phase Migration Plan

Phase 1: Inventory (~3h)

  • Audit current files, configs, service files, cron jobs, secrets
  • Audit DNS records, domain configs, SSL certificates
  • Document all current state
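The file audit could start from a small script like the following, which records size and SHA-256 for every file under a set of paths. This is a sketch: the record layout is an assumption, and the paths in the usage block are simply the ones already named on this page.

```python
import hashlib
import json
from pathlib import Path


def inventory(paths: list[Path]) -> list[dict]:
    """Return one record (path, size, sha256) per regular file found."""
    records = []
    for root in paths:
        if not root.exists():
            continue  # missing paths are skipped, not treated as errors
        if root.is_file():
            files = [root]
        else:
            files = sorted(p for p in root.rglob("*") if p.is_file())
        for f in files:
            records.append({
                "path": str(f),
                "bytes": f.stat().st_size,
                "sha256": hashlib.sha256(f.read_bytes()).hexdigest(),
            })
    return records


if __name__ == "__main__":
    targets = [
        Path("/root/docker/compose.yml"),
        Path("/root/docker/Caddyfile"),
        Path.home() / ".hermes",
    ]
    print(json.dumps(inventory(targets), indent=2))
```

The output contains hashes only, never file contents, so it can be kept next to the rest of the inventory without copying secrets such as .env.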

Phase 2: Forgejo + Caddy (~4h)

  • Docker Compose for Caddy + Forgejo
  • Configure Forgejo Actions runner
  • Point git.yourdomain.com to Caddy, get HTTPS live

Phase 3: Infra Repo (~2h)

  • Initialize OpenTofu (local backend)
  • Upload/download artifact workflow for state

Phase 4: Wiki Repo (~2h)

  • Migrate wiki source to git
  • CI deploy workflow: mdbook build -> rsync to serving dir
  • Content now versioned and reviewable
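A sketch of the build-and-sync step the CI job could run. It assumes mdbook's default output directory (book/ under the book root, unless book.toml overrides it) and a hypothetical serving directory /srv/wiki, which is not part of the current setup.

```python
import subprocess


def deploy_commands(book_dir: str, serve_dir: str) -> list[list[str]]:
    """Commands for: mdbook build, then mirror the output to serve_dir."""
    out_dir = f"{book_dir.rstrip('/')}/book"  # mdbook's default build dir
    return [
        ["mdbook", "build", book_dir],
        # Trailing slashes make rsync copy directory contents;
        # --delete removes pages that no longer exist in the source.
        ["rsync", "-a", "--delete", f"{out_dir}/", f"{serve_dir.rstrip('/')}/"],
    ]


def deploy(book_dir: str, serve_dir: str) -> None:
    """Run each step and stop at the first failure."""
    for cmd in deploy_commands(book_dir, serve_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # /srv/wiki is a placeholder serving directory.
    for cmd in deploy_commands("/var/www/wiki/book", "/srv/wiki"):
        print(" ".join(cmd))
```

Since check=True aborts on a failed build, a broken page never reaches the rsync step and the previously served HTML stays in place.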

Phase 5: Containerize Hermes (~3h)

  • Podman rootless container with selective mount isolation
  • Read-only: /usr/, /etc/, config.yaml, .env
  • Writeable: skills/, sessions/, logs/, memory/
  • Volatile: /tmp/, /home/ as tmpfs
  • No Docker socket, no sudo, no production access
  • WireGuard tunnel if needed for API egress

Phase 5b: Skills Git Post-Commit Hook

  • After skill_manage, Hermes auto-pushes to Forgejo/skills
  • Git hook or cron syncs ~/.hermes/skills/ -> skills repo
  • Forgejo CI validates: frontmatter lint, broken links, dup slugs
  • Rollback via git revert <sha>
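The frontmatter lint and duplicate-slug checks might look roughly like this. It is a sketch under assumptions: skills are taken to be text files with a leading block of flat key: value lines between --- fences, name is treated as the slug, and the required keys are guesses. Broken-link checking is not covered.

```python
from collections import Counter

REQUIRED_KEYS = {"name", "description"}  # assumed schema, not from the plan


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return flat `key: value` pairs found between the leading --- fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing fence: treat as missing frontmatter


def lint_skills(skills: dict[str, str]) -> list[str]:
    """Take a map of path -> file text, return a list of problems."""
    problems = []
    slugs: Counter = Counter()
    for path, text in sorted(skills.items()):
        fields = parse_frontmatter(text)
        missing = REQUIRED_KEYS - fields.keys()
        if missing:
            problems.append(f"{path}: missing {', '.join(sorted(missing))}")
        if "name" in fields:
            slugs[fields["name"]] += 1
    problems += [f"duplicate slug: {s}" for s, n in sorted(slugs.items()) if n > 1]
    return problems
```

A CI job would read the repo's skill files into the dict and fail when the returned list is non-empty.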

Phase 6: Domain Cutover (~1h)

  • Caddy vhost routing for all subdomains
  • Rename pathcomponent.net -> yourdomain.com
  • Update all internal references

Phase 7: Hardening (~2h)

  • SSH: key-only authentication, root login disabled, non-default port, fail2ban
  • UFW: strict rules
  • auditd: audit logging, with watch rules on critical files and configs
  • Regular backup cron
  • RESTORE.md: disaster recovery runbook
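The SSH items can be verified after the fact with a small check over sshd_config. This is a sketch, not a full parser: it reads simple "Keyword value" lines, keeps the first occurrence of each keyword (as sshd does), and ignores Match blocks and Include files. The expected values are this page's hardening goals.

```python
# Hardening goals from the list above, keyed by lowercase sshd keyword.
EXPECTED = {
    "passwordauthentication": "no",
    "permitrootlogin": "no",
}


def audit_sshd_config(text: str) -> list[str]:
    """Return findings for settings that do not match the hardening goals."""
    seen: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Keywords are case-insensitive; the first value seen wins.
            seen.setdefault(parts[0].lower(), parts[1].strip().lower())
    findings = []
    for key, want in EXPECTED.items():
        got = seen.get(key)
        if got != want:
            findings.append(f"{key}: expected {want!r}, found {got!r}")
    if seen.get("port", "22") == "22":
        findings.append("port: still on default 22")
    return findings


if __name__ == "__main__":
    with open("/etc/ssh/sshd_config", encoding="utf-8") as fh:
        for finding in audit_sshd_config(fh.read()):
            print(finding)
```

An unset keyword is reported as found None, which is intentional: relying on a compiled-in default is treated as undocumented lockdown.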

Constraints

  • Hermes cannot run tofu or modify production directly. Changes only through git push + Forgejo CI.
  • Config/.env are read-only inside the Hermes container. The agent cannot modify its own constraints.
  • No Postgres. SQLite is the only database engine.
  • No vendor lock-in. Forgejo over GitHub, self-hosted over SaaS.

Open Questions

  • When to execute migration (depends on having a custom domain with DNS API access)
  • Skills repo sync mechanism: git push on every skill_manage call vs periodic cron sync vs inotify watch?
  • Backup strategy details (frequency, retention, off-site)
  • Monitoring and alerting setup post-migration
  • Whether wiki should share the skills repo or be separate (currently planned: separate repos)