The Execution Plan#

What Is It?#

When convergence decides it’s time to deploy your apps, it doesn’t just march through a fixed list of steps. Instead, it builds an execution plan — a map of every piece of work that needs to happen, along with which pieces depend on which others. Then a runner walks that map, doing as much work in parallel as the dependencies allow.

Think of it like cooking a multi-course dinner for twelve people. You could cook one dish at a time from start to finish — start the pasta sauce, finish the pasta sauce, then start the salad, then the roast, then dessert. You’d eat at midnight. What you actually do is look at the whole menu, notice that the roast takes two hours but the salad takes five minutes, and start the roast first. While it’s in the oven, you chop the salad, simmer the sauce, and prep dessert. Some things still have hard orderings (“don’t plate the pasta before the sauce is done”), but everything else overlaps.

The execution plan is the menu. The runner is the cook. The dependency edges are the “don’t do X before Y” rules.

Why Does It Exist?#

The old convergence pipeline had rigid phases: deploy all apps, then sync all aggregators, then run all reconcilers, then update DNS. Three problems:

  1. Slow. A 30-second deploy of your media target had to finish before a 3-second deploy of your monitoring target was even considered — even though nothing on either target touches the other. The whole pipeline ran at the speed of its slowest app.
  2. Fragile. A broken exporter in one app would fail its deploy, which blocked the aggregator sync, which blocked every reconciler, which blocked DNS. One typo in one container image could freeze integration for twenty other apps.
  3. Opaque. Because the phases were hardcoded, adding new kinds of work (setup callbacks, per-app reconcilers) meant bolting on new phases and hoping the interactions were right.

The execution plan solves all three. It’s just a set of work-nodes and “wait for this other node” arrows — and every rule about ordering lives in the apps’ own meta.yml files, not in convergence code.

What’s Inside the Plan?#

Each piece of work is a node. Nodes come in six kinds:

| Kind | What it does | How many |
| --- | --- | --- |
| DEPLOY | Deploys one app to one target — renders its files, starts its container, runs its setup reconcilers, checks health | One per app |
| CALLBACK | Captures first-boot secrets that apps generated during setup (like a freshly-created API key) and writes them back to SOPS, then re-renders the convention files that depend on those secrets | One per plan (only exists if at least one app uses setup_callback) |
| SYNC | Pushes a single aggregator’s collected convention files to its target and restarts the service — used by Traefik, Prometheus, Homepage, etc. | One per aggregator that supports live reload |
| REDEPLOY | Full re-deploy of an aggregator instead of a sync — used when the aggregator’s config can only be picked up by rebuilding the container (Authelia) | One per aggregator that needs this |
| RECONCILE | Runs an app’s integration reconcilers — Sonarr registering itself in Prowlarr, Backrest wiring into ntfy, etc. | One per app with integration reconcilers |
| DNS | Reconciles DNS records through your providers so every app has its subdomain pointing at the right IP | One per plan |

Every node knows which app it belongs to (if any), which target it touches (if any), and which other nodes it’s waiting on.
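
As a rough mental model, a node can be pictured as a small record. The class and field names below are an illustrative sketch, not the actual PSW types:

```python
from dataclasses import dataclass
from enum import Enum, auto

class NodeKind(Enum):
    DEPLOY = auto()
    CALLBACK = auto()
    SYNC = auto()
    REDEPLOY = auto()
    RECONCILE = auto()
    DNS = auto()

@dataclass(frozen=True)
class PlanNode:
    id: str                             # e.g. "DEPLOY:sonarr"
    kind: NodeKind
    app: str | None = None              # which app this node belongs to, if any
    target: str | None = None           # which target it touches; None for singletons like CALLBACK and DNS
    deps: frozenset[str] = frozenset()  # ids of the nodes this one waits on
```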

What Makes One Node Wait for Another?#

The dependency edges are the whole point. They come from facts already declared in meta.yml — no code changes when you add new apps:

| Signal in meta.yml | Edge it creates |
| --- | --- |
| requires: [postgres] | This app’s DEPLOY waits for postgres’ DEPLOY |
| setup_callback: "..." | CALLBACK waits for this app’s DEPLOY |
| aggregator.sync.strategy: dir | SYNC node exists for this aggregator; waits for its own DEPLOY + CALLBACK |
| aggregator.sync.strategy: redeploy | REDEPLOY node exists; same deps as SYNC |
| integration_reconcilers: [...] | RECONCILE node exists for this app; waits for its DEPLOY + every aggregator SYNC/REDEPLOY (so routing, SSO, and widgets are live before the reconciler runs) |
| integration_reconcilers[].requires: [X] | RECONCILE(this) waits for RECONCILE(X) when X also has reconcilers |

DNS has no dependencies at all — it uses IPs from the converge state, which exist regardless of whether any individual deploy in this run succeeded.
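
As a loose sketch of how those declarations could be turned into edges (the function name, dict shapes, and node-id scheme are assumptions for illustration, not the real plan builder):

```python
def build_edges(metas):
    """metas: app name -> parsed meta.yml dict. Returns (waiting node, node it waits on) pairs.
    Covers only the requires / setup_callback / integration_reconcilers rows of the table above."""
    edges = []
    for app, meta in metas.items():
        for dep in meta.get("requires", []):
            edges.append((f"DEPLOY:{app}", f"DEPLOY:{dep}"))       # deploy ordering
        if meta.get("setup_callback"):
            edges.append(("CALLBACK", f"DEPLOY:{app}"))            # CALLBACK waits for this app's first boot
        if meta.get("integration_reconcilers"):
            edges.append((f"RECONCILE:{app}", f"DEPLOY:{app}"))    # reconcile only after its own deploy
            for rec in meta["integration_reconcilers"]:
                for other in rec.get("requires", []):
                    if metas.get(other, {}).get("integration_reconcilers"):
                        edges.append((f"RECONCILE:{app}", f"RECONCILE:{other}"))
    return edges
```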

How the Runner Walks the Plan#

Once the plan is built, a runner walks it with a simple loop:

  1. Find every node whose dependencies are all satisfied
  2. Dispatch them to a thread pool
  3. When a node finishes, add it to the “done” set and go back to step 1
  4. When a node fails, mark every node transitively downstream as “blocked” and skip them
  5. Stop when every node is either done or blocked
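
A minimal sketch of that loop, assuming each node is identified by an id and deps maps every id to the set of ids it waits on (the per-target serialization described below is left out for brevity):

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_plan(deps, execute, max_workers=8):
    """deps: node id -> set of ids it waits on. execute(node_id) raises on failure."""
    done, failed, blocked, in_flight = set(), set(), set(), {}

    def downstream(of):
        # everything transitively waiting on a failed node
        hit, frontier = set(), {of}
        while frontier:
            frontier = {n for n, d in deps.items() if d & frontier} - hit
            hit |= frontier
        return hit

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            # 1. find every node whose dependencies are all satisfied
            ready = [n for n, d in deps.items()
                     if n not in done | failed | blocked
                     and n not in in_flight.values()
                     and d <= done]
            # 2. dispatch them to the pool
            for n in ready:
                in_flight[pool.submit(execute, n)] = n
            # 5. stop once nothing is running and nothing new became ready
            if not in_flight:
                break
            # 3./4. collect finished nodes; a failure blocks its whole downstream
            finished, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in finished:
                n = in_flight.pop(fut)
                if fut.exception() is None:
                    done.add(n)
                else:
                    failed.add(n)
                    blocked |= downstream(n)
    return done, failed, blocked
```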

Two scheduling rules keep things honest:

  • One worker per target at a time. Apps on the same target share storage, the Podman socket, and the systemd user session — two deploys running against the same target at once would trip over each other. The runner serializes per-target work automatically.
  • Different targets run in parallel. Nothing stops your media deploy and your monitoring deploy from running simultaneously, and the runner takes full advantage.

Singleton nodes (CALLBACK, DNS) have target=None and run concurrently with anything they’re not blocked on.
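
The per-target rule can be pictured as one extra readiness check; in this sketch nodes are assumed to be small dicts with a target field:

```python
def target_free(node, in_flight_nodes):
    """A node may start only if no running node touches the same target.
    Singletons (target=None) are never held back by this check."""
    busy = {n["target"] for n in in_flight_nodes if n["target"] is not None}
    return node["target"] is None or node["target"] not in busy
```

In the runner sketch above, this would simply be one more condition in the ready filter.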

What Happens When Something Fails?#

Old pipeline: one broken app froze the whole integration phase for everyone.

DAG runner: a failure is contained to what actually depends on it. If your Sonarr deploy fails:

  • Sonarr’s RECONCILE node is blocked (it explicitly depended on its own DEPLOY) — correct
  • Prowlarr’s RECONCILE is blocked (it listed requires: [sonarr]) — correct
  • Jellyfin’s DEPLOY runs anyway — it never depended on Sonarr
  • Traefik’s SYNC runs anyway — convention files are rendered on the workstation before the plan even starts, so Traefik’s routing table is still valid (with the previous Sonarr route, if any)
  • DNS runs anyway — IPs come from the state, not from this run

Your failure blast radius is now “Sonarr and everything that explicitly said it depends on Sonarr”, not “the whole integration phase.”

Neighborhood Pruning: Don’t Rebuild What Didn’t Change#

When you push a commit that only touches one app, convergence doesn’t need to run every single node. The plan builder takes a set of changed apps and prunes the plan to the integration neighborhood — the changed apps plus any app whose meta.yml lists them under integrations.

Concrete example. You edit services/sonarr/service.yml and push:

  1. Changed apps = {sonarr}
  2. Neighborhood = {sonarr} + any app that declares integrations: [..., sonarr, ...] — say {sonarr, prowlarr} (Prowlarr lists Sonarr as an integration because its reconciler talks to Sonarr, or something similar)
  3. The plan keeps DEPLOY/RECONCILE nodes for those two apps, drops the rest, and runs a much smaller graph

Singletons (CALLBACK, DNS) survive the prune only if they still have something upstream that survived too. The result is a plan focused exactly on the work that needs to happen — nothing more, nothing less.
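
Sketched in code, the pruning could look roughly like this (the node and meta shapes here are assumptions; the real builder’s types will differ):

```python
def prune_plan(nodes, changed_apps, metas):
    """nodes: id -> {"app": str | None, "deps": set[str]}; metas: app name -> parsed meta.yml."""
    neighborhood = set(changed_apps)
    for app, meta in metas.items():
        if set(meta.get("integrations", [])) & set(changed_apps):
            neighborhood.add(app)                 # apps that list a changed app under integrations
    kept = {nid: n for nid, n in nodes.items()
            if n["app"] in neighborhood or n["app"] is None}
    # singletons (app=None) with dependencies survive only if at least one upstream node survived
    kept = {nid: n for nid, n in kept.items()
            if n["app"] is not None or not n["deps"] or n["deps"] & kept.keys()}
    # drop edges that point at pruned nodes
    return {nid: {**n, "deps": n["deps"] & kept.keys()} for nid, n in kept.items()}
```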

This is what lets convergence stay cheap even when your self-hosted solution grows to thirty apps.

Where Does This Fit in Convergence?#

The execution plan runs inside Phase 5: Deploy of convergence. The earlier phases (reconcile, infrastructure, prepare, teardown) still do their thing in order — they’re about getting targets into the right shape, not about running app-level work. Once they’re done, convergence hands the project graph to the plan builder and the runner takes over.

So the flow is:

git push
  ↓
Convergence triggered
  ↓
Phase 1-4: reconcile / infra / prepare / teardown (linear)
  ↓
Phase 5: DEPLOY — build ExecutionPlan, run it (DAG)
  │
  ├── DEPLOY(postgres)
  │     ↓
  ├── DEPLOY(sonarr) ──┐
  ├── DEPLOY(prowlarr) ┤         SYNC(traefik) ──┐
  ├── DEPLOY(radarr) ──┼──── … ── SYNC(authelia) ┼── RECONCILE(sonarr)
  ├── DEPLOY(traefik) ─┤         SYNC(homepage) ─┘       ↓
  ├── CALLBACK ────────┘                              RECONCILE(prowlarr)
  │                                                      ↓
  └── DNS (runs whenever ready)                        done
  ↓
Phase 6: finalize (linear)

The shape of the DAG changes every run based on what’s in your meta.yml files and what changed since the last commit. That’s the whole point — the ordering rules aren’t hardcoded anywhere, they emerge from the apps themselves.

Safety Valves#

Two escape hatches, both env vars you’d only flip while debugging:

| Var | Effect |
| --- | --- |
| PSW_MAX_PARALLELISM | Cap how many nodes can run at once. Default is plenty; set to 1 to serialize everything while chasing a flaky interaction |
| PSW_RECONCILE_EAGER | Let RECONCILE nodes start as soon as their explicitly required upstream nodes are done, instead of waiting for every aggregator SYNC to finish. Faster, slightly less conservative — default off until every reconciler has honest requires |

You should almost never need these. If you do, it usually means an app’s meta.yml is missing a requires edge and the conservative defaults are papering over the gap — fix the metadata, don’t flip the flag.
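
For flavor, reading the two valves might look like the snippet below; only the env var names come from the table, the defaults and variable names are guesses:

```python
import os

# Hedged sketch: the real defaults may differ.
max_workers = int(os.environ.get("PSW_MAX_PARALLELISM", "8"))
eager_reconcile = os.environ.get("PSW_RECONCILE_EAGER", "").lower() in ("1", "true", "yes")
```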

Key Ideas#

  • A plan, not a pipeline — work is a graph of nodes, not a fixed sequence of phases
  • Metadata-driven edges — every “X waits for Y” comes from a declaration in an app’s meta.yml, never from PSW code
  • Parallel by default, serial where it matters — different targets run at once; same-target work serializes automatically
  • Failures are contained — one broken app blocks only its real downstream, not the whole integration phase
  • Small plans when little changes — neighborhood pruning keeps convergence cheap as your setup grows
  • Cycles fail loudly — if metadata ever describes an impossible ordering, the plan refuses to build at all instead of deadlocking halfway through (a minimal check is sketched below)
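
One way a plan builder can refuse cyclic metadata up front is a Kahn-style topological check. This is a generic sketch, not the project’s actual code:

```python
from collections import deque

def assert_acyclic(deps):
    """deps: node id -> set of ids it waits on. Raises if the metadata describes a cycle."""
    remaining = {n: set(d) for n, d in deps.items()}
    queue = deque(n for n, d in remaining.items() if not d)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m, d in remaining.items():
            if n in d:
                d.discard(n)
                if not d:
                    queue.append(m)
    if visited != len(remaining):
        raise ValueError("dependency cycle in plan metadata")
```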