Disaster Recovery

What It Is

Disaster recovery is the answer to one question: “My server burned down — can I bring everything back?”

PSW’s answer is yes, and the way it gets to “yes” is by treating your setup as three independent things, kept in three different places, restored in three different ways. Lose one and the other two can usually carry you. Lose all three and you’re rebuilding from memory.

This doc is the umbrella — what the three things are, where each one lives, and how they fit together when you actually need them. The deep details live in two sister docs:

  • secrets.md: the encryption side (the age key, the encrypted secrets/*.yml files, the psw project export-recovery panic backup).
  • backups.md: the application-data side (per-app bundles, transports, restore plumbing).

If you only read one section here, read The day disaster strikes — step by step below.


The “house on fire” picture

Imagine your PSW project is your house. There are three things you need to protect, and they have completely different shapes:

1. The blueprints

What rooms you have, where the doors are, how the wiring runs. The shape of the project. In PSW that’s project.yml, network.yml, deployment-plan.yml, services/, roles/ — all plain-text files.

Where they live: in your off-site git remote (GitHub, Forgejo, wherever). Every commit you push backs them up. The encrypted secrets/*.yml files ride along with the blueprints in the same git repo (they’re already ciphertext, so they’re safe to commit).

How you make this backup: by git push-ing your project, the same way any developer does. PSW doesn’t need a special command for it.
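A quick way to confirm that the blueprints really are off-site is to count the commits your upstream branch has not received yet. The following is a minimal sketch using plain git. The helper name `unpushed_count` is hypothetical (it is not a PSW command), and it assumes the current branch already tracks an upstream branch.

```shell
# Hypothetical helper (not a PSW command): print how many commits on the
# current branch have not reached its upstream branch yet.
# Assumes an upstream is configured, e.g. via `git push -u origin HEAD`.
unpushed_count() {
  local project_dir="$1"
  # '@{u}..HEAD' selects commits reachable from HEAD but not from upstream.
  git -C "$project_dir" rev-list --count '@{u}..HEAD'
}
```

Calling `unpushed_count ~/casaeureka` and getting anything other than 0 suggests a `git push` is overdue. Uncommitted edits are not counted; `git status` covers those.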

2. The valuables

Your jewelry, your passports, the spare key to your car. Small, sensitive, irreplaceable. In PSW that’s the age key — the master decryption key that turns your encrypted SOPS files back into readable Cloudflare tokens, OPNsense credentials, SMTP passwords, the Proxmox root password baked into the auto-installer, and so on.

Where they live: in your panic-backup tarball, plus optionally in any other safe place (USB stick, password manager, safety-deposit box).

How you make this backup:

psw project export-recovery
# → ~/psw-recovery/<project>-<utc-iso-timestamp>.tar.gz

That command bundles the age key + every encrypted file under secrets/ into one timestamped envelope. Stash it on a USB stick, an encrypted cloud drive, your password manager, your safety-deposit box. Run the command again whenever a credential changes. (Full story in secrets.md § The panic backup.)
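Before stashing the tarball, it may be worth confirming that the age key really is inside it. The sketch below only lists the archive with `tar`; nothing is extracted. The helper name `check_panic_backup` is hypothetical, and the member path `recovery/age-key` is taken from the Path A recipe later in this doc, so treat it as an assumption about your tarball's layout.

```shell
# Hypothetical helper (not a PSW command): succeed only if the tarball
# lists an age-key member. The path recovery/age-key is an assumption
# borrowed from the Path A recipe; adjust it if your tarball differs.
check_panic_backup() {
  local tarball="$1" listing
  # Listing fails on a truncated or non-gzip file, which we treat as "bad".
  listing=$(tar -tzf "$tarball" 2>/dev/null) || return 1
  # Accept both "recovery/age-key" and "./recovery/age-key".
  printf '%s\n' "$listing" | grep -Eqx '(\./)?recovery/age-key'
}
```

A typical call would be `check_panic_backup ~/psw-recovery/<your-tarball> && echo "age key present"`. This checks presence only; it does not prove the key still decrypts your SOPS files.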

3. The papers in the filing cabinet

Your actual stuff. Vaultwarden’s vault entries, Forgejo’s repos, Immich’s photos, Frigate’s camera footage, Home Assistant’s automation history. The contents your apps exist to hold.

Where they live: in per-app bundles, made by psw bundle export <app>. One bundle per app you care about. Different apps produce wildly different sizes (Vaultwarden ~50 KB, Immich potentially terabytes), so they’re stored independently.

How you make this backup:

psw bundle export vaultwarden                                      # default destination: <project>/bundles/...
psw bundle export jellyfin --transport=tarball --to=/mnt/usb/jf.tar.gz
psw bundle export postgres --transport=tarball --to=/mnt/usb/pg.tar.gz

Pick which apps matter. Pick where each bundle lands. (Full story in backups.md § What Are They?)


Why three pieces, not one

You might wonder: why not just psw export-everything and put the whole thing in one place?

Because the three pieces have completely different needs:

| | Blueprints (git) | Valuables (panic backup) | Papers (bundles) |
|---|---|---|---|
| Size | A few KB of plain text | A few KB of encrypted YAML | KB to TB depending on app |
| Cadence | Whenever you change config | Whenever you rotate a credential | Constantly (every Vaultwarden write, every camera frame) |
| Right storage | Git remote (versioned, distributed) | Encrypted USB / safety-deposit box (rare access, paranoid handling) | Restic / S3 / NAS (versioned, deduped, frequent writes) |
| Threat model | “Lose my workstation”: git remote saves you | “Lose my workstation AND git”: panic backup saves you | “Lose my apps’ data”: bundle restore saves you |

If you collapsed them: backing up the photos every time you rotate a token is wasteful; backing up the Cloudflare token every time a security camera sees motion is wrong. Keeping them separate lets each piece use the right tool at the right cadence.

The other reason: separation of concerns also separates the threat models. If a thief grabs your panic backup off your desk, they have your secrets — but not your application data. If they steal your NAS, they have your photos — but not your master encryption key. Splitting the artifacts makes “one stolen thing” survivable.


The day disaster strikes — step by step

You walk into the server room. It’s on fire. Or your hard drive died. Or your cluster melted. The sequence:

Step 1: Get the project shape back

git clone <off-site-repo-url> ~/casaeureka

This brings back your project.yml, network.yml, deployment-plan.yml, services/, roles/, .sops.yaml, AND the encrypted secrets/*.yml files (they’re in git as ciphertext). The only thing missing is the age key, because that’s gitignored.

Step 2: Get the secrets unlocked

You have two paths here, depending on how your house burned down. Pick whichever applies:

Path A — “I just need the key back” (most common)

You have your panic backup off-box. Drop the age key in:

# Extract just the age-key file from your panic backup.
tar -xzf ~/usb/panic-backup.tar.gz recovery/age-key
cp recovery/age-key ~/casaeureka/secrets/.age-key
chmod 0600 ~/casaeureka/secrets/.age-key

Open the wizard:

psw wizard --project ~/casaeureka

The wizard sees you’ve got project.yml + network.yml + secrets/*.yml already (from git) and routes you straight past Start to wherever you left off — typically Hardware → Plan → Install → Launch. No credential prompts appear, because PSW reads the existing encrypted files directly. Same Cloudflare token, same OPNsense URL, same SSO admin password.

A wizard UI that does this in one click is planned at docs/plans/wizard-key-recovery/ — same panic backup, but the wizard auto-detects the “git-cloned project missing its key” state and drops the age key in for you. Until that ships, the 3-line cp recipe above is what you use.
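The “git-cloned project missing its key” state can be recognised from the files named in this doc. The sketch below is not PSW's own detection logic; it is a hypothetical helper, `project_state`, that only looks at which files exist, and the state names it prints are made up for illustration.

```shell
# Hypothetical helper (not PSW's detection logic): classify a project
# checkout by which of the files named in this doc are present.
project_state() {
  local dir="$1"
  if [ ! -f "$dir/project.yml" ] || [ ! -f "$dir/network.yml" ]; then
    echo "no-project"       # nothing recognisable was cloned here
  elif ! ls "$dir"/secrets/*.yml >/dev/null 2>&1; then
    echo "no-secrets"       # project shape present, no encrypted files
  elif [ ! -f "$dir/secrets/.age-key" ]; then
    echo "needs-age-key"    # git-cloned project missing its key
  else
    echo "ready"            # shape, ciphertext, and key are all in place
  fi
}
```

Running `project_state ~/casaeureka` right after the git clone should report `needs-age-key`; after the cp recipe above it should report `ready`.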

Path B — “I want a fresh project that inherits these settings”

Maybe you don’t want to bring the same project back. Maybe you’re doing a staging clone of production. Maybe you’re migrating to a new domain. Maybe you’re splitting one project into two. In that case you’d skip the git clone and instead start a fresh project:

psw wizard --project ~/casaeureka-staging

On the wizard’s Start screen, click the “Restore from a previous project” card. Pick your panic-backup tarball. The wizard validates it, drops every secret in, and fills the Start form with the inherited credentials. You review the form (e.g. change the domain if you’re migrating), hit Submit, then continue through Hardware → Plan → Install → Launch the same as a brand-new project. (Full walkthrough: secrets.md § Scenario B.)

Step 3: Run bootstrap

In both paths above, the wizard ends at Launch — that runs psw deploy bootstrap. Fresh Proxmox install on bare hardware, fresh LXCs, fresh ZFS datasets, fresh PostgreSQL — but wired to the old identity because the secrets came back. New Vaultwarden has zero items in its database, but the database password it uses matches the old one.

Step 4: Restore each app’s data

The new infrastructure is up. Now you bring back each app’s data, one by one, from the bundles you saved earlier:

# (Optional — recommended) Verify the bundle is what you think it
# is BEFORE importing.  Shows the manifest (app name, bundle id,
# what kind of data, file count + sizes) without changing any state.
psw bundle inspect /mnt/usb/vw.tar.gz

# Then import.
psw bundle import vaultwarden --transport=tarball --from=/mnt/usb/vw.tar.gz
psw bundle import forgejo     --transport=tarball --from=/mnt/usb/forgejo.tar.gz
psw bundle import postgres    --transport=tarball --from=/mnt/usb/pg.tar.gz
# ... and so on for each app you care about

psw bundle inspect is the cheap pre-flight check — catches “wrong tarball” or “corrupted file” before you start tearing the running app down to load it. It’s read-only; you can run it as many times as you want without touching anything.

Each import call:

  1. Validates the bundle’s manifest + every file’s sha256. Refuses with a typed error if anything’s off — wrong app name, schema version mismatch, corrupted byte. No silent partial restore.
  2. Stops the running app on the destination LXC.
  3. Loads the dump into the database (or whatever the app’s restore plumbing needs).
  4. Drops side-files — Vaultwarden’s rsa_key.pem, Forgejo’s repo store, etc. — into the container’s data directory.
  5. Restarts the app and waits for it to become healthy.
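The inspect-then-import habit described above can be wrapped in a small shell function so that a bad tarball never reaches the import step. This is a sketch, not a PSW feature: `restore_bundle` is a hypothetical name, it uses only the `psw bundle inspect` and `psw bundle import` invocations already shown, and it assumes `psw bundle inspect` exits non-zero on a bad bundle, which this doc does not state explicitly.

```shell
# Hypothetical wrapper: inspect a tarball bundle first and import it only
# if inspection succeeds.
# ASSUMPTION: `psw bundle inspect` exits non-zero for a bad bundle.
restore_bundle() {
  local app="$1" tarball="$2"
  if ! psw bundle inspect "$tarball"; then
    echo "skipping $app: inspect failed for $tarball" >&2
    return 1
  fi
  psw bundle import "$app" --transport=tarball --from="$tarball"
}
```

For example, `restore_bundle vaultwarden /mnt/usb/vw.tar.gz && restore_bundle forgejo /mnt/usb/forgejo.tar.gz` stops at the first bundle that fails either step.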

Open https://vault.<your-domain> and sign in with your old master password. Your old vault entries decrypt and appear, because the same encryption key is back in place. The disaster recovery is complete.

(Full per-app details: backups.md § What Gets Backed Up, Really?)


What you need to actually have all three

Disaster recovery is only as strong as your habits. To survive a real fire, you need:

| Layer | Minimum habit | Recommended habit |
|---|---|---|
| Blueprints (git) | git push after every change | Off-site git remote (GitHub / Forgejo), not the same machine your project lives on |
| Valuables (panic backup) | One panic backup, somewhere safe, made at least once | Refresh whenever a credential changes; keep on encrypted storage that survives the workstation it came from |
| Papers (bundles) | One bundle per app you care about, somewhere safe | Scheduled bundles via Backrest (planned) for apps that change constantly; manual tarball exports for static-ish data |

One layer is not enough:

  • Blueprints alone? Nothing decrypts. You’d recreate every account from scratch.
  • Panic backup alone? You don’t know what apps you had, what targets, what your network looked like.
  • App bundles alone? The restored Vaultwarden database is full of ciphertext you can’t decrypt without the rsa_key, and although the rsa_key is in the bundle, the Postgres credentials needed to even read the database are gone.

Three together = real disaster recovery.

Partial recoveries: what you can salvage

Real life is messier than “all three or nothing”. Here’s what each partial state actually buys you:

| What you have | What you lose | What you can still do |
|---|---|---|
| Just git remote | Every credential, every app’s data | Walk the wizard’s full Start form, retyping every credential from scratch (Cloudflare, OPNsense, SMTP). Every auto-generated app password gets a brand new value. Treat your old infrastructure as gone: sign in fresh, repopulate Vaultwarden by hand from your password-manager backup, etc. |
| Just panic backup (no git, no bundles) | Project shape, app data | Use the wizard’s “Restore from a previous project” card to seed a fresh project with your old credentials (Scenario B). You’ll have to re-decide which apps to install, what targets to create, what your network looks like; the panic backup doesn’t carry that. |
| Just app bundles (no git, no panic backup) | Everything you’d need to use the bundles | Effectively nothing. The bundles’ data is encrypted with keys / DB credentials that came from the SOPS files in the panic backup. Without those, the bundles’ encrypted contents can’t be read. This is the “backup that didn’t survive its own protection” failure mode. |
| Git + panic backup (no app bundles) | Everything users wrote into the apps | A fully functional but empty platform. Vaultwarden has zero items, Forgejo has zero repos, Frigate has no past clips. Everything works, you just start from blank. |
| Git + bundles (no panic backup, no separate age key) | Same as “just git remote” if you can’t decrypt the SOPS files | If you have the age key saved elsewhere (password-manager note, USB stick), drop it in and you’ve recovered everything. If not, you’re back to “retype every credential from the wizard’s Start form”. |
| Panic backup + bundles (no git) | Project shape | Same as “just panic backup” plus you have application data ready. Walk through Scenario B, pick the same apps as before, then import bundles. |
| All three | Nothing | Full recovery, same identity, same data. |

The pattern: the panic backup OR a separately-saved age key is the linchpin. Without one of them, the encrypted SOPS files in your git remote are dead weight. Backing up at least one of them should be your top priority.


Where each piece should live

Off-box, off-machine, off-fate-sharing. The rule of thumb:

| Layer | Bad location | Good location |
|---|---|---|
| Git remote | The same workstation | GitHub, Forgejo on a hosted box, a self-hosted Forgejo on a different machine than your PSW one |
| Panic backup | /tmp (volatile); your PSW project’s own folder (gets wiped on reset) | USB stick in a drawer, encrypted cloud drive, password-manager attachment, safety-deposit box |
| App bundles | The same LXC the app runs on (local_dir default: convenient for testing, useless for real disaster) | A tarball on encrypted off-site storage (USB / NAS / S3), or a Restic repo on Backblaze B2 / Wasabi / SFTP |

The principle: a backup that shares a fate with the thing it’s backing up is not a backup. If the workstation that has your panic backup also has your project, both die together when the workstation dies.
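One cheap, partial check for fate-sharing is whether the backup and the project sit on the same filesystem. The sketch below compares device IDs. `same_filesystem` is a hypothetical helper name, and it assumes GNU `stat` (the `-c %d` form), as found on most Linux systems.

```shell
# Hypothetical helper: succeed if two paths live on the same filesystem.
# Same filesystem implies shared fate. Different filesystems do NOT prove
# independence: two partitions can still sit on one physical disk.
# ASSUMPTION: GNU stat, where `-c %d` prints the device number.
same_filesystem() {
  local a b
  a=$(stat -c %d "$1" 2>/dev/null) || return 2   # 2 = path not readable
  b=$(stat -c %d "$2" 2>/dev/null) || return 2
  [ "$a" = "$b" ]
}
```

If `same_filesystem ~/casaeureka ~/psw-recovery` succeeds, the panic backup has not left the disk it is meant to protect. A failure is only a hint that the copy might be elsewhere, not proof that it is off-box.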

A note on what’s encrypted vs. plaintext. Not all your backups are encrypted at rest in the same way:

  • The panic backup is not encrypted — it contains your age key in plaintext alongside the encrypted SOPS files. Anyone who steals the tarball gets the master key. Treat it like your SSH private key: encrypted USB, password-manager attachment, safety-deposit box. Never on a public web server, never in a cloud bucket without operator-controlled encryption.
  • App bundles via tarball or local_dir transport are plaintext on disk: the bundle contains things like a Postgres dump or a Vaultwarden db.sqlite3. For some apps that data is of little use on its own without the SOPS-encrypted credentials (and Vaultwarden’s vault entries are doubly encrypted, so a thief would still need the master password), but other apps’ DB contents are readable as-is. The same storage discipline applies: pick a destination you’d trust with the underlying data.
  • App bundles via restic transport are encrypted at rest with a SOPS-managed password. That’s the right transport for “I want to put my backup on Backblaze B2 without trusting B2”.

When in doubt, encrypt the storage layer (LUKS volume on a USB stick, encrypted cloud drive). PSW’s defaults work either way; the hygiene is on you.


The two scenarios in one mental shortcut

Whenever you’re trying to recover, ask yourself: “do I have my git remote available?”

  • Yes → Scenario A. git clone + age key + run wizard. Same project comes back, same identity, same name.
  • No → Scenario B. Fresh wizard, click Restore, inherit settings into a new project (possibly with a different name / domain).

Both paths are fed by the same panic-backup tarball. They differ in what’s already on disk on the receiving end (Scenario A has the git checkout; Scenario B starts empty).

In both cases, restoring app data is the same: psw bundle import <app> for each app you care about, after bootstrap completes.


What’s NOT covered by disaster recovery

  • Live data between the last backup and the disaster. If you backed up Vaultwarden last Sunday and added an item on Wednesday, then your hard drive died on Friday — the Wednesday item is gone. PSW does not (yet) do continuous replication. Schedule your backups based on how much loss you can tolerate.
  • Application-level corruption. If your data was already corrupt at backup time, restoring the backup brings the corruption back. PSW’s manifest validation catches transport-level corruption (a flipped bit during file copy) but can’t tell whether the data was logically right when it was backed up.
  • Hardware that wasn’t there. If your old setup had three servers and you’re rebuilding with one, PSW’s plan validation will complain. Adjust the deployment plan first; recovery isn’t a magic right-sizer.
  • Subscriptions and external state. Cloudflare account, Hostinger VPS, OPNsense API key — those are external services. PSW restores the credentials it has stored, but if you lost your Cloudflare account itself, the credentials don’t help. Keep your account-level recovery info (TOTP backup codes, account recovery email access) somewhere durable too.
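For the first item above (data written after the last backup), one way to keep the loss window honest is to check how old your newest bundle is. The sketch below is a hypothetical helper, `has_fresh_backup`, that looks for any `*.tar.gz` modified within a tolerated number of minutes. It assumes your bundles are tarballs in a single directory and that `find` supports `-mmin` and `-quit` (GNU and BSD find both do).

```shell
# Hypothetical helper: succeed if DIR holds at least one *.tar.gz that was
# modified less than MAX_MINUTES minutes ago (your tolerated loss window).
has_fresh_backup() {
  local dir="$1" max_minutes="$2"
  # -mmin -N matches files modified less than N minutes ago;
  # -quit stops at the first match, so large directories stay cheap.
  [ -n "$(find "$dir" -type f -name '*.tar.gz' -mmin "-$max_minutes" -print -quit)" ]
}
```

For a one-week tolerance, `has_fresh_backup /mnt/usb 10080 || echo "bundles are stale"` could run from cron or a login script. It checks file age only, not whether the bundle is valid; `psw bundle inspect` remains the tool for that.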

Testing it for real

The repo has a runnable disaster-recovery test on real hardware: docs/testing/disaster-recovery.md. It walks through making the panic backup, wiping a real project and its infrastructure, restoring through the wizard, and verifying that Vaultwarden’s items and master password round-trip correctly. Run it once before you actually need it; the time to discover a missing step in your own backup habits is not during the actual fire.

The synthesised version of the same test runs in CI on every PR (test_recovery_proving.py). That’s how PSW guarantees the round-trip property doesn’t silently regress.


Key Ideas

  • Three independent layers. Project shape (git), secrets (age key + encrypted SOPS files), application data (per-app bundles). Each gets its own backup, its own storage, its own restore command.
  • The “panic backup” tarball comes from a single command (psw project export-recovery) that you run on healthy days; it covers the secrets layer.
  • The “panic backup” tarball is the SAME artifact for both recovery scenarios — what differs is what’s on disk on the receiving end.
  • Application data is per-app, on purpose. Different apps have different sizes, cadences, restore plumbing. PSW abstracts the differences with bundles but doesn’t pretend they’re all the same.
  • Restoration is loud-and-typed. Every refusal path raises a clear, operator-actionable error — never a deep mid-bootstrap surprise.
  • Off-box storage is the operator’s call. PSW packs and validates; where the artifacts live is your decision and your discipline.

For the deep dives: