oranki.net

The prettiest blog on the block


Self Hosting My Way - Generic layout

2nd post in a series about self-hosting

The layout

I like to keep all the relevant files for each respective service under a single filesystem tree. The first obvious reason is keeping things organized, but since the underlying filesystem is ZFS and each service directory is its own dataset, it also allows snapshotting the whole tree in one go.
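Because the service root is a dataset, that snapshot is a single command. A sketch, with a hypothetical pool/dataset and snapshot name:

zfs snapshot tank/services/gitea@pre-upgrade
zfs list -t snapshot tank/services/gitea    # verify the snapshot exists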

For example, this is the layout I have for Gitea, let’s say it’s at /tank/services/gitea:

.
├── .borg.env       # Environment file for Borg, owned by root
├── data            # Bind-mounted data for Gitea
└── quadlet         # Quadlet definitions

Generalized, the layout consists of the following:

  • data

    One or more directories that are mounted inside the containers

  • quadlet

    The YAML files for Podman pods and Quadlet systemd unit files

  • other files

    The root directory of a service can also contain other things, for example reverse proxy config files, utility and backup scripts etc.

There’s nothing really original in doing it this way, but I see a lot of folks keeping all the persistent data for containers in e.g. /opt/data and the docker-compose files in e.g. /opt/docker-compose. Keeping everything for a service under one tree makes it much easier to pick up that service and run it elsewhere.

I like to keep databases’ (Postgres, MySQL, Redis) persistent data in named volumes instead of bind mounts, though they could be directories under the same dataset as well. It’s an easy way to have the db data on a separate disk. During backups, the db data volumes are exported, and an SQL dump is taken just in case.
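As a sketch, assuming a named volume gitea-db and a Postgres container also called gitea-db (both names are illustrative, not my actual config), the export and dump could look like:

podman volume export gitea-db --output /tank/services/gitea/gitea-db-volume.tar
podman exec gitea-db pg_dumpall -U postgres > /tank/services/gitea/gitea-db-dump.sql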

Running a service

All the services run unprivileged and are managed by systemd in concert with Quadlet. This means the first step is to put the unit files in $HOME/.config/containers/systemd of a user.
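For reference, a minimal Quadlet .container unit could look like the following (the image tag and paths are illustrative, not my actual config):

[Container]
Image=docker.io/gitea/gitea:1.22
Volume=/tank/services/gitea/data:/data:Z
AutoUpdate=registry

[Service]
Restart=always

[Install]
WantedBy=default.target

Quadlet turns this into a regular systemd service at daemon-reload time.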

Copying the quadlet files would mean needing to manually keep the ones in the service file tree up to date, but luckily Quadlet reads symlinks. Instead of copying, the necessary files can be symlinked from the service directory. I’ll use Gitea here as an example:

mkdir -p $HOME/.config/containers/systemd/gitea
cd $HOME/.config/containers/systemd/gitea
cp -s /tank/services/gitea/quadlet/* .

I’ll leave the details of Quadlet and Podman pods for later posts covering individual services. Since the data directories already exist, all that’s left to do is

systemctl --user daemon-reload
systemctl --user start pod-gitea.service

There are of course more things to do here, most likely configuring a reverse proxy, but that’s all it takes: the pod is created and the containers started. Container images are also updated automatically to the latest versions (don’t use the latest tag, though!).
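The automatic updates rely on Podman’s auto-update mechanism: containers labeled io.containers.autoupdate=registry (AutoUpdate=registry in the Quadlet file) are re-pulled and restarted by a systemd timer. Enabling it for a user looks roughly like this:

systemctl --user enable --now podman-auto-update.timer
podman auto-update --dry-run    # preview which containers would be updated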

Backups

One of the biggest advantages of this layout is the simplicity of backups. To back up a service, the procedure I use for everything is the following:

  1. Stop or pause the services, excluding a possible database
  2. Dump the database
  3. Stop the database and export the underlying volume
  4. Snapshot the dataset
  5. Restart the service
  6. Back up the service data using the snapshot as source
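As a rough shell sketch of the steps above (the unit, volume and dataset names here are hypothetical):

#!/bin/sh
set -eu
systemctl --user stop gitea.service                      # 1. stop the app container
podman exec gitea-db pg_dumpall -U postgres > dump.sql   # 2. dump the database
systemctl --user stop gitea-db.service                   # 3. stop the db...
podman volume export gitea-db --output db-volume.tar     #    ...and export its volume
zfs snapshot tank/services/gitea@backup                  # 4. snapshot the dataset
systemctl --user start pod-gitea.service                 # 5. restart the service
borg create ::gitea-{now} \
  /tank/services/gitea/.zfs/snapshot/backup              # 6. back up from the snapshot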

The services themselves are stopped before the database dump, to make sure the database contents and the files on disk stay consistent with each other.

Using a filesystem with snapshot capability, like ZFS, keeps the downtime very short. In practice, the procedure results in less than a minute’s outage, even if the amount of data is large. I don’t have that much to back up, around 200G in total, but on a bad day the off-site backup run can take up to an hour despite being incremental. When the snapshot is used as the source for the remote backup instead of the live data, it doesn’t really matter how long the run takes, as long as it finishes within 24 hours, when the next run starts.

In addition, I use ZFS send/receive to keep a copy of the datasets on a different local host. This covers cases like a catastrophic disk failure, as downloading that 200G from the internet is a tedious process, not to mention if the amount of data were in the terabytes.
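A sketch of that incremental replication, assuming a previous snapshot @prev and a receiving pool named backup on a host called nas (all names illustrative):

zfs snapshot tank/services/gitea@today
zfs send -i @prev tank/services/gitea@today | ssh nas zfs receive backup/services/gitea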

Conclusion

By keeping a service’s data and configuration in a single place, the entire service becomes very portable. I haven’t been using Quadlet for long, but the setup needs no special gimmicks and depends basically only on OS-provided packages, so it’s quite robust. Snapshots keep the downtime minimal.

The next post will cover a real service in more detail, starting with Home Assistant.

Updated on: Added link to next series post