How to allow an LXC container to snapshot a mounted ZFS volume in PVE
A word about backups
I used to run my homeserver with EL8 on bare metal, but switched to Proxmox VE to widen the possibilities of what I can do with only a single box. I have used ZFS as the underlying file system for my data for some time, and the robustness and simplicity of the ZFS tooling is really nice.
One of the main advantages of storing, for example, mounted container data on ZFS is the ability to snapshot the data when taking backups. On filesystems without that possibility, backing up something like a Nextcloud instance requires lengthy downtime: to ensure the backup data is consistent, database operations need to be paused for the duration of sending the data to the backup location, local or remote.
So without snapshotting, the procedure might look like the following:
- stop the service
- wait a bit for all the ongoing operations to finish (if stopping the service doesn’t just kill everything)
- possibly dump the database(s)
- copy the data to a backup location
- after the transfer is finished, restart the service
Obviously, with a lot of data, the time the service needs to be paused or stopped can be quite long. That’s where snapshots come in handy. With snapshots, the same procedure changes slightly to:
- stop the service
- wait a bit
- dump the database(s)
- snapshot the volume
- restart the service
- copy the data from the snapshot to the backup location
The difference might seem insignificant, but if you are sending the data to a remote location, the transfer will quickly start taking a long time. Without snapshots, you have to wait for the transfer to complete to assure consistent data, but since snapshots are instant and atomic, the lengthy transfer operation can be performed in the background after the service has already resumed normal operation.
For my personal and very small Nextcloud instance, for example, this reduces the downtime spent displaying the maintenance mode page from 10-15 minutes to a little over a minute. Most of that minute is an artificial 60-second wait to let possible ongoing sync transactions finish.
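To make the difference concrete, a snapshot-based Nextcloud backup on a host with direct ZFS access might look roughly like the sketch below. The dataset, container and host names are made up for the example, and the database dump assumes PostgreSQL:

```
#!/usr/bin/env bash
# Rough sketch of the snapshot-based backup flow described above
set -euo pipefail

snap="backup-$(date +%F)"

# Stop writes: maintenance mode on, then wait for ongoing syncs to settle
podman exec -u www-data nextcloud php occ maintenance:mode --on
sleep 60

# Dump the database while the service is paused
podman exec nextcloud-db pg_dump -U nextcloud nextcloud > /tank/nextcloud/db.sql

# Snapshots are instant and atomic, so the downtime ends right here
zfs snapshot -r tank/nextcloud@"$snap"
podman exec -u www-data nextcloud php occ maintenance:mode --off

# The slow part runs afterwards, against the read-only snapshot
rsync -a "/tank/nextcloud/.zfs/snapshot/$snap/" backup-host:/backups/nextcloud/
```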
Downside of ZFS backend in PVE
I am still unsure whether using Proxmox VE is overkill for my homelab usage, but that can maybe be discussed in another post. The setup with Rocky Linux 8 and ZFS was simple, and everything worked for a long time. Being the avid tinkerer I am, I switched to Proxmox, and while it is great, there’s one huge downside to using ZFS storage in Proxmox: it makes using Copy-on-Write filesystems, specifically ZFS, on the guest systems basically useless, or at least severely hindered. Btrfs is also a CoW filesystem, but its memory footprint is considerably smaller, so it’s more usable; however, I suspect running two CoW filesystems on top of each other has quite a negative effect on performance.
After switching to Proxmox, I’m running Podman containers in a privileged Debian 11 LXC container. Previously, on the bare-metal EL8 setup, I used only rootless Podman containers and relied heavily on ZFS features for backing up the containers’ data, as described above. On the PVE setup this is somewhat harder to accomplish. The container data is still stored in ZFS datasets, one for each container, but since those datasets live on the PVE host, the LXC guest has no access to manipulate the ZFS filesystem. So none of my previous backup scripts work on this setup.
Switching back to the non-snapshot way of performing backups felt like a bad option, so for some time I ran the backup scripts on the PVE host. That also felt like a bad solution: in my opinion the PVE host should stick to just being a hypervisor, which is the same reason I’m not running Podman directly on the PVE host.
A solution
After some research into the matter, I had the idea of utilizing the ZFS permission system and SSH to allow the LXC guest to trigger ZFS snapshots on the mounted datasets. There are a couple of things that need to be done.
Create an unprivileged user on the PVE host
- the user should be allowed to SSH in with key authentication
- for security purposes, no `sudo` or other permissions should be given to the user (a sketch of creating such a user follows below)
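For illustration, creating that user (here literally named user, matching the placeholder used in the rest of the steps) might look roughly like this:

```
# On the PVE host: a regular user with no extra groups and no password login
useradd --create-home --shell /bin/bash user
passwd --lock user

# Prepare the authorized_keys file the LXC guest's key will be added to later
mkdir -p /home/user/.ssh
touch /home/user/.ssh/authorized_keys
chmod 700 /home/user/.ssh
chmod 600 /home/user/.ssh/authorized_keys
chown -R user:user /home/user/.ssh
```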
Allow the user to snapshot the required dataset:
```
zfs allow user snapshot tank/dataset
```
- do this for all the child datasets as well (a quick way to verify the delegation is shown below)
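Running `zfs allow` with just a dataset name prints the currently delegated permissions, which makes it easy to verify this step; something along these lines, on the PVE host:

```
# Show what has been delegated on the dataset
zfs allow tank/dataset

# Functional test: take a snapshot as the unprivileged user, then clean up as root
su - user -c '/usr/sbin/zfs snapshot tank/dataset@delegation-test'
zfs list -t snapshot -r tank/dataset
zfs destroy tank/dataset@delegation-test
```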
In the LXC container, create an SSH key for the user who will be performing the snapshotting and copy it to the PVE user’s `~/.ssh/authorized_keys`.
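As a sketch, this could be done along these lines; the key file path is just an example:

```
# On the LXC guest, as the user that will run the backup script
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_zfs -N "" -C "backup@lxc-guest-name"

# Append the public key to the PVE user's authorized_keys
cat ~/.ssh/id_ed25519_zfs.pub | ssh root@pve-host \
  'cat >> /home/user/.ssh/authorized_keys'
```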
Allow that key to only perform zfs snapshots on the PVE host, no shell:
- In the user’s `~/.ssh/authorized_keys`, prepend the key in question with the following: `command="/usr/sbin/zfs snapshot -r tank/dataset@$SSH_ORIGINAL_COMMAND",no-port-forwarding,no-x11-forwarding,no-agent-forwarding`
- Everything related to that key should be on one line, so the line will look like:
```
command="/usr/sbin/zfs snapshot -r tank/dataset@$SSH_ORIGINAL_COMMAND",no-port-forwarding,no-x11-forwarding,no-agent-forwarding ssh-ed25519 .... user@lxc-guest-name
```
On the LXC guest, create a wrapper function named `zfs()` in the `~/.bashrc` or equivalent file for the user taking the snapshots:
```
zfs() {
    args=( "$@" )
    ssh user@pve-host "${args[-1]##*@}"
}
```
There’s not necessarily a need to make this a globally available function; you could just add it to the beginning of the backup script so it’s only available for the script.
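With the wrapper in place, an existing backup script line such as the one below keeps working unchanged on the LXC guest (the snapshot naming is just an example):

```
# Looks like a local zfs call, but ends up running
# "ssh user@pve-host backup-<date>" under the hood
zfs snapshot -r tank/dataset@backup-$(date +%F)
```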
Explanation
Ok, the above steps are done, what do they actually do?
There’s now an unprivileged user on the PVE host who is only allowed to execute the `zfs snapshot -r tank/dataset@$SSH_ORIGINAL_COMMAND` command, and that command is executed each time the user logs in.
- `$SSH_ORIGINAL_COMMAND` is a special variable, and in this case it means that if the user were to execute `ssh user@pve-host test` from a remote machine, it would trigger `zfs snapshot -r tank/dataset@test` on the PVE host.
- If you tried to just SSH in to gain a shell, this would result in `zfs snapshot -r tank/dataset@`, which results in an error.
The `zfs()` function on the LXC guest is mostly for convenience, to keep the original backup scripts as unaltered as possible. It takes the arguments given to the `zfs` command and strips out everything except the part after the `@`. So when you execute `zfs snapshot -r tank/dataset@snapshot-2022-04-24`, what is actually executed is `ssh user@pve-host snapshot-2022-04-24`.
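The stripping is plain bash parameter expansion; a quick way to see what `${args[-1]##*@}` evaluates to:

```
args=( snapshot -r tank/dataset@snapshot-2022-04-24 )
echo "${args[-1]##*@}"    # prints: snapshot-2022-04-24
```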
This implementation is obviously restricted to only a single dataset on purpose, to give the LXC guest the least possible access. If there are multiple separate datasets you’d want to snapshot, the authorized_keys line could be altered to
```
command="/usr/sbin/zfs $SSH_ORIGINAL_COMMAND",no-port-forwarding,no-x11-forwarding,no-agent-forwarding
```
and furthermore, the `zfs()` function to
```
zfs() {
    args=( "$@" )
    ssh user@pve-host "${args[@]}"
}
```
This would make `zfs snapshot -r tank2/another_dataset@snapshot` in the LXC guest execute `ssh user@pve-host snapshot -r tank2/another_dataset@snapshot`, which results in `zfs snapshot -r tank2/another_dataset@snapshot` on the PVE host.