
Backup and Restore

Creating Snapshots

Snapshots are enabled by default.

The snapshot directory defaults to /var/lib/rancher/rke2/server/db/snapshots.

To configure the snapshot interval or the number of retained snapshots, refer to the options.
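For example, the snapshot schedule and retention count can be set in /etc/rancher/rke2/config.yaml. The key names below follow the standard server options, but confirm them against the options reference for your release:

etcd-snapshot-schedule-cron: "0 */6 * * *"   # take a snapshot every 6 hours
etcd-snapshot-retention: 10                  # keep the 10 most recent scheduled snapshots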

Snapshots are stored on each etcd node. If you have multiple etcd or etcd + control-plane nodes, you will have multiple copies of local etcd snapshots.

You can take a snapshot on demand while RKE2 is running by using the etcd-snapshot subcommand. For example: rke2 etcd-snapshot save --name pre-upgrade-snapshot.
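Snapshots can also be uploaded to S3-compatible storage when they are saved. The sketch below is illustrative; the bucket name and credentials are placeholders, and the exact S3 flags should be verified with rke2 etcd-snapshot save --help:

rke2 etcd-snapshot save \
  --name pre-upgrade-snapshot \
  --s3 \
  --s3-bucket <BUCKET-NAME> \
  --s3-region <REGION> \
  --s3-access-key <ACCESS-KEY> \
  --s3-secret-key <SECRET-KEY>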

Restoring a Snapshot to Existing Nodes

When RKE2 is restored from backup, the old data directory will be moved to /var/lib/rancher/rke2/server/db/etcd-old-%date%/. RKE2 will then attempt to restore the snapshot by creating a new data directory and starting etcd as a new RKE2 cluster with one etcd member.

  1. You must stop the RKE2 service on all server nodes if it is enabled via systemd. Use the following command to do so:
systemctl stop rke2-server
  2. Next, initiate the restore from the snapshot on the first server node with the following commands (an illustrative example with a concrete path appears after this list):
rke2 server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
  3. Once the restore process is complete, start the rke2-server service on the first server node as follows:
systemctl start rke2-server
  4. Remove the rke2 db directory on the other server nodes as follows:
rm -rf /var/lib/rancher/rke2/server/db
  5. Start the rke2-server service on the other server nodes with the following command:
systemctl start rke2-server
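As an illustration of step 2, restoring the on-demand snapshot taken earlier on this page could look like the following; the filename shown is hypothetical, since actual snapshot files include the node name and a timestamp:

rke2 server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/pre-upgrade-snapshot-server-1-1700000000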

Result: After a successful restore, a message in the logs says that etcd is running, and RKE2 can be restarted without the flags. Start RKE2 again, and it should run successfully and be restored from the specified snapshot.
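One way to confirm this, assuming the default data directory and kubeconfig locations, is to follow the service logs and then check node status with the bundled kubectl:

journalctl -u rke2-server -f
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes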

When rke2 resets the cluster, it creates an empty file at /var/lib/rancher/rke2/server/db/reset-flag. This file is harmless to leave in place, but must be removed in order to perform subsequent resets or restores. This file is deleted when rke2 starts normally.
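If a later reset or restore is blocked because the flag file is still present, it can be removed manually:

rm -f /var/lib/rancher/rke2/server/db/reset-flag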

Restoring a Snapshot to New Nodes

  1. Back up the server token file, /var/lib/rancher/rke2/server/token, in case the new nodes will not use the same one. The token is used to decrypt the bootstrap data inside the snapshot (see the sketch after this procedure).

  2. Stop the RKE2 service on all server nodes if it is enabled, and initiate the restore from the snapshot on the first server node with the following commands:

systemctl stop rke2-server
rke2 server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT> \
--token=<BACKED-UP-TOKEN-VALUE>
  3. Once the restore process is complete, start the rke2-server service on the first server node as follows:
systemctl start rke2-server
Warning: The node where the snapshot was taken will appear as NotReady.
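A minimal sketch of the token handling described in step 1; the backup path /root/rke2-token-backup is an arbitrary example:

# on an existing server node, before it is decommissioned
cp /var/lib/rancher/rke2/server/token /root/rke2-token-backup
# on the new first server node, supply the saved value during the restore
rke2 server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT> \
--token=$(cat /root/rke2-token-backup)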

Other Notes on Restoring a Snapshot

  • When performing a restore from backup, users do not need to restore the snapshot with the same version of RKE2 that created it; a more recent version may be used. When changing versions at restore time, be aware of which etcd version is in use.

  • By default, snapshots are enabled and are scheduled to be taken every 12 hours. The snapshots are written to ${data-dir}/server/db/snapshots with the default ${data-dir} being /var/lib/rancher/rke2.
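For instance, listing the default directory on a server node might show entries like the following; the names are illustrative, but scheduled snapshots use the etcd-snapshot prefix and include the node name and a timestamp:

ls /var/lib/rancher/rke2/server/db/snapshots
etcd-snapshot-server-1-1700000000  etcd-snapshot-server-1-1700043200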

List Snapshots

You can list local snapshots with the etcd-snapshot ls subcommand.
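For example, to list the snapshots in the default snapshot directory:

rke2 etcd-snapshot ls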

Prune Snapshots

Snapshots are pruned automatically when the number of snapshots exceeds the configured retention count. The oldest snapshots are removed first.

You can manually prune "on-demand" snapshots down to a smaller amount using the following command:

rke2 etcd-snapshot prune --etcd-snapshot-retention <NUM-OF-SNAPSHOTS-TO-RETAIN>

You can manually prune "scheduled" snapshots down to a smaller amount using the following command:

rke2 etcd-snapshot prune --name etcd-snapshot --etcd-snapshot-retention <NUM-OF-SNAPSHOTS-TO-RETAIN>