Backup/restore solutions for storage
CloudNativePG (Postgres)
Backups are natively supported by CloudNativePG using Barman (https://pgbarman.org/), which creates backup files and WAL archives. A backup captures the entire cluster, whereas WAL (Write-Ahead Logging) archives record the changes applied since the last backup. Backups are taken regularly on a cron schedule. To restore a cluster, a full backup is first restored to bring the cluster back to its exact state at the time of the backup. Then the WAL is replayed to recover the transactions applied after the backup, allowing point-in-time recovery (PITR) to a specific moment, which makes it possible to stop just before an error was introduced.
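Besides the scheduled backups configured further down, CloudNativePG also lets you request an on-demand backup at any time by creating a Backup resource that references the cluster. A minimal sketch (the cluster name is a placeholder):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: on-demand-backup
spec:
  cluster:
    name: CLUSTER_NAME   # name of the CloudNativePG Cluster resource
```

Applying this with kubectl triggers a backup to the object store configured in the cluster's backup section.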
Create a MinIO bucket to store backups and WAL archives
- Go to the MinIO console (look at the S3 ingress in Kubernetes to get the URL, for example for PPROD : https://s300-console.technique.artemis/buckets/add-bucket). Connect using S3_USERNAME/S3_PASSWORD.
- Choose a name for the bucket (referred to as BUCKET_NAME below), and click on "Create Bucket".
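If you prefer the command line, the bucket can also be created with the MinIO client (mc). This is a sketch: it assumes mc is installed, and the endpoint URL, credentials and bucket name below are placeholders to adapt to your environment.

```shell
# Placeholders - adapt to your environment.
S3_URL="https://s300.technique.artemis"
S3_USERNAME="changeme"
S3_PASSWORD="changeme"
BUCKET_NAME="pg-backups"

# Register the endpoint under an alias, then create the bucket.
# (Commented out so the sketch does not contact a live server.)
# mc alias set artemis "$S3_URL" "$S3_USERNAME" "$S3_PASSWORD"
# mc mb "artemis/$BUCKET_NAME"
```

`mc mb` fails if the bucket already exists; add `--ignore-existing` to make it idempotent.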
Configure scheduled backups in the Helm values
- Add this part to the Helm values file (if it's not already present):
backups:
  enabled: true
  endpointURL: "http://s3.pf-dna--s300.svc.cluster.local:9000" # Adapt to the S3 service and namespace: "http://SERVICE_NAME.NAMESPACE.svc.cluster.local:SERVICE_PORT"
  # -- Specifies a CA bundle to validate a privately signed certificate.
  endpointCA:
    # -- Creates a secret with the given value if true, otherwise uses an existing secret.
    create: false
    name: ""
    key: ""
    value: ""
  destinationPath: ""
  provider: s3
  s3:
    bucket: "BUCKET_NAME" # The name of the bucket previously created
    path: "/PATH_NAME" # For example "/v1", or "/v2" if the cluster has been recovered once, etc.
    accessKey: "S3_USERNAME"
    secretKey: "S3_PASSWORD"
  secret:
    # -- Whether to create a secret for the backup credentials
    create: true
    # -- Name of the backup credentials secret
    name: ""
  wal:
    # -- WAL compression method. One of `` (for no compression), `gzip`, `bzip2` or `snappy`.
    compression: gzip
    # -- Whether to instruct the storage provider to encrypt WAL files. One of `` (use the storage container default), `AES256` or `aws:kms`.
    encryption: "" # For now, encryption does not work
    # -- Number of WAL files to be archived or restored in parallel.
    maxParallel: 1
  data:
    # -- Data compression method. One of `` (for no compression), `gzip`, `bzip2` or `snappy`.
    compression: gzip
    # -- Whether to instruct the storage provider to encrypt data files. One of `` (use the storage container default), `AES256` or `aws:kms`.
    encryption: "" # For now, encryption does not work
    # -- Number of data files to be archived or restored in parallel.
    jobs: 2
  scheduledBackups:
    # -- A list of scheduled backups; the example below runs daily at midnight. A backup is also taken at cluster creation by default.
    - # -- Scheduled backup name
      name: daily-backup
      # -- Schedule in cron format (six fields, including seconds)
      schedule: "0 0 0 * * *"
      # -- Backup owner reference
      backupOwnerReference: self
      # -- Backup method, can be `barmanObjectStore` (default) or `volumeSnapshot`
      method: barmanObjectStore
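Once deployed, you can check that the operator created the ScheduledBackup and that backups complete. A sketch, with a hypothetical namespace to adapt (the live commands are commented out since they require a running cluster):

```shell
# Hypothetical namespace - adapt to your deployment.
NAMESPACE="pg-fat"

# List the ScheduledBackup and the Backup objects it creates;
# a successful backup reports phase "completed":
# kubectl get scheduledbackups,backups -n "$NAMESPACE"

# WAL archiving status and the first recoverability point are
# visible in the Cluster status:
# kubectl get cluster -n "$NAMESPACE" -o wide
```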
Restore a cluster
- Get the Helm values file of the cluster you want to restore.
- Modify the value "mode" from "standalone" to "recovery".
- Add this "recovery" part to the cluster Helm values file:
recovery:
  method: object_store
  ## -- Point-in-time recovery target. Specify one of the following:
  pitrTarget:
    # -- Time in RFC3339 format
    time: ""
  ##
  # -- Backup recovery method
  backupName: "" # Optional: name of the backup to recover from; defaults to the latest.
  ##
  # -- The original cluster name when used in backups. Also known as serverName.
  clusterName: "CLUSTER_NAME"
  endpointURL: "http://s3.pf-dna--s300.svc.cluster.local:9000" # Adapt to the S3 service and namespace: "http://SERVICE_NAME.NAMESPACE.svc.cluster.local:SERVICE_PORT"
  # -- Specifies a CA bundle to validate a privately signed certificate.
  endpointCA:
    # -- Creates a secret with the given value if true, otherwise uses an existing secret.
    create: false
    name: ""
    key: ""
    value: ""
  destinationPath: ""
  provider: s3
  # The same configuration you used in the backups section of the cluster you want to restore
  s3:
    bucket: "BUCKET_NAME" # The name of the bucket previously created
    path: "/PATH_NAME" # For example "/v1", or "/v2" if the cluster has been recovered once, etc.
    accessKey: "S3_USERNAME"
    secretKey: "S3_PASSWORD"
  secret:
    # -- Whether to create a secret for the backup credentials
    create: true
    # -- Name of the backup credentials secret
    name: ""
- Modify the backups part so that the restored cluster stores its backups in a new location. At minimum, change the "path" value: for example, if it was "/v1", use "/v2". The operator includes a safety check to ensure a cluster will not overwrite a storage location that already contains data; a cluster that would do so remains in state "Setting up primary" with its Pods in an Error state. You can also change the entire backups configuration to write backups and WAL archives to another bucket.
- Run:
helm upgrade --install HELM_RELEASE_NAME -f HELM_VALUES_FILE --namespace PG-FAT_NAMESPACE --create-namespace https://nexus.technique.artemis/repository/Artemis-Helm/cluster-0.1.0.tgz
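After the upgrade, the cluster first runs a recovery job and then brings up its instances. A sketch of how to follow the process, with placeholder names to adapt (live commands commented out since they require a cluster):

```shell
# Placeholders - adapt to your deployment.
NAMESPACE="pg-fat"
RELEASE="my-cluster"   # hypothetical Helm release name

# Watch the recovery job complete and the instance Pods come up:
# kubectl get pods -n "$NAMESPACE" -w

# The Cluster should eventually report "Cluster in healthy state":
# kubectl get cluster -n "$NAMESPACE"
```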
MinIO (S3)
Clickhouse
Backups are made with the clickhouse-backup tool (https://github.com/Altinity/clickhouse-backup). A container is deployed in each Clickhouse replica and runs a REST API on port 7171, through which you can create and restore backups.
Create a MinIO bucket to store backups
- Go to the MinIO console (look at the S3 ingress in Kubernetes to get the URL, for example for Kosmos-dev : https://s3.kosmos-dev.athea/).
- Choose a name for each bucket (metier and technique), referred to below as METIER_BUCKET_NAME and TECHNIQUE_BUCKET_NAME, and click on "Create Bucket".
Configure backup parameters
Using the Helmfile project:
For each Clickhouse cluster (technique or metier), configure the S3 storage and backup options in the environment file (for example environments/kosmos-dev), like:
clickhouse:
  technique:
    ...
    backup:
      enabled: true
      allow_empty_backups: false
      full_interval: "24h"
      watch_interval: "1h"
      backups_to_keep_local: "-1"
      backups_to_keep_remote: "0"
      s3:
        bucket: "TECHNIQUE_BUCKET_NAME"
        path: "PATH"
  metier:
    ...
    backup:
      enabled: true
      allow_empty_backups: false
      full_interval: "24h"
      watch_interval: "1h"
      backups_to_keep_local: "-1"
      backups_to_keep_remote: "0"
      s3:
        bucket: "METIER_BUCKET_NAME"
        path: "PATH"
- allow_empty_backups (default false): allow empty backups; if false and nothing changed between two backups, no backup is created.
- backups_to_keep_local (default -1): how many of the latest local backups to keep; 0 means all created backups are stored on local disk; -1 means a backup is kept after "create" but deleted after the "create_remote" command.
- backups_to_keep_remote (default 0): how many of the latest backups to keep on remote storage; 0 means all uploaded backups are kept on remote storage.
- watch_interval (default 1h): used only by the "watch" command; an incremental backup is created every interval (for example WATCH_INTERVAL=1h).
- full_interval (default 24h): used only by the "watch" command; a full backup is created every interval (for example FULL_INTERVAL=24h for a daily full backup).
- watch_is_main_process (default false): treats the "watch" command as the main API process; if it stops unexpectedly, the API server also stops. The API server is not stopped if the "watch" command is cancelled by the user.
All the available parameters are listed here: https://github.com/Altinity/clickhouse-backup/blob/master/ReadMe.md#configurable-parameters. Some of them are not set in the Helm files.
To add a parameter to the backup container, add the corresponding environment variables to the container named clickhouse-backup in file apps/clickhouse/clickhouse/templates/clickhouse-installation.yaml (l62).
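For illustration, the values above end up as environment variables on that container; clickhouse-backup maps each config key to an upper-cased variable of the same name. A hedged sketch of what such an env section might look like (LOG_LEVEL stands in for an extra parameter not set by the Helm files):

```yaml
containers:
  - name: clickhouse-backup
    env:
      - name: ALLOW_EMPTY_BACKUPS
        value: "false"
      - name: BACKUPS_TO_KEEP_REMOTE
        value: "0"
      # Example of an extra parameter not managed by the Helm values:
      - name: LOG_LEVEL
        value: "info"
```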
Backup a cluster using API calls
To make calls to the API, you need to create a port-forward (kubectl takes LOCAL_PORT:REMOTE_PORT) using:
kubectl port-forward CLICKHOUSE_POD_NAME -n CLICKHOUSE_NAMESPACE YOUR_LOCAL_PORT:7171 &
Then you can make API calls. To create a backup, there are two ways:
- Use the "watch" command (https://github.com/Altinity/clickhouse-backup/blob/master/ReadMe.md#post-backupwatch): it runs a background watch process that regularly creates a sequence of full and incremental backups, using the intervals defined above.
curl -s localhost:YOUR_LOCAL_PORT/backup/watch -X POST | jq .
Optional string query argument "watch_backup_name_template" to specify a template for backup names, otherwise they are generated.
- Use the "create" command (https://github.com/Altinity/clickhouse-backup/blob/master/ReadMe.md#post-backupcreate):
curl -s localhost:YOUR_LOCAL_PORT/backup/create -X POST | jq .
Optional string query argument "name" to specify a name, otherwise it is generated.
Restore a cluster using API calls
- List available backups:
curl -s localhost:YOUR_LOCAL_PORT/backup/list | jq .
- Get the name of the backup you want to use.
- If your backup is remote, download it:
curl -s localhost:YOUR_LOCAL_PORT/backup/download/BACKUP_NAME -X POST | jq .
- Restore it:
curl -s localhost:YOUR_LOCAL_PORT/backup/restore/BACKUP_NAME -X POST | jq .
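The restore steps above can be chained in a small script. The clickhouse-backup API also exposes GET /backup/status, which lets you check that the download finished before restoring. Port and backup name below are hypothetical placeholders; the live calls are commented out because they require the port-forward:

```shell
# Placeholders - the local end of your port-forward, and a backup
# name taken from the /backup/list output.
LOCAL_PORT="7171"
BACKUP_NAME="shard1-full-20240101"

# 1. Download the remote backup to local disk:
# curl -s "localhost:${LOCAL_PORT}/backup/download/${BACKUP_NAME}" -X POST | jq .
# 2. Poll the status endpoint until the download is no longer "in progress":
# curl -s "localhost:${LOCAL_PORT}/backup/status" | jq .
# 3. Restore schemas and data:
RESTORE_URL="localhost:${LOCAL_PORT}/backup/restore/${BACKUP_NAME}"
# curl -s "$RESTORE_URL" -X POST | jq .
```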