środa, 9 grudnia 2020

Backup and restore your database running in OpenShift

In this post I'll show you how to implement backup solution for your database running as container in OpenShift. I'll use AWS S3 as storage for my backups and PostgreSQL database as an example.

First I've create following cron job in my OpenShift project which is executed daily at 2 AM:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: landmarks-db-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          activeDeadlineSeconds: 1800
          restartPolicy: OnFailure
          containers:
          - name: backup-maker
            image: quay.io/jstakun/pg-backup:0.2
            command:
            - /bin/bash
            - /opt/backup/do_backup.sh
            envFrom:
            - configMapRef:
                name: pg-backup-conf
            env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: landmarksdb
                  key: database-password
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: landmarksdb
                  key: database-user
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-creds
                  key: AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-creds
                  key: AWS_SECRET_ACCESS_KEY

This cron job is referencing aws-creds secret where I store my AWS S3 access keys, landmarksdb secret where I store my PostgreSQL database credentials and pg-backup-conf config map where I store other required env variables:

apiVersion: v1
kind: Secret
metadata:
  name: aws-creds
type: Opaque
data:
  AWS_ACCESS_KEY_ID: ...
  AWS_SECRET_ACCESS_KEY: ...

apiVersion: v1
kind: Secret
metadata:
  name: landmarksdb
type: Opaque
data:
  database-password: ...
  database-user: ...

apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-backup-conf
data:
  AWS_DEFAULT_REGION: us-east-1
  DB_BACKUP_PASSWORD: ...
  POSTGRES_DB: landmarksdb
  POSTGRES_HOST: landmarksdb
  S3_BACKUP_PATH: s3://my_backups/landmarksdb

My cron job is using container image quay.io/jstakun/pg-backup:0.2 which contains all the tools I need to execute backup and store it at AWS S3 bucket: pg_dump, bzip2, mcrypt and aws cli. Here is pg-backup image definition:

FROM registry.access.redhat.com/rhscl/postgresql-96-rhel7

ENV BACKUP_HOME=/opt/backup

WORKDIR $BACKUP_HOME

USER root

RUN yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && \
    yum install -y mcrypt bzip2 && \
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    ./aws/install && \
    rm awscliv2.zip && \
    rm -rf ./aws && \
    yum clean all && \
    rm -rf /var/cache/yum && \
    mkdir -p $BACKUP_HOME && \ 
    adduser -g 0 -c "Backup user" -p backup backup 

COPY ./do_backup.sh $BACKUP_HOME

RUN chown -R backup:root $BACKUP_HOME && \
    chmod +x $BACKUP_HOME/do_backup.sh && \
    chmod g+rw $BACKUP_HOME

USER backup 

Here you can find the latest pg-backup container image definition. In this container image definition I'm referencing following do_backup.sh bash script: 

#!/bin/bash
cd /opt/backup
export DUMP_FILE=pg-backup_`date +%Y%m%d_%H%M%S`.sql
export PGPASSWORD=$POSTGRES_PASSWORD 
echo '--- running pg_dump ---'
pg_dump -v -d $POSTGRES_DB -U $POSTGRES_USER -h $POSTGRES_HOST --encoding=UTF-8 > $DUMP_FILE
echo '--- running bzip2 ---'
bzip2 $DUMP_FILE
echo '--- running mcrypt ---'
mcrypt ${DUMP_FILE}.bz2 -k $DB_BACKUP_PASSWORD
echo '--- running aws s3 ---'
aws s3 cp ${DUMP_FILE}.bz2.nc $S3_BACKUP_PATH/$DUMP_FILE.bz2.nc
Here you can find the latest backup script definition including AWS SES integration for sending email notificatons. 

In order to copy backup file to AWS S3 bucket you'll need to create AWS account and assign appropriate policy to it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-backups",
                "arn:aws:s3:::my-backups/*"
            ]
        }
    ]

In addition for optional AWS SES integration for sending email notifications following policy will be needed:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ses:SendEmail",
                "ses:SendRawEmail"
            ],
            "Resource": "arn:aws:ses:us-east-1:096851829459:identity/admin@mydomain.com"
        }
    ]
}

That's it! Now you can check backups are stored in your AWS S3 bucket daily.

In order to restore the database you can follow following procedure from your workstation:

1. Install AWS CLI 

2. Download backup file 

$ mkdir /data/install/landmarks && cd /data/install/landmarks

$ aws configure --profile=s3 

$ S3_BACKUP_PATH=s3://my-backups/landmarksdb

$ DUMP_FILE=pg-backup_20201203_141724.sql.bz2.nc

$ aws s3 cp $S3_BACKUP_PATH/$DUMP_FILE $DUMP_FILE

3. Install mcrypt and bzip2. You can check my container definition file above for commands how to install these tools from EPEL repository.

4. Decrypt and uncompress backup file

$ mcrypt -d $DUMP_FILE
#passphrase: ...

$ bzip2 -vd pg-backup_20201203_141724.sql.bz2

5. Copy backup file to destination database container

$ oc project landmarks

$ DB_POD=$(oc get pods | grep landmarksdb | awk '{print $1}') && echo $DB_POD

$ oc rsync /data/install/landmarks $DB_POD:/var/lib/pgsql/data

6. Import backup to the database

$ oc rsh $DB_POD

sh-4.2$ psql -d landmarksdb < /var/lib/pgsql/data/landmarks/pg-backup_20201203_141724.sql

sh-4.2$ rm -rf /var/lib/pgsql/data/landmarks/

In order to backup and restore data to other Database Management System you can follow pretty much the same procedure and simply replace pg_dump tool with other backup & restore tool suitable for your Database Management System.