Learn how to create a complete backup solution for your Open edX installation. This detailed step-by-step how-to guide covers backing up MySQL and MongoDB, organizing backup data into date-stamped tarball archives, plus how to set up a cron job and how to copy your backups to an AWS S3 storage bucket.

Summary

The official Open edX documentation takes a laissez-faire approach to many aspects of administration and support, including, for example, how to properly back up and restore course and user data. This article attempts to fill that void. Also, don’t forget to bookmark my post on how to restore your Open edX platform from a backup. Implementing an effective backup solution for Open edX requires proficiency in a number of technologies, which is fine if you’re part of a full IT team at a major university, but this can otherwise be a real obstacle to competently supporting your Open edX platform.

Open edX stores course data, including media uploads such as images and mp4 video files, in MongoDB. To do this, MongoDB’s core functionality is extended with a technology called GridFS that provides a horizontally scalable file store for all course-related data. For student data Open edX takes a more relational approach with MySQL, noting however that the Open edX platform relies on several MySQL databases. These are excellent architectural choices: both technologies are best-of-breed and getting better all the time. Nonetheless, having two entirely different persistence strategies under the hood really complicates simple IT management responsibilities like data backups.

We’re going to set up an automated daily backup procedure that backs up the complete contents of the MongoDB course database (including file/document, media and image uploads) and each individual MySQL database that contains learner user data. We’ll create a Bash script that packages these files into date-stamped Linux tarballs and then pushes them to an AWS S3 bucket for long-term remote storage.

Assumptions

  • Your Open edX instance is running from an AWS account
  • Your AWS EC2 instance is running an Ubuntu 20.04 LTS server built from an official Ubuntu AMI
  • You have SSH access to your EC2 instance and sudo capability
  • You have permissions to create AWS IAM users and S3 resources
  • Your Open edX instance is substantially based on the guidelines published here: Open edX Step-By-Step Production Installation Guide

1. Create credentials for AWS CLI

AWS provides an integrated security management system called Identity and Access Management (IAM) that we are going to use in combination with AWS’ Command Line Interface (CLI) to give us a way to copy files from our local Ubuntu file system to an S3 bucket in our AWS account. If you’re new to either technology then it behooves you to review AWS’ introductory documentation for both before continuing.

In this step, we’ll create a new IAM user with full access to S3 resources.
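
If you already have a working AWS CLI somewhere else (your workstation, for example, or AWS CloudShell) then you can create the user from the command line instead of the console. This is just a sketch; the user name openedx-backup is a placeholder of my own choosing:

aws iam create-user --user-name openedx-backup
aws iam attach-user-policy --user-name openedx-backup --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam create-access-key --user-name openedx-backup    #returns the Access Key ID and Secret Access Key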

In the next step we’ll permanently store our “Access Key ID” and “Secret Access Key” values in the AWS CLI configuration file on our Ubuntu server instance so that we’ll be able to seamlessly make calls to AWS S3 from the command line, and importantly, from a Bash script.

2. Install AWS CLI (Command Line Interface)

Next, let’s install AWS’ command line tools. The version of Ubuntu that you should be using has Pip preloaded; we’ll use it to install AWS CLI.

$ pip install awscli --upgrade --user
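
Pip’s --user flag installs the executable to ~/.local/bin, which may not be on your PATH yet. A quick sanity check, assuming a default Ubuntu shell setup:

export PATH=$HOME/.local/bin:$PATH    #only needed if the "aws" command isn't found
aws --version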

After installing, run the AWS “configure” program to permanently store your IAM user credentials:

$ aws configure

The “configure” program adds a hidden directory to your home folder named .aws that contains a file named “config” and another named “credentials”, both of which you can edit. So, just FYI, you can change your configuration values in the future either by re-running aws configure or by editing these two files.
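
For reference, both files use a simple INI-style layout. Here’s roughly what they look like after running aws configure; the key values and region below are placeholders:

# ~/.aws/credentials
[default]
aws_access_key_id = [YOUR ACCESS KEY ID]
aws_secret_access_key = [YOUR SECRET ACCESS KEY]

# ~/.aws/config
[default]
region = us-east-1
output = json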

If you need to troubleshoot, or heck, if you just want to know more then you can read AWS’ CLI installation instructions.

3. Create An AWS S3 Bucket

Amazon S3 is object storage built to store and retrieve any amount of data from anywhere – web sites and mobile apps, corporate applications, and data from IoT sensors or devices. It is designed to deliver 99.999999999% durability, and stores data for millions of applications used by market leaders in every industry. We’ll use S3 to archive our backup files. In my view S3 is a very stripped down file system that has been highly optimized for its specific use case.

Creating an S3 bucket is straightforward, and AWS’ step-by-step instructions are very good. There are no special configuration requirements for our use case as a permanent file archive, so feel free to create your bucket however you like. You’ll reference your new S3 bucket in the next step.
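
If you prefer to stay at the command line, the CLI can create the bucket too. A minimal sketch, with the bucket name and region as placeholders you’ll need to replace:

aws s3 mb s3://[THE NAME OF YOUR BUCKET] --region us-east-1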

4. Test AWS S3 sync command

Let’s do a simple test to ensure that a) AWS CLI is correctly installed and configured and b) your AWS S3 bucket is accessible from the command line. Note that the “sync” command operates on folders rather than on individual files, so we’ll create a small test file inside a scratch folder and then sync that folder to your bucket.

cd ~
mkdir -p sync-test
echo "hello world" > sync-test/test.file
aws s3 sync sync-test s3://[THE NAME OF YOUR BUCKET]/sync-test

If successful then you’ll find a copy of your file inside your AWS S3 bucket. Otherwise, please leave comments describing your particular situation so that I can continue to improve this how-to article.

5. Test connectivity to MySQL

In this step we want to verify that you can connect to your MySQL server from the command line. If you are running an Open edX instance based on my article Open edX Step-By-Step Production Installation Guide then your MySQL server is running on your EC2 instance and is therefore accessible as “localhost”, which incidentally is the default host value.

Beginning with the Koa release the MySQL root password is set, and as of this publication I still have not determined how to find the password value that they have chosen. However, you still have two choices. You can use the ‘admin’ account that the native installer creates, and which has the same permissions as ‘root’, or you can reset the MySQL root password by following MySQL’s documented root password reset procedure.

To login using the built-in ‘admin’ user you’ll need to look up the password value stored in /home/ubuntu/my-passwords.yml. Look for the key ‘COMMON_MYSQL_ADMIN_PASS’ on or around row 14.

The command-line login command is as follows. You’ll be prompted for the password after you hit the return key.

mysql -u admin -p

If successful you’ll see the MySQL monitor welcome banner followed by a mysql> prompt. Type ‘exit’ and enter to log out of MySQL. Please leave me a comment if you have trouble logging in so that I can improve this article.
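
If you’d rather do a non-interactive, one-shot connectivity check (handy later when scripting), the following sketch runs a single query and exits; you’ll still be prompted for the admin password:

mysql -u admin -p -e "SHOW DATABASES;"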

6. Test Connectivity to MongoDB

To connect to MongoDB you’ll need your admin password. The third step of the Open edX native build installation script creates a plain text file in your home folder named my-passwords.yml that contains a list of all unique password values that were automagically created for you during the installation. Your MongoDB admin password is located in this file on approximately row 39 and identified as MONGO_ADMIN_PASSWORD.

Next, we’ll create a small Bash script to test whether we can connect to MongoDB, as follows

#!/bin/bash
# Connectivity test: dumps each Open edX Mongo database into a local folder named mongo-dump/
for db in edxapp cs_comment_service_development; do
    echo "Dumping Mongo db ${db}..."
    mongodump -u admin -p'ADD YOUR PASSWORD HERE' -h localhost --authenticationDatabase admin -d ${db} --out mongo-dump
done

Save this in a file named ~/mongo-test.sh, then run the following commands to execute your test

sudo chmod 755 ~/mongo-test.sh  #This makes the file executable
bash ~/mongo-test.sh   #execute the Bash script

If successful, mongodump will print progress messages for each collection as it writes the dump files to a folder named mongo-dump in your current directory.
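
Assuming you ran the script from your home folder, you can inspect the test dump and then remove it, since the real backup script below creates its own dumps:

ls -lR ~/mongo-dump    #each database gets its own sub-folder of .bson and .metadata.json files
rm -rf ~/mongo-dump    #the test dump is no longer needed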

7. Create Bash script

In this step we’ll bring everything together into a single Bash script that does the following

  • Creates a temporary working folder
  • Backs up schemas and data for each database that is part of the Open edX platform. Importantly, we exclude the MySQL system databases in order to avoid data restore problems
  • Backs up MongoDB course data and Discussion Forum data
  • Packages each set of data into its own date-stamped tarball
  • Copies the tarballs to our AWS S3 bucket
  • Prunes the local backups folder of old backup files (all backups are retained in the S3 bucket)
  • Cleans up and removes our temporary working folder

The complete script is available here on GitHub, or if you’re in a hurry just copy/paste the snippet below.
Credit for the substance of this script goes to Ned Batchelder.

#!/bin/bash
#---------------------------------------------------------
# written by: lawrence mcdaniel
#             https://lawrencemcdaniel.com
#             https://blog.lawrencemcdaniel.com
#
# date:       feb-2018
# modified:   jan-2021: create separate tarballs for mysql/mongo data
#
# usage:      backup MySQL and MongoDB data stores
#             combine into a single tarball, store in "backups" folders in user directory
#
# reference:  https://github.com/edx/edx-documentation/blob/master/en_us/install_operations/source/platform_releases/ginkgo.rst
#---------------------------------------------------------

#------------------------------ SUPER IMPORTANT!!!!!!!! -- initialize these variables
MYSQL_PWD="COMMON_MYSQL_ADMIN_PASS FROM my-passwords.yml"     #Add your MySQL admin password, if one is set. Otherwise set to a null string
MONGODB_PWD="MONGO_ADMIN_PASSWORD FROM my-passwords.yml"      #Add your MongoDB admin password from your my-passwords.yml file in the ubuntu home folder.

S3_BUCKET="YOUR S3 BUCKET NAME"             # For this script to work you'll first need the following:
                                            # - create an AWS S3 Bucket
                                            # - create an AWS IAM user with programmatic access and S3 Full Access privileges
                                            # - install AWS Command Line Tools in your Ubuntu EC2 instance
                                            # - run aws configure to add your IAM key and secret token
#------------------------------------------------------------------------------------------------------------------------

BACKUPS_DIRECTORY="/home/ubuntu/backups/"
WORKING_DIRECTORY="/home/ubuntu/backup-tmp/"
NUMBER_OF_BACKUPS_TO_RETAIN="10"            # Note: this only applies to local storage (ie on the ubuntu server). All backups are retained in the S3 bucket forever.
                                            # Each run creates two tarballs (MySQL and MongoDB), so 10 files equates to roughly 5 daily backups.

# Sanity check: is there an Open edX platform on this server?
if [ ! -d "/edx/app/edxapp/edx-platform/" ]; then
  echo "Didn't find an Open edX platform on this server. Exiting"
  exit 1
fi


#Check to see if a working folder exists. if not, create it.
if [ ! -d ${WORKING_DIRECTORY} ]; then
    mkdir ${WORKING_DIRECTORY}
    echo "created backup working folder ${WORKING_DIRECTORY}"
fi

#Check to see if anything is currently in the working folder. if so, delete it all.
if [ -n "$(ls -A ${WORKING_DIRECTORY})" ]; then
  sudo rm -rf ${WORKING_DIRECTORY}*
fi

#Check to see if a backups/ folder exists. if not, create it.
if [ ! -d ${BACKUPS_DIRECTORY} ]; then
    mkdir ${BACKUPS_DIRECTORY}
    echo "created backups folder ${BACKUPS_DIRECTORY}"
fi


cd ${WORKING_DIRECTORY}

# Begin Backup MySQL databases
#------------------------------------------------------------------------------------------------------------------------
echo "Backing up MySQL databases"
echo "Reading MySQL database names..."
mysql -uadmin -p"$MYSQL_PWD" -ANe "SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','sys','information_schema','performance_schema')" > /tmp/db.txt
DBS="--databases $(cat /tmp/db.txt)"
NOW="$(date +%Y%m%dT%H%M%S)"
SQL_FILE="mysql-data-${NOW}.sql"
echo "Dumping MySQL structures..."
# Use the same admin account as above; --single-transaction yields a consistent InnoDB snapshot
mysqldump -uadmin -p"$MYSQL_PWD" --add-drop-database --single-transaction ${DBS} > ${SQL_FILE}
echo "Done backing up MySQL"

#Tarball our mysql backup file
echo "Compressing MySQL backup into a single tarball archive"
tar -czf ${BACKUPS_DIRECTORY}openedx-mysql-${NOW}.tgz ${SQL_FILE}
sudo chown root ${BACKUPS_DIRECTORY}openedx-mysql-${NOW}.tgz
sudo chgrp root ${BACKUPS_DIRECTORY}openedx-mysql-${NOW}.tgz
echo "Created tarball of backup data openedx-mysql-${NOW}.tgz"
# End Backup MySQL databases
#------------------------------------------------------------------------------------------------------------------------


# Begin Backup Mongo
#------------------------------------------------------------------------------------------------------------------------
echo "Backing up MongoDB"
for db in edxapp cs_comment_service_development; do
    echo "Dumping Mongo db ${db}..."
    mongodump -u admin -p"$MONGODB_PWD" -h localhost --authenticationDatabase admin -d ${db} --out mongo-dump-${NOW}
done
echo "Done backing up MongoDB"

#Tarball all of our backup files
echo "Compressing backups into a single tarball archive"
tar -czf ${BACKUPS_DIRECTORY}openedx-mongo-${NOW}.tgz mongo-dump-${NOW}
sudo chown root ${BACKUPS_DIRECTORY}openedx-mongo-${NOW}.tgz
sudo chgrp root ${BACKUPS_DIRECTORY}openedx-mongo-${NOW}.tgz
echo "Created tarball of backup data openedx-mongo-${NOW}.tgz"
# End Backup Mongo
#------------------------------------------------------------------------------------------------------------------------


#Prune the backups/ folder by eliminating all but the NUMBER_OF_BACKUPS_TO_RETAIN most recent tarball files
echo "Pruning the local backup folder archive"
if [ -d ${BACKUPS_DIRECTORY} ]; then
  cd ${BACKUPS_DIRECTORY}
  ls -1tr | head -n -${NUMBER_OF_BACKUPS_TO_RETAIN} | xargs -d '\n' rm -f --
fi

#Remove the working folder
echo "Cleaning up"
sudo rm -r ${WORKING_DIRECTORY}

echo "Sync backup to AWS S3 backup folder"
# Adjust this path if necessary: "pip install --user" installs the executable
# to /home/ubuntu/.local/bin/aws on some systems rather than /usr/local/bin/aws.
/usr/local/bin/aws s3 sync ${BACKUPS_DIRECTORY} s3://${S3_BUCKET}/backups
echo "Done!"

8. Create a Cron job

Let’s set up a cron job that executes our new backup script once a day at, say, 6:00am GMT.

crontab -e

Then add this row to your cron table

# m h  dom mon dow   command
0 6 * * * sudo /home/ubuntu/edx.backup.sh > /home/ubuntu/edx.backup.out 2>&1
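
Note that running the job with sudo from cron assumes the ubuntu user has passwordless sudo, which is the default on AWS Ubuntu instances. Cron is also quiet by design, so after the first scheduled run it’s worth confirming that the job actually fired. On Ubuntu, cron activity lands in the syslog:

sudo grep CRON /var/log/syslog | tail    #confirm the job fired on schedule
ls -l ~/backups/    #fresh date-stamped tarballs confirm a successful run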

And now you’re set! Your Open edX site is getting completely backed up every day and stored offsite in an AWS S3 bucket.