Learn how to create a complete backup solution for your Open edX installation. This detailed step-by-step how-to guide covers backing up MySQL and MongoDB, organizing the backup data into date-stamped tarball archives, plus how to set up a cron job and how to copy your backups to an AWS S3 storage bucket.
Summary
The official Open edX documentation takes a laissez-faire approach to many aspects of administration and support, including, for example, how to properly back up and restore course and user data. This article attempts to fill that void. Also, don’t forget to bookmark my post on how to restore your Open edX platform from a backup. Implementing an effective backup solution for Open edX requires proficiency in a number of technologies, which is fine if you’re part of a full IT team at a major university, but it can otherwise be a real obstacle to competently supporting your Open edX platform.
Open edX stores course data, including media uploads such as images and mp4 video files, in MongoDB. To do this, MongoDB’s core functionality is extended with a technology called GridFS that provides a highly scalable file system for all course-related data. For student data Open edX takes a more relational approach with MySQL, noting however that the platform relies on several MySQL databases. These are excellent architectural choices, and both technologies are best-of-breed and getting better all the time. Nonetheless, having two entirely different persistence strategies under the hood really complicates simple IT management responsibilities like data backups.
We’re going to set up an automated daily backup procedure that backs up the complete contents of the MongoDB course database (including file/document, media and image uploads) and each individual MySQL database that contains learner user data. We’ll create a Bash script that combines these files into date-stamped Linux tarballs and then pushes them to an AWS S3 bucket for long-term remote storage.
Implementation Steps
Assumptions
- Your Open edX instance is running from an AWS account
- Your AWS EC2 instance is running an Ubuntu 20.04 LTS server
- You have SSH access to your EC2 instance and sudo capability
- You have permissions to create AWS IAM users and S3 resources
- Your Open edX instance is substantially based on the guidelines published here: Open edX Step-By-Step Production Installation Guide
1. Create credentials for AWS CLI
AWS provides an integrated security management system called Identity and Access Management (IAM) that we are going to use in combination with AWS’ Command Line Interface (CLI) to give us a way to copy files from our local Ubuntu file system to an S3 bucket in our AWS account. If you’re new to either service then it behooves you to review AWS’ introductory documentation on both before continuing.
In this step, we’ll create a new IAM user with full access to S3 resources.
In the next step we’ll permanently store our “Access Key ID” and “Secret Access Key” values in the AWS CLI configuration file on our Ubuntu server instance so that we’ll be able to seamlessly make calls to AWS S3 from the command line and, importantly, from a Bash script.
2. Install AWS CLI (Command Line Interface)
Next, let’s install AWS’ command line tools. The version of Ubuntu that you should be using comes with pip preinstalled, and we’ll use it to install the AWS CLI.
$ pip install awscli --upgrade --user
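If the aws command isn’t found afterwards, note that pip’s --user flag installs to ~/.local/bin, which may not be on your PATH yet. Assuming that default per-user install location, the following should make the CLI reachable:

export PATH=~/.local/bin:$PATH    # add pip's per-user install location to your PATH
aws --version                     # confirm the CLI is now reachable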
After installing, run the AWS “configure” program to permanently store your IAM user credentials:
$ aws configure
The “configure” program adds a hidden directory to your home folder named .aws that contains a file named “config” and another named “credentials”, both of which you can edit. So, just FYI, you can change your configuration values in the future either by re-running aws configure or by editing these two files.
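For reference, the two files look roughly like the following. The values shown here are placeholders, not real credentials, and your region will be whatever you chose during aws configure:

# ~/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = your-secret-access-key

# ~/.aws/config
[default]
region = us-east-1
output = json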
If you need to troubleshoot, or heck, if you just want to know more then you can read AWS’ CLI installation instructions.
3. Create An AWS S3 Bucket
Amazon S3 is object storage built to store and retrieve any amount of data from anywhere – web sites and mobile apps, corporate applications, and data from IoT sensors or devices. It is designed to deliver 99.999999999% durability, and stores data for millions of applications used by market leaders in every industry. We’ll use S3 to archive our backup files. In my view S3 is a very stripped down file system that has been highly optimized for its specific use case.
Creating an S3 bucket is straightforward, and AWS’ step-by-step instructions are very good. There are no special configuration requirements for our use case as a permanent file archive, so feel free to create your bucket however you like. You’ll reference your new S3 bucket in the next step.
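If you’d rather stay on the command line, you can also create the bucket with the CLI you just configured. The bucket name and region below are placeholders; substitute your own:

aws s3 mb s3://your-openedx-backup-bucket --region us-east-1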
4. Test AWS S3 sync command
Let’s do a simple test to ensure that a) AWS CLI is correctly installed and configured and b) your AWS S3 bucket is accessible from the command line. We’ll create a small test file in your home folder, then call the AWS CLI “sync” command to copy it to your bucket.
cd ~
echo "hello world" > test.file
aws s3 sync test.file s3://[THE NAME OF YOUR BUCKET]
If successful then you’ll find a copy of your file inside your AWS S3 bucket. Otherwise, please leave comments describing your particular situation so that I can continue to improve this how-to article.
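You can also confirm from the command line that the file landed in the bucket:

aws s3 ls s3://[THE NAME OF YOUR BUCKET] --recursive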
5. Test connectivity to MySQL
In this step we want to verify that you can connect to your MySQL server from the command line. If you are running an Open edX instance based on my article Open edX Step-By-Step Production Installation Guide then your MySQL server is running on your EC2 instance and is therefore accessible as “localhost”, which incidentally is the default host value.
Beginning with the Koa release the MySQL root password is set, and as of this publication I still have not determined how to find the password value that they have chosen. However, you still have two choices. You can use the ‘admin’ account that the native installer creates, and which has the same permissions as ‘root’, or, you can reset the MySQL root password by following these instructions.
To log in using the built-in ‘admin’ user you’ll need to look up the password value stored in /home/ubuntu/my-passwords.yml. Look for the key value ‘COMMON_MYSQL_ADMIN_PASS’ on or around row 14.
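If you’d rather not hunt for the row by eye, grep will pull the value out directly, assuming the default file location used by the native installer:

grep COMMON_MYSQL_ADMIN_PASS /home/ubuntu/my-passwords.yml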
The command-line login command is as follows. You’ll be prompted for the password after you hit the return key.
mysql -u admin -p
If successful you’ll land at the mysql> prompt. Type ‘exit’ and press Enter to log out of MySQL. Please leave me a comment if you have trouble logging in so that I can improve this article.
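As an additional sanity check you can list the schemas on the server; the exact set varies by release, but you should at least see edxapp among them:

mysql -u admin -p -e "SHOW DATABASES;"    # the Open edX schemas, such as edxapp, should appear in the list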
6. Test Connectivity to MongoDB
To connect to MongoDB you’ll need your admin password. The third step of the Open edX native build installation script creates a plain text file in your home folder named my-passwords.yml that contains a list of all unique password values that were automagically created for you during the installation. Your MongoDB admin password is located in this file on approximately row 39 and identified as MONGO_ADMIN_PASSWORD.
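As with the MySQL password, grep will pull the value out of the file directly:

grep MONGO_ADMIN_PASSWORD /home/ubuntu/my-passwords.yml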
Next, we’ll create a small Bash script to test whether we can connect to MongoDB, as follows:
#!/bin/bash
for db in edxapp cs_comment_service_development; do
  echo "Dumping Mongo db ${db}..."
  mongodump -u admin -p'ADD YOUR PASSWORD HERE' -h localhost --authenticationDatabase admin -d ${db} --out mongo-dump
done
Save this in a file named ~/mongo-test.sh, then run the following commands to execute your test:
sudo chmod 755 ~/mongo-test.sh    #This makes the file executable
bash ~/mongo-test.sh              #execute the Bash script
If successful, mongodump will print progress output for each database as it writes the dump files to the mongo-dump/ folder.
7. Create Bash script
In this step we’ll bring everything together into a single Bash script that does the following:
- Creates a temporary working folder
- Backs up schemas and data for each database that is part of the Open edX platform. Importantly, we want to exclude system databases in order to avoid data restore problems
- Backs up MongoDB course data and Discussion Forum data
- Combines both sets of data into a tarball
- Copies the tarball to our AWS S3 bucket
- Prunes our AWS S3 bucket of old backup files
- Cleans up and removes our temporary working folder
The complete script is available here on GitHub, or if you’re in a hurry just copy/paste the snippet below.
Credit for the substance of this script goes to Ned Batchelder.
#!/bin/bash
#---------------------------------------------------------
# written by: lawrence mcdaniel
#             https://lawrencemcdaniel.com
#             https://blog.lawrencemcdaniel.com
#
# date:       feb-2018
# modified:   jan-2021: create separate tarballs for mysql/mongo data
#
# usage:      backup MySQL and MongoDB data stores
#             combine into tarballs, store in "backups" folder in user directory
#
# reference:  https://github.com/edx/edx-documentation/blob/master/en_us/install_operations/source/platform_releases/ginkgo.rst
#---------------------------------------------------------

#------------------------------ SUPER IMPORTANT!!!!!!!! -- initialize these variables
MYSQL_PWD="COMMON_MYSQL_ADMIN_PASS FROM my-passwords.yml"   #Add your MySQL admin password, if one is set. Otherwise set to a null string
MONGODB_PWD="MONGO_ADMIN_PASSWORD FROM my-passwords.yml"    #Add your MongoDB admin password from your my-passwords.yml file in the ubuntu home folder.
S3_BUCKET="YOUR S3 BUCKET NAME"

# For this script to work you'll first need the following:
#   - create an AWS S3 bucket
#   - create an AWS IAM user with programmatic access and S3 Full Access privileges
#   - install AWS Command Line Tools in your Ubuntu EC2 instance
#   - run aws configure to add your IAM key and secret token
#------------------------------------------------------------------------------------------------------------------------

BACKUPS_DIRECTORY="/home/ubuntu/backups/"
WORKING_DIRECTORY="/home/ubuntu/backup-tmp/"
NUMBER_OF_BACKUPS_TO_RETAIN="10"   # Note: this only regards local storage (ie on the ubuntu server). All backups are retained in the S3 bucket forever.

# Sanity check: is there an Open edX platform on this server?
if [ ! -d "/edx/app/edxapp/edx-platform/" ]; then
    echo "Didn't find an Open edX platform on this server. Exiting"
    exit
fi

#Check to see if a working folder exists. if not, create it.
if [ ! -d ${WORKING_DIRECTORY} ]; then
    mkdir ${WORKING_DIRECTORY}
    echo "created backup working folder ${WORKING_DIRECTORY}"
fi

#Check to see if anything is currently in the working folder. if so, delete it all.
if [ -n "$(ls -A ${WORKING_DIRECTORY} 2>/dev/null)" ]; then
    sudo rm -r ${WORKING_DIRECTORY}*
fi

#Check to see if a backups/ folder exists. if not, create it.
if [ ! -d ${BACKUPS_DIRECTORY} ]; then
    mkdir ${BACKUPS_DIRECTORY}
    echo "created backups folder ${BACKUPS_DIRECTORY}"
fi

cd ${WORKING_DIRECTORY}

# Begin Backup MySQL databases
#------------------------------------------------------------------------------------------------------------------------
echo "Backing up MySQL databases"
echo "Reading MySQL database names..."
mysql -uadmin -p"$MYSQL_PWD" -ANe "SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','sys','information_schema','performance_schema')" > /tmp/db.txt
DBS="--databases $(cat /tmp/db.txt)"
NOW="$(date +%Y%m%dT%H%M%S)"
SQL_FILE="mysql-data-${NOW}.sql"
echo "Dumping MySQL structures..."
mysqldump -uadmin -p"$MYSQL_PWD" --add-drop-database ${DBS} > ${SQL_FILE}
echo "Done backing up MySQL"

#Tarball our mysql backup file
echo "Compressing MySQL backup into a single tarball archive"
tar -czf ${BACKUPS_DIRECTORY}openedx-mysql-${NOW}.tgz ${SQL_FILE}
sudo chown root ${BACKUPS_DIRECTORY}openedx-mysql-${NOW}.tgz
sudo chgrp root ${BACKUPS_DIRECTORY}openedx-mysql-${NOW}.tgz
echo "Created tarball of backup data openedx-mysql-${NOW}.tgz"
# End Backup MySQL databases
#------------------------------------------------------------------------------------------------------------------------

# Begin Backup Mongo
#------------------------------------------------------------------------------------------------------------------------
echo "Backing up MongoDB"
for db in edxapp cs_comment_service_development; do
    echo "Dumping Mongo db ${db}..."
    mongodump -u admin -p"$MONGODB_PWD" -h localhost --authenticationDatabase admin -d ${db} --out mongo-dump-${NOW}
done
echo "Done backing up MongoDB"

#Tarball all of our backup files
echo "Compressing backups into a single tarball archive"
tar -czf ${BACKUPS_DIRECTORY}openedx-mongo-${NOW}.tgz mongo-dump-${NOW}
sudo chown root ${BACKUPS_DIRECTORY}openedx-mongo-${NOW}.tgz
sudo chgrp root ${BACKUPS_DIRECTORY}openedx-mongo-${NOW}.tgz
echo "Created tarball of backup data openedx-mongo-${NOW}.tgz"
# End Backup Mongo
#------------------------------------------------------------------------------------------------------------------------

#Prune the backups/ folder by eliminating all but the most recent NUMBER_OF_BACKUPS_TO_RETAIN tarball files
echo "Pruning the local backup folder archive"
if [ -d ${BACKUPS_DIRECTORY} ]; then
    cd ${BACKUPS_DIRECTORY}
    ls -1tr | head -n -${NUMBER_OF_BACKUPS_TO_RETAIN} | xargs -d '\n' rm -f --
fi

#Remove the working folder
echo "Cleaning up"
sudo rm -r ${WORKING_DIRECTORY}

echo "Sync backup to AWS S3 backup folder"
/usr/local/bin/aws s3 sync ${BACKUPS_DIRECTORY} s3://${S3_BUCKET}/backups

echo "Done!"
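Save the script as ~/edx.backup.sh (the file name referenced in the cron job below), make it executable, and give it a trial run before scheduling it. A minimal sequence, assuming you’ve already filled in the password and bucket variables at the top of the script:

sudo chmod 755 ~/edx.backup.sh    #make the script executable
bash ~/edx.backup.sh              #trial run; watch the console output for errors
ls -lh ~/backups/                 #you should see one MySQL and one MongoDB tarball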
8. Create a Cron job
Let’s set up a cron job that executes our new backup script once a day at, say, 6:00am GMT.
crontab -e
Then add this row to your cron table:
# m h  dom mon dow   command
0 6 * * * sudo ./edx.backup.sh > edx.backup.out
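To confirm that the job was registered, list your crontab; after the first scheduled run you can also check the redirected console output in edx.backup.out:

crontab -l               #the backup entry should appear at the bottom of your cron table
cat ~/edx.backup.out     #console output from the most recent scheduled run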
And now you’re set! Your Open edX site is getting completely backed up every day and stored offsite in an AWS S3 bucket.
Hi Lawrence! I was wondering if you had an updated solution for a Tutor-based install?
Thanks!
take a look at this project: https://github.com/cookiecutter-openedx
Hi Lawrence,
In the mongo db backup the “cs_comment_service_development” should be “cs_comment_service”
Enjoy your day 🙂
Bye
I think it should be *cs_comments_service* 🙂
Hi Lawrence,
Just wondering if the ‘Sync backup to AWS S3 backup folder’ bit is important? I do not have an AWS S3 account and wanted to upgrade my Open edX instance. Can I still follow your guide?
hi eva, yes, the backup program will still work without this line. however, the AWS S3 sync copies your backup files to a remote file storage service, preserving them in the event that something bad happens to your open edx server. AWS S3 is a low-cost and highly reliable service.
Thank you Lawrence, I have changed that bit to back up to a git repository. It works! I would also like to know why you decided to prune the backups. Is it to speed up the backup process or is there any other reason?
the backup files are pruned locally to ensure that the folder doesn’t eventually consume all of your available file system space. however, the backup files are stored permanently on AWS S3. thus, the set of files that remain on the server are strictly there for convenience because you can alternatively download any of your backup files from AWS S3.
Hi Lawrence, thank you very much!
Personally I’m thinking of also adding files like lms.env.json and theming directories to the script.
Could you help me with a question, please?
If an attacker got root access to the server, could they use the backup script or the AWS CLI to delete the backups?
If so, what would be the best way to prevent this? I’m thinking I could change the AWS CLI policies or maybe just backup the backups to something other than a bucket, like Glacier. What do you think?
Thanks again for a great article!
hi Mattijs, yes, they potentially could get access depending on how you configure your S3 bucket. however, you can control access to your S3 storage bucket by using AWS’ IAM (Identity Access Management) service, which you can configure to prevent public access to the bucket. following are detailed instructions on how to do this: https://docs.aws.amazon.com/AmazonS3/latest/dev/walkthrough1.html
@admin:
Thanks a lot for your hard work. Only a small typo on the script:
it says
#Prune the Backups/ folder by eliminating all but the 30 most recent tarball files
it says 30, but only 15 are kept
this following line should read like this:
ls -1tr | head -n -30 | xargs -d '\n' rm -f --
thank you for pointing that out! now corrected 🙂
Hi Lawrence,
I think sync is for folders and cp is for files. If you run aws s3 sync test.file s3://[THE NAME OF YOUR BUCKET], it fails because test.file is not a folder. I think the correct command would be aws s3 sync /path/to/file/ s3://[bucket-name]. Or if you want to upload with the file name, the command would be aws s3 cp /path/to/file/test.file s3://[bucket-name]. Thank you very much for the article. It helps me a lot.
hi Aung, you should sync the folder to S3. this ensures that you are archiving all of the daily backups retained locally per the retention parameter that you specified in the script.
Thanks for your post, it’s helped me a lot.
Hi Lawrence,
“sync” did not work for me so I tried “cp” and it worked. Was there something I did wrong or has the command changed?
Hi eric, interesting. can you please post a screen shot of the error message that you receive?
Hi Lawrence, sync is not working correctly.
It gives the error
warning: Skipping file /home/ubuntu/test.file/. File does not exist.
others have recently posted similar experiences in the GitHub repo for aws-cli: https://github.com/aws/aws-cli/issues/3514. it looks like it might be a bug in the current version. i’d follow their repo for a while until the issue is resolved.