Learn how to scale your Open edX platform by migrating the MySQL database service to its own remote Ubuntu server running on AWS EC2.

Summary

If you’re looking for a general overview of how to scale the Open edX platform then you should read this first, “Scaling Open edX“. If you’re planning a completely new Open edX installation in which you are not migrating data and that needs to run at scale, then you might want to take a look at the automated tools included in Cookiecutter Open edX Devops as well as this article on Managing Your Open edX Backend With Terraform.

The Open edX platform persists data across four distinct subsystems: MySQL, MongoDB, the Ubuntu file system, and Memcached. To scale the Open edX platform you first must physically separate the executable program code from the data that is managed by these four subsystems. This article explains how to migrate the MySQL data. By scaling, I mean migrating Open edX’s local MySQL service and data to its own independent Ubuntu EC2 instance.

Over the years I’ve experimented with various strategies for migrating Open edX’s MySQL data. This article describes my preferred approach, which involves using AWS EC2 console tools to clone your existing single-server installation of Open edX, and then to pare the clone down to only the MySQL system files and data. This approach is simple, fast and reliable. A key rationale for this approach is that the internal Open edX db schemas are both complex and fragile. Open edX maintains its user data in MySQL across multiple logical databases, each of which are accessed by multiple MySQL system user accounts of varying permissions levels. Some of this is taken care of automatically with MySQL’s native dump and restore commands, but a lot of it is not. But at any rate, working with a clone of your existing Open edX instance renders the point moot.

Regarding horizontal Vs vertical scaling strategies: note that you probably only need to focus on vertical scaling for MySQL on Open edX. Even though MySQL implements a publisher-subscriber model, I’ve found this to be complete overkill for all edX platforms on which I’ve ever worked. You can reasonably begin to scale MySQL on a remote t2.medium EC2 instance as further described below, increasing the size of the server over time on an as-needed basis.

I. Snapshot Your Open edX Server

1. Create an AMI from your Open edX instance

Follow the instructions in the AWS official documentation to create a clone (aka an “Amazon Machine Image, AMI”) of your existing Open edX instance, “Create an AMI from an Amazon EC2 Instance“. Keep in mind that you’re going to initialize your MySQL database from this AMI, so you need to ensure that no learners are interacting with your Open edX instance during the time that are migrating your MySQL data. You should conservatively estimate a 4-hour maintenance window for this activity. I generally perform this work during off hours.

II. Create a Remote MySQL Server

1. Launch a new EC2 instance from the AMI

Launch a new t2.medium EC2 instance from the AMI that you created. I re-use the existing SSH key that i used for the original Open edX EC2 instance. Note that you’d only need a different SSH key if for example, completely different teams manage the Open edX and MySQL environments.

2. Create a new EC2 Security Group for MySQL

You should create a separate EC2 Security Group for your new MySQL EC2 instance, as follows:

This firewall configuration limits remote access of the server to MySQL, regardless of whatever other services might still be installed and running internally on the server. Note that on the first row, SSH, you should try to limit access to your bastion server, if you use one.

3. Take Note of The Internal IP Address That is Assigned

You’ll access your new remote MySQL server via the internal IP address, which is automatically assigned by AWS when you create the new EC2 instance. Take note of this value, which will be titled, “Private IPv4 addresses”

4. Allow Remote MySQL Connections

Initially, you will not be able to connect to mysql remotely via the internal IP address because the native installation of Open edX disables remote MySQL database connections in order to improve security. To allow your Open edX instance to be able to connect to the new MySQL server remotely via port 3306 you’ll need to enable remote connections, which is a 2-step process, both of which are performed on your new MySQL remote server. First, on your new MySQL remote server, you need to modify the following configuration item located in /etc/mysql/mysql.conf.d/mysqld.cnf on or around row 43:

# Comment out the following item
bind-address           = 127.0.0.1
# by preceding 'bind-address' with a '#' character as follows
#bind-address           = 127.0.0.1

Modifications to mysqld.cnf require a restart of themysqld service

sudo service mysql restart

Second, you need to modify the existing MySQL user accounts. The native installation also limits the scope/visibility of all MySQL users to ‘localhost’, and so you’ll need to update the database table mysql.user to expand the allowed hosts from ‘localhosts’ to ‘%’ (which means ‘any’ host).

# Note: beginning with Koa, the root mysql password is preset
# and as of this writing i have not located where this password is set. As a workaround,
# you can alternatively use the command 
#            mysql -u admin -p
# where the password value for 'admin' is stored in /home/ubuntu/my-passwords.yml
# under the yml key tag, "COMMON_MYSQL_ADMIN_PASS" on or around row 22.
mysql -u root -p
mysql> UPDATE mysql.user SET host = '%';
mysql> FLUSH PRIVILEGES;

# NOTE: (February 2021) I had an isolated case where I lost all of the 
# user-level permissions after running this UPDATE query. If this happens 
# then you have two choices: 
# a.) use the mysql command line in your original EC2 instance to generate
#     a script of the user-level permissions, or
# b.) start the entire process from the beginning, and then use a tool like
#     MySQL Workbench (https://www.mysql.com/products/workbench/) to
#     manually update the host setting for each user.
# results should look like the following:
mysql> select host, user from mysql.user;
+------+------------------+
| host | user             |
+------+------------------+
| %    | admin            |
| %    | api001           |
| %    | debian-sys-maint |
| %    | discov001        |
| %    | ecomm001         |
| %    | edxapp001        |
| %    | edxapp_cmsh001   |
| %    | migrate          |
| %    | read_only        |
| %    | reports001       |
| %    | root             |
| %    | rosencrantz      |
| %    | xqueue001        |
+------+------------------+
13 rows in set (0.01 sec)
mysql> exit

More detailed instructions are available here, “MySQL: How to allow remote connection to mysql

At this point you should be able to connect to your new MySQL remote server from the command line of your Open edX instance by logging in to MySQL as follows:

# execute this command from the command line of your Open edX instance,
# substituting “172.x.x.x” with the internal IP address of your new MySQL server.
mysql -h 172.x.x.x -u root -p
5. Remove Superfluous services, system files, data

Now you need to shut down and un-install all other services that are currently running on your new remote MySQL server. This is easier that it may appear because the Open edX file system is very well organized and also because that are fewer major industrial-grade services running on the Open edX instance than may appear.

#-------------------------------------------------------
# execute these commands on your new remote MySQL server
#-------------------------------------------------------
# remove MondoDB from the new remote MySQL server
sudo service mongod stop
sudo systemctl disable mongod

# remove Nginx from the new remote MySQL server
sudo service nginx stop
sudo systemctl disable nginx
sudo apt-get purge nginx nginx-common

# remove any LetsEncrypt system files that may exist
sudo apt-get remove certbot python-certbot-nginx
sudo rm -r /etc/letsencrypt


# remove RabbitMQ from the new remote MySQL server 
sudo systemctl stop rabbitmq-server.service
sudo systemctl disable rabbitmq-server.service
sudo rabbitmqctl status
sudo apt-get remove rabbitmq-server
sudo systemctl disable rabbitmq-server
sudo systemctl stop rabbitmq-server

# remove ElasticSearch from the new remote MySQL server 
sudo systemctl stop elasticsearch
sudo systemctl disable elasticsearch.service
sudo apt-get --purge autoremove elasticsearch

# remove Memcached from the new remote MySQL server
sudo service memcached stop
sudo systemctl disable memcached
sudo apt-get -y remove memcached

# remove all Open edX application source files
sudo rm -r /edx/app

# Reboot the server
sudo reboot

III. Reconfigure Open edX to Connect to Your Remote MySQL Server

To identify which yml files require modification you can use the Linux command, ‘grep’ to search for the Linux port number corresponding to the service you are scaling. For example, MySQL is assigned to Linux port number 3306, thus we can execute the the following command to identify all Open edX configuration files that contain MySQL configuration parameters:

sudo grep -rn '3306' /edx/etc/

We can see that in total there are seven yml files containing MySQL configuration parameters, and furthermore that some of these yml files contain multiple references to MySQL, as follows:

/edx/etc/insights.yml:26:        PORT: 3306
/edx/etc/discovery.yml:26:        PORT: 3306
/edx/etc/discovery.yml:38:        PORT: 3306
/edx/etc/lms.yml:209:        PORT: '3306'
/edx/etc/lms.yml:218:        PORT: '3306'
/edx/etc/lms.yml:227:        PORT: '3306'
/edx/etc/studio.yml:167:        PORT: '3306'
/edx/etc/studio.yml:176:        PORT: '3306'
/edx/etc/studio.yml:185:        PORT: '3306'
/edx/etc/xqueue.yml:16:        PORT: '3306'
/edx/etc/analytics_api.yml:24:        PORT: '3306'
/edx/etc/analytics_api.yml:34:        PORT: '3306'
/edx/etc/ecommerce.yml:36:        PORT: 3306

For example, the MySQL configuration for lms.yml looks like the following:

DATABASES:
    default:
        ATOMIC_REQUESTS: true
        CONN_MAX_AGE: 0
        ENGINE: django.db.backends.mysql
        HOST: localhost
        NAME: edxapp
        OPTIONS: {}
        PASSWORD: vU2MEmYcj7cSp79hZGUCWav6FbQT3W1v6kS
        PORT: '3306'
        USER: edxapp001
    read_replica:
        CONN_MAX_AGE: 0
        ENGINE: django.db.backends.mysql
        HOST: localhost
        NAME: edxapp
        OPTIONS: {}
        PASSWORD: vU2MEmYcj7cSp79hZGUCWav6FbQT3W1v6kS
        PORT: '3306'
        USER: edxapp001
    student_module_history:
        CONN_MAX_AGE: 0
        ENGINE: django.db.backends.mysql
        HOST: localhost
        NAME: edxapp_csmh
        OPTIONS: {}
        PASSWORD: igSQb0Wxfrnf6FYXso9NuEfj9y7eNgtzpOK
        PORT: '3306'
        USER: edxapp_cmsh001

To reconfigure the LMS to use a remote MySQL server we would modify the ‘HOST’ parameter from ‘localhost’ to the Internal IP address of the newly-created remote MySQL server. Note that we MUST address remote servers using internal IP addresses because otherwise our network traffic would leave and re-enter our Virtual Private Cloud which would not only be inefficient from a performance point of view but also insecure.

Do a full reboot of your Ubuntu server when you finish.

sudo reboot

Ok, that’s it for MySQL. If you can login to your Open edX instance as a normal user, and also login to the Django admin console then your migration was successful.

IV. Test Your Open edX Platform

Testing your platform is easier than it might seem.

  • If you can login to the LMS then the remote mysql service is correctly configured for this application.
  • If you can see course cards in the “Explore New Courses” page of the LMS then the remote mysql service is correctly configured for the Discovery service.
  • If you can navigate to Course Management Studio and browse the course list on the home page then the mysql service is correctly configured for this application
  • If you use the E-commerce service and you can browse the page https://[yourdomain.com]:18130/courses/ then the remote mysql service is correctly configured for this subsystem.

V. Disable the Local MySQL Service and Data Backup

On the original Open edX we’re only going to shut down the mysql service and prevent it from launching after reboots. We’ll leave the system files and the legacy data in place since these (probably) do not consume significant file system storage space.

# Execute these commands on your Open edX instance.
# terminate the mysql service.
sudo service mysql stop
# prevent mysql from automatically starting after reboot.
sudo systemctl disable mysql
# reboot the server
sudo reboot

If you have a cron job for scheduled data backups then disable this on your Open edX instance via the following command.

cron -e

Run the following command if you’re curious and want to know the exact size of the legacy mysql data files that remain on your Open edX instance

sudo du -sh /var/lib/mysql

V. Reconfigure MySQL Data Backup

If you followed my blog post, “Open edX Complete Backup Solution“, then the only thing you’ll have to do is edit the bash script edx.backup.sh to remove the MongoDB backup. The AWS CLI should work on your new remote MySQL server, assuming that you’d configured this for remote backups to AWS S3 on your original Open edX platform.