Learn how to scale your Open edX platform by migrating the MongoDB database service to its own remote Ubuntu server running on AWS EC2.
If you’re looking for a general overview of how to scale the Open edX platform, you should first read “Scaling Open edX”. If you’re planning a completely new Open edX installation that needs to run at scale and doesn’t involve migrating existing data, then you might want to look at the automated tools included in Cookiecutter Open edX Devops, as well as this article on Managing Your Open edX Backend With Terraform.
The Open edX platform persists data across four distinct subsystems: MySQL, MongoDB, the Ubuntu file system, and Memcached. To scale the Open edX platform you must first physically separate the executable program code from the data managed by these four subsystems. This article explains how to migrate the MongoDB data. By “scaling”, I mean moving Open edX’s local MongoDB service and data onto its own independent Ubuntu EC2 instance.
Over the years I’ve experimented with various strategies for migrating Open edX’s MongoDB data. This article describes my preferred approach, which uses AWS EC2 console tools to clone your existing single-server installation of Open edX and then pares the clone down to only the MongoDB system files and data. This approach is simple, fast and reliable. Open edX maintains its course data in MongoDB across multiple logical databases, each of which is accessed by multiple MongoDB user accounts with varying permission levels. MongoDB’s native dump and restore commands (mongodump and mongorestore) take care of some of this automatically, but a lot of it they do not. Working from a clone of your existing Open edX instance sidesteps the issue entirely, because the clone already contains every database and user account exactly as they exist today.
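Because the clone has to carry over every database and user account, it can help to take an inventory of them first. The sketch below defines a hypothetical helper, `mongo_inventory`, that wraps the same legacy `mongo` shell and `admin` login used later in this article. It assumes the admin account is allowed to run `listDatabases` and to read `admin.system.users`; the host and password are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every logical database and every MongoDB user
# account on a server, so that the original and the clone can be compared.
# Assumes the legacy "mongo" shell and an "admin" account that can run
# listDatabases and read admin.system.users.
set -eu

mongo_inventory() {
  local host="$1"      # e.g. 127.0.0.1, or the clone's private IP
  local password="$2"  # admin password from my-passwords.yml
  mongo --quiet --host "$host" --port 27017 \
    -u admin -p "$password" --authenticationDatabase admin \
    --eval '
      // One line per logical database.
      db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
        print("db " + d.name);
      });
      // One line per user: name@authDatabase followed by role@db pairs.
      db.getSiblingDB("admin").system.users.find().forEach(function (u) {
        var roles = u.roles.map(function (r) { return r.role + "@" + r.db; });
        print("user " + u.user + "@" + u.db + " " + roles.join(","));
      });'
}
```

Running it on the original server before the migration and on the new server afterwards, then comparing the two outputs (for example with `sort` and `diff`), gives a quick sanity check that nothing was lost.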
Regarding horizontal vs. vertical scaling strategies: you probably only need to focus on vertical scaling for MongoDB on Open edX. Even though MongoDB implements a robust sharding model, I’ve found sharding to be complete overkill on every Open edX platform I’ve ever worked on. You can reasonably start with a remote t2.medium EC2 instance, as described below, and increase the size of the server over time as needed.
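When the time does come to scale up, vertical scaling on EC2 amounts to stopping the instance, changing its instance type and starting it again. Here is a minimal sketch using the AWS CLI, assuming AWS CLI v2 is installed and configured with permission to stop, modify and start instances; `resize_instance`, the instance ID and the target type are illustrative placeholders.

```shell
#!/usr/bin/env bash
# Sketch: scale the MongoDB EC2 instance vertically by changing its type.
# Assumes AWS CLI v2 with credentials allowed to stop/modify/start instances.
# resize_instance is a hypothetical helper; both arguments are placeholders.
set -eu

resize_instance() {
  local instance_id="$1"  # e.g. i-0123456789abcdef0
  local new_type="$2"     # e.g. t2.large

  # An EBS-backed instance must be stopped before its type can change.
  aws ec2 stop-instances --instance-ids "$instance_id"
  aws ec2 wait instance-stopped --instance-ids "$instance_id"

  aws ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --instance-type "{\"Value\": \"$new_type\"}"

  aws ec2 start-instances --instance-ids "$instance_id"
  aws ec2 wait instance-running --instance-ids "$instance_id"
}
```

MongoDB is unavailable while the instance is stopped, so plan this for a maintenance window. In a VPC the instance keeps its private IPv4 address across a stop and start, so Open edX settings that point at that address should not need to change.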
I. Snapshot Your Open edX Server
1. Create an AMI from your Open edX instance
Follow the instructions in the official AWS documentation, “Create an AMI from an Amazon EC2 Instance”, to create a clone (an “Amazon Machine Image”, or AMI) of your existing Open edX instance. Keep in mind that you’re going to initialize your MongoDB database from this AMI, so you need to ensure that no instructors are modifying course data in Course Management Studio while you are migrating your Mongo data. You should conservatively estimate a 4-hour maintenance window for this activity. I generally perform this work during off hours.
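If you prefer the command line to the EC2 console, the AWS CLI can create the same AMI. This is a sketch, assuming AWS CLI v2 with credentials for the account that owns the instance; `create_openedx_ami`, the instance ID and the AMI name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: AWS CLI equivalent of "Create image" in the EC2 console.
# Assumes AWS CLI v2 is configured; create_openedx_ami is a hypothetical
# helper, and the instance ID and AMI name are placeholders.
set -eu

create_openedx_ami() {
  local instance_id="$1"  # the existing single-server Open edX instance
  local ami_name="$2"     # e.g. openedx-mongodb-migration
  local ami_id

  # By default EC2 reboots the source instance before snapshotting its
  # volumes, so MongoDB's data files are captured while at rest.
  ami_id=$(aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "$ami_name" \
    --description "Clone of Open edX for MongoDB migration" \
    --query 'ImageId' --output text)

  # Block until the AMI is ready to launch instances from.
  aws ec2 wait image-available --image-ids "$ami_id"
  echo "$ami_id"
}
```

For large volumes, the `image-available` waiter may give up before the AMI is ready; if that happens, simply run the wait command again.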
II. Create a Remote MongoDB Server
1. Launch a new EC2 instance from the AMI
Launch a new t2.medium EC2 instance from the AMI that you created. I reuse the SSH key from the original Open edX EC2 instance. You’d only need a different SSH key if, for example, completely different teams manage the Open edX and MongoDB environments.
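The equivalent launch with the AWS CLI might look like the following sketch. It assumes AWS CLI v2, a subnet in the same VPC as your Open edX instance, and the key pair name your Open edX instance already uses; the function name, the subnet and the `openedx-mongodb` Name tag are illustrative choices, not requirements.

```shell
#!/usr/bin/env bash
# Sketch: launch the MongoDB server from the AMI with the AWS CLI.
# Assumes AWS CLI v2; launch_mongo_instance is a hypothetical helper and
# the AMI ID, key pair name and subnet ID are placeholders.
set -eu

launch_mongo_instance() {
  local ami_id="$1"     # the AMI created from the Open edX instance
  local key_name="$2"   # reuse the Open edX instance's key pair
  local subnet_id="$3"  # a subnet in the same VPC as Open edX
  # Prints the new instance ID.
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type t2.medium \
    --key-name "$key_name" \
    --subnet-id "$subnet_id" \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=openedx-mongodb}]' \
    --query 'Instances[0].InstanceId' --output text
}
```

Because no security group is specified, EC2 attaches the VPC's default security group; you can swap it for a dedicated MongoDB security group once that group exists.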
2. Create a new EC2 Security Group for MongoDB
You should create a separate EC2 Security Group for your new MongoDB EC2 instance, as follows:
This firewall configuration limits remote access to the server to MongoDB and SSH, regardless of whatever other services might still be installed and running internally on the server. On the first row, SSH, you should try to restrict access to your bastion server, if you use one.
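The same firewall rules can be scripted with the AWS CLI. The sketch assumes two rules, SSH (port 22) and MongoDB (port 27017), and that your bastion server and your Open edX instance each belong to their own security group in the same VPC, so the rules can reference those groups instead of IP addresses. The helper name, all IDs and the `openedx-mongodb` group name are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: create the MongoDB security group with the AWS CLI.
# Assumes AWS CLI v2 and that the edxapp instance and the bastion server
# each have their own security group in the same VPC (all IDs are placeholders).
set -eu

create_mongo_security_group() {
  local vpc_id="$1" edxapp_sg_id="$2" bastion_sg_id="$3"
  local sg_id
  sg_id=$(aws ec2 create-security-group \
    --group-name openedx-mongodb \
    --description "MongoDB server for Open edX" \
    --vpc-id "$vpc_id" \
    --query 'GroupId' --output text)

  # SSH (22) only from the bastion server's security group.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --source-group "$bastion_sg_id" >/dev/null

  # MongoDB (27017) only from the Open edX application server's group.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 27017 --source-group "$edxapp_sg_id" >/dev/null

  echo "$sg_id"
}

# Attach the new group to the MongoDB instance (replaces its current groups):
#   aws ec2 modify-instance-attribute --instance-id <mongodb-instance-id> --groups <new-sg-id>
```

Referencing security groups rather than CIDR ranges means the rules keep working if the source instances are replaced. If you don't use a bastion server, `--cidr` with your own address followed by `/32` can stand in for `--source-group`.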
3. Take Note of the Internal IP Address That Is Assigned
You’ll access your new remote MongoDB server via its internal IP address, which AWS assigns automatically when you create the new EC2 instance. Take note of this value, which the EC2 console labels “Private IPv4 addresses”.
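You can also read the address from the command line. The first function below asks the EC2 API through the AWS CLI; the second queries the instance metadata service (IMDSv2) and must be run on the MongoDB server itself. Both are sketches, and the function names and the instance ID are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: two ways to find the private IPv4 address of the MongoDB server.
# Function names are hypothetical; the instance ID is a placeholder.
set -eu

# From any machine with AWS CLI v2 configured.
get_private_ip() {
  local instance_id="$1"
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PrivateIpAddress' \
    --output text
}

# From the MongoDB server itself, via the IMDSv2 metadata service.
get_private_ip_from_imds() {
  local token
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/local-ipv4"
}
```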
4. Allow Remote MongoDB Connections
Initially, you will not be able to connect to MongoDB remotely via the internal IP address, because the native installation of Open edX binds MongoDB to localhost only, which blocks remote database connections in order to improve security. To allow your Open edX instance to connect to the new MongoDB server remotely on port 27017, enable remote connections by modifying the file /etc/mongod.conf on your new remote MongoDB server:
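# Modify the 'bindIp' configuration parameter, which currently reads:
net:
  bindIp: 127.0.0.1
  port: 27017

# Change the value "127.0.0.1" to "0.0.0.0".
# This makes mongod listen on all network interfaces instead of only "localhost".
# Note that bindIp lists the server's own addresses to listen on, not the client
# addresses that are allowed to connect. For even better security you can use
# "127.0.0.1,172.x.x.x" (this server's private IP) instead, and rely on the
# EC2 Security Group to restrict which hosts (your bastion server and your
# edxapp EC2 instance or ELB) can reach port 27017.
net:
  bindIp: 0.0.0.0
  port: 27017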
Modifications to /etc/mongod.conf require a restart of the mongod service:
sudo service mongod restart
More detailed instructions are available in “How To Configure Remote Access for MongoDB on Ubuntu 20.04”.
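If you would rather script the edit than open an editor, `sed` can rewrite the `bindIp` value in place. This sketch assumes GNU sed (as shipped with Ubuntu) and a config file containing a single `bindIp:` line; `set_mongod_bind_ip` is a hypothetical helper, and it keeps a `.bak` copy of the original file.

```shell
#!/usr/bin/env bash
# Sketch: set net.bindIp in a mongod config file, keeping a .bak backup.
# Assumes GNU sed and a single "bindIp:" line; run as root when editing
# /etc/mongod.conf. set_mongod_bind_ip is a hypothetical helper.
set -eu

set_mongod_bind_ip() {
  local conf="$1"     # e.g. /etc/mongod.conf
  local bind_ip="$2"  # e.g. 0.0.0.0 or 127.0.0.1,172.x.x.x
  cp "$conf" "$conf.bak"
  # Replace only the value after "bindIp:", preserving the YAML indentation.
  sed -i -E "s/^([[:space:]]*bindIp:[[:space:]]*).*/\1${bind_ip}/" "$conf"
}

# Example, on the new MongoDB server (as root):
#   set_mongod_bind_ip /etc/mongod.conf 0.0.0.0
#   service mongod restart
```

Binding to `127.0.0.1` plus the server's own private IP, rather than `0.0.0.0`, keeps mongod off any other network interfaces; the Security Group still decides which clients may connect.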
At this point you should be able to connect to your new remote MongoDB server from the command line of your Open edX instance by logging in to MongoDB as follows:
# Execute this command from the command line of your Open edX instance,
# substituting "172.x.x.x" with the internal IP address of your new MongoDB server.
mongo --port 27017 --host 172.x.x.x -u "admin" -p "the-password-from-my-passwords.yml" --authenticationDatabase "admin"
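If the login hangs or is refused, it helps to know whether the problem is the network or the credentials. The sketch below, a hypothetical `check_mongo_port` helper, opens a plain TCP connection using bash's `/dev/tcp` feature and coreutils' `timeout`; a `closed` result points at `bindIp` or the Security Group rather than at the username or password.

```shell
#!/usr/bin/env bash
# Sketch: check whether the MongoDB port is reachable before logging in.
# Uses bash's /dev/tcp pseudo-device and coreutils' timeout (5 seconds).
# check_mongo_port is a hypothetical helper.
set -eu

check_mongo_port() {
  local host="$1" port="${2:-27017}"
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
    return 1
  fi
}

# Example, from the Open edX instance:
#   check_mongo_port 172.x.x.x
```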
5. Remove Superfluous Services, System Files and Data
Now you need to shut down and uninstall all other services that are currently running on your new remote MongoDB server. This is easier than it may appear, because the Open edX file system is very well organized and because there are fewer major industrial-grade services running on the Open edX instance than you might expect.
#-------------------------------------------------------
# execute these commands on your new remote MongoDB server
#-------------------------------------------------------

# stop and disable MySQL on the new remote MongoDB server
sudo service mysql stop
sudo systemctl disable mysql

# remove Nginx from the new remote MongoDB server
sudo service nginx stop
sudo systemctl disable nginx
sudo apt-get purge nginx nginx-common

# remove any LetsEncrypt system files that may exist
sudo apt-get remove certbot python-certbot-nginx
sudo rm -r /etc/letsencrypt

# remove RabbitMQ from the new remote MongoDB server
sudo systemctl stop rabbitmq-server.service
sudo systemctl disable rabbitmq-server.service
sudo apt-get remove rabbitmq-server

# remove ElasticSearch from the new remote MongoDB server
sudo systemctl stop elasticsearch
sudo systemctl disable elasticsearch.service
sudo apt-get --purge autoremove elasticsearch

# remove Memcached from the new remote MongoDB server
sudo service memcached stop
sudo systemctl disable memcached
sudo apt-get -y remove memcached

# remove all Open edX application source files
sudo rm -r /edx/app
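Before rebooting, you can confirm that the services above are no longer running and that mongod is still active and enabled at boot. The sketch assumes the same systemd unit names as the commands above; `report_service_states` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report the state of each service touched above, plus mongod.
# Assumes systemd and the unit names used in the removal commands.
# report_service_states is a hypothetical helper.
set -eu

report_service_states() {
  local svc state
  for svc in mysql nginx rabbitmq-server elasticsearch memcached mongod; do
    # is-active exits non-zero for inactive or unknown units; keep going.
    state=$(systemctl is-active "$svc" 2>/dev/null || true)
    echo "$svc: ${state:-unknown}"
  done
  # mongod must be enabled so that it starts again after the reboot.
  echo "mongod enabled at boot: $(systemctl is-enabled mongod 2>/dev/null || true)"
}
```

Every service except mongod should report `inactive`, and mongod should report `active` and `enabled`.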
# reboot the server