Learn the principles of scaling the Open edX Learning Management System, growing from a single-server, evaluation-only installation to a production, always-on instance with thousands of concurrent users.

Summary

Updated April 14, 2022 for the Open edX Maple release.

This article provides a common-sense road map for scaling your Open edX platform while minimizing cost and complexity. You can follow the links in this article to my detailed technical tutorials for each of the major steps outlined here.

Tutor installs and deploys the entire Open edX platform as Docker containers running on a single AWS EC2 instance, which is fine for projects intended for a few hundred active users. If you expect more than that, however, or if you anticipate circumstances that will produce a high percentage of concurrent users, such as midterm exams or a common project, then you might need to scale your platform. If you are hosting your platform on Amazon Web Services (AWS), then you already have all of the tools you’ll need to scale your platform to any demand scenario.

Note

I recently open-sourced the automation tools that I use for scaling Tutor installations, which you’ll find here: “Cookiecutter Open edX Devops”. You can use this Cookiecutter to build your own set of fully automated AWS infrastructure service tools, completely documented and pre-configured to follow all of the recommendations in this collection of articles on scaling Open edX.

The diagram below presents a helicopter view of a long-term scaling strategy that you can phase in over time, and which is only marginally more complex to manage than the original single-server implementation. This strategy horizontally scales the app servers while isolating each of the largest data subsystems in its own dedicated, vertically-scaled environment. The native Ansible edx-platform upgrade utilities still work, but they require that you provide an additional setup file named server-vars.yml.
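For reference, a server-vars.yml for an externalized data tier might look something like the following. The variable names follow the conventions of the edx/configuration Ansible playbooks, but verify them against your release; the hostnames are placeholders for your own AWS endpoints, so treat this as a hedged sketch rather than a copy-paste configuration.

```yaml
# server-vars.yml -- illustrative Ansible overrides for an Open edX
# app server whose data tier has been moved off-box.
# Hostnames below are placeholders; confirm variable names against
# the edx/configuration playbooks for your release.
EDXAPP_MYSQL_HOST: "mysql.internal.example.com"
EDXAPP_MYSQL_PORT: "3306"
EDXAPP_MONGO_HOSTS: ["mongo.internal.example.com"]
EDXAPP_MEMCACHE: ["memcached.internal.example.com:11211"]
```

With a file like this in place, each new app server points at the shared data tier instead of local services, which is what makes horizontal scaling of the app layer possible.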

The diagram is mostly self-explanatory, but I’ll add a few points. First, you are limited to using only one application server until you have migrated all four types of data from your existing single-server environment. That is, you’ll first need to migrate MySQL, Mongo, Memcached, and the instructor data that is stored on the Ubuntu file system, such as PDF documents and images. You should not underestimate the scale of each of these tasks individually. If you’re just getting started with scaling, be aware that you have a lot of work ahead of you before you can realistically begin to think about application load balancing and horizontal scaling.
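As a rough illustration of the first two of those migrations, the snippet below assembles the standard mysqldump and mongodump command lines you might use to export data from the single server before importing it into the dedicated environments. The hostnames, database names, and output paths are placeholder assumptions; this is a sketch of the approach, not a turnkey migration script.

```python
# Sketch: build (but don't run) the dump commands used to migrate
# MySQL and MongoDB data off a single-server Open edX installation.
# All hostnames, database names, and paths are illustrative placeholders.

def mysql_dump_cmd(host: str, user: str, database: str, outfile: str) -> list:
    """mysqldump with --single-transaction for a consistent InnoDB snapshot."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",   # consistent snapshot without locking tables
        "--routines",             # include stored procedures and functions
        database,
        f"--result-file={outfile}",
    ]

def mongo_dump_cmd(host: str, database: str, outdir: str) -> list:
    """mongodump writes BSON dumps to a directory tree under outdir."""
    return ["mongodump", f"--host={host}", f"--db={database}", f"--out={outdir}"]

if __name__ == "__main__":
    print(" ".join(mysql_dump_cmd("localhost", "root", "edxapp", "/tmp/edxapp.sql")))
    print(" ".join(mongo_dump_cmd("localhost", "edxapp", "/tmp/mongo-dump")))
```

The corresponding restores (mysql client import, mongorestore) then run against the new dedicated servers from inside the VPC.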

Second, notice that I place all EC2 instances in the same AWS Availability Zone, which might be slightly controversial. My rationale is that the platform depends on all of these servers being available at any given point in time, so there is no tangible benefit to placing any server outside of whatever availability zone you’ve chosen as your “common” availability zone. Meanwhile, I worry about LAN performance between servers whenever they are not running on the same backbone, as would be the case if you straddled availability zones.

Third, regarding security, I should point out that each scaled service (MySQL, MongoDB, Memcached) uses a custom EC2 Security Group, and that none of the production servers are accessible from outside the Virtual Private Cloud (VPC). The app servers only receive traffic from an AWS Elastic Load Balancer (ELB), and all other servers only interact with other servers in the VPC. I never open MySQL (port 3306) to the public, nor MongoDB (port 27017). To work with MySQL I mostly use the mysql command-line tools or MySQL Workbench, since this desktop application provides a convenient way to connect to a remote MySQL server via a bastion. For the same reason, I use Cyberduck to manage files between my development environment and any of the production servers behind the firewall.
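To make that intent concrete, here is a small Python sketch that models the ingress rules described above and checks that no database port is ever open to 0.0.0.0/0. The rule shapes loosely mirror EC2 security-group ingress entries, but the group names and the VPC CIDR are assumptions for illustration only.

```python
# Sketch: model EC2 security-group ingress rules for the scaled data
# tier and verify that no database port is exposed to the public
# internet. Group names and the VPC CIDR are illustrative assumptions.

VPC_CIDR = "10.0.0.0/16"  # placeholder for your VPC's address range

SECURITY_GROUPS = {
    "app-servers": [
        # app servers accept web traffic only from the load balancer's group
        {"port": 80, "source": "elb-security-group"},
    ],
    "mysql": [
        {"port": 3306, "source": VPC_CIDR},   # VPC-internal only
    ],
    "mongodb": [
        {"port": 27017, "source": VPC_CIDR},  # VPC-internal only
    ],
    "memcached": [
        {"port": 11211, "source": VPC_CIDR},  # VPC-internal only
    ],
}

def publicly_exposed_db_ports(groups: dict) -> list:
    """Return (group, port) pairs where a data-tier port allows 0.0.0.0/0."""
    db_ports = {3306, 27017, 11211}
    return [
        (name, rule["port"])
        for name, rules in groups.items()
        for rule in rules
        if rule["port"] in db_ports and rule["source"] == "0.0.0.0/0"
    ]

assert publicly_exposed_db_ports(SECURITY_GROUPS) == []
```

A check like this is also a reasonable thing to automate in infrastructure code, so that a misconfigured rule fails fast rather than silently exposing a database.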

And finally, some miscellaneous comments about the ancillary services that I use:

  • Elastic IP Addresses: I only use these for the bastion and the app server in cases where I’m not using a load balancer.
  • Route53: There are huge convenience advantages to using the AWS Route 53 DNS service as an alternative to whatever your domain registrar offers.
  • Certificate Manager: I use this for TLS/SSL certs for the Load Balancer and for the CloudFront (CDN) endpoint cert. The certificates are free, and AWS renews them automatically, making them preferable to alternatives like Let’s Encrypt.
  • Simple Email Service: This is a fantastic and cheap alternative to setting up or paying for a full-featured SMTP email server.
  • Identity and Access Management (IAM): I use this to create the key/secret pairs that you provide to the Open edX lms.yml and studio.yml files to enable AWS S3, and also for the backup scripts from my article, “Open edX Complete Backup Solution”.
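For context, the S3-related entries in lms.yml (and similarly studio.yml) look something like the following. The setting names below are the standard django-storages names commonly used by Open edX, but verify them against your release; the key, secret, and bucket values are placeholders for the credentials you create in IAM.

```yaml
# Illustrative S3 settings for lms.yml; values are placeholders.
# Supply the key/secret pair created for this purpose in IAM.
AWS_ACCESS_KEY_ID: "your-iam-access-key-id"
AWS_SECRET_ACCESS_KEY: "your-iam-secret-access-key"
AWS_STORAGE_BUCKET_NAME: "your-s3-bucket-name"
```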

Implementing a Kubernetes cluster

Step-by-step Tutorial

Running Open edX on Kubernetes

The rapid adoption of Tutor as a build and deployment solution for Open edX has also opened the door to using Kubernetes. But what is Kubernetes and what problem does it solve for an Open edX platform? Well, plenty! Read on to see what, why, and how you can begin leveraging this next-generation system management technology.

Scaling MySQL

MySQL is a logical starting point for scaling because it’s notorious for consuming Ub