This article provides a common-sense road map for scaling your Open edX platform while minimizing costs and complexity. You can follow the links in this article to my detailed how-to technical tutorials for scaling each of the major steps I outline here.
The methodology in my article, “Open edX Step-By-Step Production Installation Guide”, installs the entire Open edX platform on a single AWS EC2 instance, which is fine for projects intended for a few hundred active users. If you expect more than that, however, or if you anticipate circumstances that will result in a high percentage of concurrent users, such as midterm exams or a common project, then you might need to scale your platform. If you are hosting your platform on Amazon Web Services (AWS), then you have all of the tools you’ll need to scale your platform to any demand scenario.
The diagram below presents a helicopter view of a long-term scaling strategy that you can phase in over time. This strategy features horizontally scaled app servers while isolating each of the largest data subsystems in its own dedicated, vertically scaled environment. Maintaining the platform is only marginally more complex than maintaining a single-server installation: the native Ansible edx-platform upgrade utilities still work, but they’ll require that you provide an additional setup file named server-vars.yml.
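For illustration, here is a minimal Python sketch that renders a server-vars.yml pointing the app server at the dedicated data servers. The variable names (EDXAPP_MYSQL_HOST, EDXAPP_MONGO_HOSTS, EDXAPP_MEMCACHE) are my assumption of what the edx/configuration edxapp role reads. Check them against the release you are running before relying on them.

```python
"""Render a minimal server-vars.yml for a multi-server Open edX layout.

ASSUMPTION: the EDXAPP_* variable names are believed to match the
edx/configuration edxapp role; verify them against your release.
"""

from pathlib import Path

MEMCACHED_PORT = 11211  # Memcached's default TCP port


def render_server_vars(mysql_host: str, mongo_host: str, memcache_host: str) -> str:
    """Return YAML text that points each data subsystem at its own server."""
    lines = [
        f'EDXAPP_MYSQL_HOST: "{mysql_host}"',
        "EDXAPP_MONGO_HOSTS:",
        f'  - "{mongo_host}"',
        "EDXAPP_MEMCACHE:",
        f'  - "{memcache_host}:{MEMCACHED_PORT}"',
    ]
    return "\n".join(lines) + "\n"


def write_server_vars(path: Path, mysql_host: str, mongo_host: str, memcache_host: str) -> None:
    """Write the rendered variables to disk, e.g. as server-vars.yml."""
    path.write_text(render_server_vars(mysql_host, mongo_host, memcache_host))
```

For example, `write_server_vars(Path("server-vars.yml"), "10.0.1.10", "10.0.1.11", "10.0.1.12")` would write the file. The private VPC addresses in that call are hypothetical.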
The diagram is mostly self-explanatory, but I’ll add a few points. First, you are limited to using only one application server until you have migrated all four types of data from your existing single-server environment: MySQL, MongoDB, Memcached, and instructor data that is stored on the Ubuntu file system, such as PDF documents and images. You should not underestimate the scale of any of these tasks. If you’re just getting started with scaling, then be aware that you have a lot of work ahead of you before you can realistically begin to think about application load balancing and horizontal scaling.
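The "one app server until everything has moved" rule can be written as a small checklist. This is a hypothetical planning helper, not part of Open edX. The store names simply mirror the four kinds of data listed in the paragraph above.

```python
"""Hypothetical checklist: may I add a second app server yet?"""

from __future__ import annotations

from enum import Enum


class DataStore(Enum):
    MYSQL = "MySQL"
    MONGODB = "MongoDB"
    MEMCACHED = "Memcached"
    FILE_SYSTEM = "instructor files on the Ubuntu file system"


def remaining_migrations(migrated: set[DataStore]) -> list[DataStore]:
    """Return the data stores that still live on the original server, in order."""
    return [store for store in DataStore if store not in migrated]


def can_add_second_app_server(migrated: set[DataStore]) -> bool:
    """True only once nothing stateful remains on the first app server."""
    return not remaining_migrations(migrated)
```

For instance, `remaining_migrations({DataStore.MYSQL})` lists MongoDB, Memcached and the file system as still outstanding. On Lilac or later you would presumably add Redis alongside Memcached, as discussed in the Memcached section.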
Second, notice that I place all EC2 instances in the same AWS Availability Zone, which might be slightly controversial. My rationale is that the platform depends on all of these servers being available at any given point in time, so there’s no tangible benefit to placing any server outside of whatever Availability Zone you’ve chosen as your “common” one. Meanwhile, I worry about network performance between servers that do not share the same backbone, as would be the case if you straddled Availability Zones.
Third, regarding security, each scaled service (MySQL, MongoDB, Memcached) uses a custom EC2 Security Group, and none of the production servers are accessible from outside the Virtual Private Cloud (VPC). The app servers only receive traffic from an AWS Elastic Load Balancer (ELB), and all other servers only interact with other servers in the VPC. I never open MySQL (port 3306) or MongoDB (port 27017) to the public. To work with MySQL I mostly use the mysql command-line tools or MySQL Workbench, since this desktop application provides a convenient way to connect to a remote MySQL server via a bastion. For the same reason, I use Cyberduck to transfer files between my development environment and any of the production servers behind the firewall.
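One way to enforce the "never open 3306 or 27017 to the public" policy is to audit inbound rules periodically. The sketch below uses a simplified, hypothetical rule model rather than AWS's own data structures. In practice you might populate it from the IpPermissions returned by boto3's EC2 `describe_security_groups` call, which is not shown here.

```python
"""Flag security-group rules that expose database ports to the Internet."""

from __future__ import annotations

from dataclasses import dataclass

# Ports from the policy above: never reachable from outside the VPC.
PRIVATE_ONLY_PORTS = {3306: "MySQL", 27017: "MongoDB"}
# CIDR blocks meaning "anywhere" for IPv4 and IPv6.
PUBLIC_CIDRS = {"0.0.0.0/0", "::/0"}


@dataclass(frozen=True)
class IngressRule:
    """Simplified, hypothetical model of one inbound rule."""

    from_port: int
    to_port: int
    cidr: str  # source range, e.g. "10.0.0.0/16" for the VPC


def public_exposures(rules: list[IngressRule]) -> list[str]:
    """Return one finding per private-only port that a public rule covers."""
    findings = []
    for rule in rules:
        if rule.cidr not in PUBLIC_CIDRS:
            continue  # rule only admits traffic from a restricted range
        for port, service in PRIVATE_ONLY_PORTS.items():
            if rule.from_port <= port <= rule.to_port:
                findings.append(f"{service} port {port} is open to {rule.cidr}")
    return findings
```

The function deliberately checks port ranges, not just single ports, because a wide rule such as 0-65535 also exposes the databases.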
And finally, some miscellaneous comments about the ancillary services that I use:
- Elastic IP Addresses: I only use these for the bastion and the app server in cases where I’m not using a load balancer.
- Route53: There are huge convenience advantages to using the AWS Route53 DNS service instead of whatever your domain registrar offers.
- Certificate Manager: I use this for the TLS/SSL certificates for the load balancer and for the CloudFront (CDN) endpoint. The certificates are free, plus AWS automatically renews them for you, so these are preferable to alternatives like Let’s Encrypt.
- Simple Email Service: This is a fantastic and cheap alternative to setting up or paying for a full-featured SMTP email server.
- Identity and Access Management (IAM): I use this to create the key/secret pairs that I provide to Open edX’s lms.yml and studio.yml for enabling AWS S3, and also for the backup scripts from my article, “Open edX Complete Backup Solution”.
Scaling MySQL
MySQL is a logical starting point for scaling because it’s notorious for consuming Ubuntu’s resources. Its behavior is also volatile, which complicates resource planning for everything else on your Open edX server. You’ll reap immediate benefits by migrating MySQL to a remote server, even if you never expect to horizontally scale your app servers. MySQL runs more efficiently on a standalone server, mainly because there is less cache thrashing when the service has the server’s caches to itself.
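After the migration, a quick sanity check is to confirm that the app server can open a TCP connection to the new MySQL server on port 3306. The minimal sketch below uses only the Python standard library. The same check works for MongoDB (27017), Memcached (11211) or Redis (6379).

```python
"""Check that the app server can reach a remote data server over TCP."""

import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as unreachable.
        return False
```

Run from the app server, `can_reach("10.0.1.10", 3306)` (a hypothetical private IP) should return True. A False result often points at the security group or at MySQL's `bind-address` setting.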
Scaling MongoDB
Like MySQL’s, MongoDB’s resource requirements are bursty and hard to predict. Your app environment will become noticeably more stable once you’ve migrated both MySQL and MongoDB. Also, on most Open edX platforms that I manage, MongoDB consumes noticeably more file system space than MySQL, so migrating this service to a dedicated server gives you more administrative options for managing its disk usage.
Simply migrating the MongoDB service to its own dedicated Ubuntu server is probably as far as you’ll ever need to go in terms of scaling. Mind you, MongoDB is massively scalable, so when researching this topic you’ll find a lot of information about sharding, clustering, and other horizontally focused scaling topics. You probably don’t need to worry about any of this.
Scaling Memcached and Redis
Memcached is a lightweight and resource-friendly service compared to MySQL and MongoDB. There’s no tangible benefit to migrating this service unless you intend to horizontally scale your app servers. Open edX’s user session tokens are stored in Memcached. Thus, if you’re running multiple app servers in a cluster, then this service must run on a dedicated server that all of the app servers share. Beyond that, the only other tangible benefit that I’ve noticed when running a standalone Memcached server is that there’s effectively no “warm-up” period after rebooting an app server, because cached objects survive app server reboots.
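To see why a cluster needs one shared session store, consider the toy model below. Plain Python dicts stand in for Memcached and a round-robin iterator stands in for the load balancer. This is an illustration of the idea, not Open edX code.

```python
"""Toy model: why clustered app servers need one shared session cache.

Plain dicts stand in for Memcached; this is not Open edX code.
"""

from __future__ import annotations

import itertools


class AppServer:
    """Minimal app server that keeps session tokens in the cache it is given."""

    def __init__(self, cache: dict[str, str]) -> None:
        self.cache = cache

    def login(self, token: str, user: str) -> None:
        self.cache[token] = user

    def whoami(self, token: str) -> str | None:
        return self.cache.get(token)


def session_survives(shared_cache: bool, follow_up_requests: int = 4) -> bool:
    """Log in on one server, then route follow-up requests round-robin."""
    shared: dict[str, str] = {}
    # With shared_cache=False each server gets its own private dict.
    servers = [AppServer(shared if shared_cache else {}) for _ in range(2)]
    load_balancer = itertools.cycle(servers)
    next(load_balancer).login("token-123", "alice")
    return all(
        next(load_balancer).whoami("token-123") == "alice"
        for _ in range(follow_up_requests)
    )
```

With separate caches, the first follow-up request lands on the second server, which has never seen the token, so the user appears logged out. With one shared cache, every server finds the session.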
Beginning with the Lilac release in May 2021, it is also necessary to migrate Redis along with Memcached. Redis is a super-lightweight in-memory data store. Open edX now uses Redis as the message/task broker for Celery, which handles background asynchronous task management.
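Celery documents its Redis broker URL in the form `redis://:password@hostname:port/db_number`. The helper below builds such a URL for a dedicated Redis server. It is only a sketch. Check how your Open edX release assembles its broker URL from the lms.yml settings before relying on it. The host address in the usage example is hypothetical.

```python
"""Build a Celery broker URL for a dedicated Redis server."""

from __future__ import annotations

from urllib.parse import quote

REDIS_DEFAULT_PORT = 6379


def redis_broker_url(host: str, port: int = REDIS_DEFAULT_PORT, db: int = 0,
                     password: str | None = None) -> str:
    """Return redis://[:password@]host:port/db, percent-encoding the password."""
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"
```

For example, `redis_broker_url("10.0.1.12")` gives `redis://10.0.1.12:6379/0`. The password is percent-encoded so that characters such as `@` or `/` cannot break the URL.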
Scaling The Ubuntu File System
Instructors’ additional course content, such as PDF documents and custom images, is stored on the file system of your Open edX Ubuntu instance. There are significant system management benefits to using AWS S3 as an alternative to the Ubuntu file system. First, you delegate drive-space planning to AWS, which is great! Additionally, your users will experience marginally faster downloads because files will be served via AWS CloudFront instead of via HTTPS from your Ubuntu instance. And finally, this is one more I/O-intensive service that you can eliminate from your app server environment, making everything else on the app server that much better behaved.
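Before moving instructor files, it can help to do a dry run that lists which local file would become which S3 object. The sketch below only plans the mapping and uploads nothing. The bucket name and prefix are hypothetical, and the local directory is whichever one your installation uses for uploaded course content. The actual copy could then be done with the AWS CLI's `aws s3 sync` command or boto3's `upload_file`.

```python
"""Dry run: map local instructor files to S3 object keys before migrating."""

from __future__ import annotations

from pathlib import Path


def plan_s3_upload(local_root: Path, bucket: str, prefix: str = "") -> list[tuple[Path, str]]:
    """Return (local file, s3://bucket/key) pairs for every file under local_root."""
    plan = []
    clean_prefix = prefix.strip("/")
    for path in sorted(local_root.rglob("*")):
        if not path.is_file():
            continue  # directories become part of the key, not objects
        relative = path.relative_to(local_root).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        plan.append((path, f"s3://{bucket}/{key}"))
    return plan
```

Reviewing this list before the real upload makes it easy to spot stray files and to confirm that the key layout matches what your storage settings will expect.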
With regard to platform scaling, you are only required to migrate instructor content to AWS S3 if you plan to horizontally scale your app servers. And at the risk of sounding repetitive, there’s a reasonable chance that you do not need to do this.
Adding a Load Balancer
Load balancing is intentionally placed at the bottom of the list because it’s unlikely that you need it. Even though AWS has done a brilliant job of simplifying the setup of an ELB in every way possible, there are still a lot of moving parts. If you add one, then you’ll forever after have to deal with the added complexity. Consider yourself warned 😉.
How to Setup a Load Balancer for Open edX
Following are things that I’ve researched but generally do not use when scaling platforms for clients:
- Docker or Kubernetes. To date, I’ve never worked on a project with enough load and enough complexity to justify the overhead that either of these technologies would add to an Open edX installation. But if you feel compelled to research this topic further, then I recommend this excellent article from the team at Appsembler, “Scaling Open edX with Kubernetes at KubeCon EU 2016”.
- Application clusters. Even though I’ve included this in the diagram for this article, it bears mentioning that most of my clients do not presently use a load balancer, nor do they expect to at any point in the future. I’ve only had to create an application cluster on one project, for a major university. The good news is that it’s not difficult to implement an application cluster, assuming that everything else about your Open edX installation is stable.
- Scaling anything beyond what’s described in this article. I think about Open edX projects in terms of executable code and persisted data, and when scaling a project I’ve never needed to think beyond these two layers. Thus, in my experience so far at least, I’ve not needed to give serious thought to the wildly complicated Open edX architecture diagrams that circulate on the Internet.
Here are some infrastructure planning guidelines that I generally follow for generic projects:
- For up to 1,500 active users you can probably run on a single EC2 instance. If you are creating an Open edX instance for a small educational institution like a high school or community college, or for a single department within a university, then you can probably run in production on the equivalent of an AWS t2.xlarge EC2 instance, which has 4 vCPUs and 16 GB of RAM. During peak hours I would hope to observe average CPU utilization of, say, 25% or below. Otherwise, I’d consider migrating the EC2 instance to a larger size.
- For more than 1,500 active users I would consider separating the principal data layers (MySQL, MongoDB and Memcached) into small individual EC2 instances as I have described in this article.
- I try to reuse the same few EC2 instance sizes everywhere because this improves the ease and usefulness of EC2 Reserved Instance contracts, which reduce server costs by as much as 40%. I mostly use t2.xlarge for app servers and t2.large for everything else, except for the bastion, for which I generally use a t2.micro.
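The rules of thumb above can be captured in a tiny planning helper. This is a hypothetical sketch, not a capacity model. The thresholds and instance types come straight from the guidelines. The role names are my own labels, and roles the guidelines don't mention are omitted.

```python
"""Encode the rough sizing rules of thumb above (not a capacity model)."""

from __future__ import annotations

SINGLE_SERVER_LIMIT = 1_500   # active users a single t2.xlarge can probably handle
PEAK_CPU_TARGET = 25.0        # % average CPU at peak hours before resizing


def recommend_layout(active_users: int) -> dict[str, str]:
    """Map each server role to an EC2 instance type."""
    if active_users <= SINGLE_SERVER_LIMIT:
        return {"app": "t2.xlarge"}
    return {
        "app": "t2.xlarge",
        "mysql": "t2.large",
        "mongodb": "t2.large",
        "memcached": "t2.large",
        "bastion": "t2.micro",
    }


def should_resize(peak_avg_cpu_percent: float) -> bool:
    """True when peak-hour average CPU exceeds the comfort target."""
    return peak_avg_cpu_percent > PEAK_CPU_TARGET
```

Keeping the mapping in one place also makes the reserved-instance point concrete: the multi-server layout uses only three instance types, which keeps Reserved Instance purchases simple.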