The official Native Open edX Ubuntu 16.04 64 bit Installation guide frankly states that “Setting up production configurations is beyond the scope of this wiki page.” So what exactly is the difference between the sandbox installation described in the official documentation and a true production installation? Let’s find out.
Summary
I’ll never forget that empty feeling in my stomach. After much Googling I’d finally found coherent instructions and code samples for installing Open edX on an Ubuntu server; only to read an ominous disclaimer at the top of the page basically stating that the instructions that followed were incomplete. Why, I wondered? What’s missing?
I’ve since learned that the simple 4-step instructions provided on the Open edX Atlassian page, “Native Open edX Ubuntu 16.04 64 bit Installation”, are indeed the correct procedure for a native installation, and furthermore that there is no difference in the executable program files between the “Sandbox” installation that is regularly mentioned in the Open edX ecosystem and a genuine production installation. The difference between the two therefore lies solely in the additional management considerations that a production environment requires. Let’s take a closer look at the ten fundamental differences between a “sandbox” and a production site.
1. Well-Planned IT Infrastructure
A sandbox is generally used by not more than a handful of project team members for evaluation, planning, development and training purposes. It’s safe to make broad generalizations about the underlying infrastructure requirements simply because the load on the system is small and predictable. I usually recommend a t3.large EC2 on-demand instance for sandboxes. When you go to production your needs will immediately change.
A good production information system begins with well-planned infrastructure. Your Open edX project’s infrastructure requirements are unique, and thus merit an independent analysis that considers normal versus peak usage of your platform, its growth, your up-time requirements, your locale, and so on. Hopefully your project will not only succeed but will grow in popularity and usage over time, in which case you’ll require additional IT infrastructure. Your growth plan also merits special consideration.
Open edX software is highly scalable and infinitely configurable. You can install the entire Open edX platform on a single instance of any of the nearly 100 different virtual server configurations that AWS offers. But it’s also possible to install parts of the system on individual, specially-configured servers, or, spread across clusters of servers working together.
Infrastructure tends to involve capital expenditures as well as recurring monthly costs, which are topics that tend to be of interest to many if not all of the stakeholders in your project.
2. Repeatable Software Installation
A production installation of Open edX runs the same executable system files as those of a Sandbox. The difference therefore lies in how these files are installed, with the key factor being repeatability. It’s important that you are able to run the installation process for your production environment in a predictable, repeatable way. This is why I always execute the installation with a Bash script like this one.
One of the most effective means of troubleshooting an Open edX installation is to re-install the software on a fresh Ubuntu server and then migrate your data to this new instance. In order to employ this type of IT management strategy, however, your installation procedure must produce consistently identical servers.
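One rough way to check that two builds really are identical is to fingerprint the same directory on each server and compare the results. The following is only a sketch of that idea: the function name `tree_fingerprint` is my own, and which directory you fingerprint (for example, your configuration folder) is an assumption you will need to adapt to your own installation.

```python
import hashlib
from pathlib import Path


def tree_fingerprint(root):
    """Return one SHA-256 digest covering every file path and file body under root."""
    digest = hashlib.sha256()
    root_path = Path(root)
    # Sort the paths so the digest does not depend on directory listing order.
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root_path).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Run it against the same directory on the old and the new server. If the two hex strings differ, something in that directory tree was not installed identically.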
You can learn more about how to design an effective production installation procedure by reading my blog posts, “Open edX Step-By-Step Production Installation Guide” and “Open edX Configuration Tutorial“.
3. Source Code Control and Change Management Policy
How you manage customizations in your Sandbox may not be a matter of importance, but it definitely matters in your production environment. You should create a Git repository to maintain all of the configuration and customization data for your production Open edX instance. You might be saying, “But I’m not planning to modify the Open edX software, so I don’t need to worry about source control”, but that is not true. Adding version control and a remote source code repository like GitHub will help you maintain a disciplined approach to change management and, importantly, will give you a way to roll back any changes that create unintended side effects. Even if you have no intention of modifying the core code base, you’ll most likely need to modify the following sets of files:
- Django app configuration and authorization JSON files in /edx/app/edxapp
- the LMS’ header and footer mako templates
- the Nginx configuration files for the LMS and CMS
- the email templates
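As a starting point, the files listed above can be copied into the working tree of a Git repository with a few lines of Python. This is a minimal sketch, not a finished tool: `snapshot_files` is a name I made up, and the exact file names under /edx/app/edxapp vary by release, so treat any path you pass in as an assumption to verify on your own server.

```python
import shutil
from pathlib import Path


def snapshot_files(sources, repo_dir):
    """Copy each existing source file into repo_dir, mirroring its absolute path."""
    copied = []
    repo = Path(repo_dir)
    for src in map(Path, sources):
        if not src.is_file():
            continue  # skip files that this server does not have
        resolved = src.resolve()
        # e.g. /edx/app/edxapp/some.json -> <repo_dir>/edx/app/edxapp/some.json
        dest = repo / resolved.relative_to(resolved.anchor)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(resolved, dest)  # copy2 also preserves timestamps
        copied.append(dest)
    return copied
```

After each run, `git add` and `git commit` inside the repository folder give you a dated record of exactly what changed, which is what makes a roll-back possible.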
Keeping your configuration data in a Git repository will also make it possible for you to automate your production “build” procedure. For lots of reasons, you will occasionally need to rebuild your production Open edX environment, or at least parts of it. When mapping out your production build process, try to think of “configuration” as a single, indivisible entity that you simply add to an automation script in order to generate a new production server.
Lastly, if more than one person will have the ability to connect to your Open edX instance via SSH in order to modify system files then it behooves you to follow a team-based work flow for source code management and deployment.
4. Comprehensive Theming
Open edX is a themeable platform, but this is not enabled by default. Importantly, theming is an elegant way for you to modify the appearance of your platform without actually modifying the code base. Simply described, a theming system relies on a base set of UI files (templates, CSS, JavaScript, images, fonts, etc.) that are used wherever you have not provided a custom alternative stored in a specially designated location on your server. Avoiding modifications to the code base is good practice on a production machine because, any time you upgrade your Open edX instance, all of the existing code base gets completely deleted. By contrast, maintaining your UI customizations in a custom theme is a good practice that also conforms to the Git repository discussion in the preceding section.
You can usually modify UI files in a sandbox without worrying about the long-term effects. But there are important considerations in a production environment that make this an undesirable practice.
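The fallback behavior of a theming system can be illustrated in a few lines of Python. To be clear, this is a conceptual sketch of the lookup rule only and not how Open edX implements it internally; the function name and both folder arguments are hypothetical.

```python
from pathlib import Path


def resolve_ui_file(relative_name, theme_dir, base_dir):
    """Return the theme's copy of a UI file if one exists, else the base copy."""
    themed = Path(theme_dir) / relative_name
    if themed.is_file():
        return themed  # a custom alternative was provided
    return Path(base_dir) / relative_name  # fall back to the stock file
```

The point of the design is that the base folder is never edited. An upgrade can replace it wholesale while your theme folder, which lives elsewhere, is left alone.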
You can learn more about how to set up Comprehensive Theming from my blog post, “Open edX Custom Theming Tutorial”.
5. SSL/TLS Encryption for HTTPS
Today’s Internet users are accustomed to a secure browsing experience whenever their personal information is being transmitted across the Internet. Meanwhile, it’s become cheaper and easier to install and maintain your own SSL certificates, the files that make secure browsing possible on your Open edX server. Nowadays you can install and configure a completely free SSL certificate in only a few minutes, and thus this is definitely a feature that should be included in your production build.
Adding an SSL certificate to your Open edX production environment does require some minor configuration changes to your server which, again, should ideally be maintained in a Git repository so that your production build procedure can be automated as much as possible.
You can learn more about how to set up SSL/TLS for https on Open edX by reading my blog post, “Add SSL Encryption to Open edX“.
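Since certificates expire, part of maintaining one is knowing how many days it has left. Here is a small sketch using only Python’s standard library; the two function names are my own, and the hostname you pass in is whatever domain your LMS actually runs on.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Convert a certificate 'notAfter' string into days remaining from now."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(hostname, port=443, timeout=10):
    """Open a TLS connection and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling `days_until_expiry(fetch_not_after("your-lms-domain.example"))` from a daily scheduled job would give you early warning before a certificate lapses. The domain shown is a placeholder.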
Not sure what SSL encryption is or how it works? Then watch this video.
6. SMTP Email
The Open edX platform relies on an email server in order to generate various kinds of messages to your learners. For example, new learners receive an activation email verifying their email address when they sign up on your platform. Open edX also generates emails for lost passwords, password change confirmations, course completion certificate confirmations and various other administrative procedures. The problem is that Open edX itself does not include a full-featured email service, and you’ll therefore need to integrate your production Open edX platform with a suitable SMTP-compliant email service like Office 365 or Gmail.
A sandbox by contrast has little or no need to generate emails since the platform, by definition, is not being used in a production setting with real-life work flows that necessitate official electronic messages from the platform to the learner.
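Before you point Open edX at an SMTP service, it can save time to confirm that the credentials work at all. The sketch below does that with Python’s standard `smtplib`. It assumes your provider accepts STARTTLS with a username and password, which you should confirm with them; the function names are mine and the host, port and account values are entirely yours to supply.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Create a short plain-text message for verifying an SMTP relay."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Open edX SMTP test"
    msg.set_content("If you can read this, the SMTP relay accepted the message.")
    return msg


def send_via_smtp(msg, host, port, username, password):
    """Send msg through an authenticated SMTP relay, upgrading to TLS first."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()  # encrypt the session before sending credentials
        server.login(username, password)
        server.send_message(msg)
```

If `send_via_smtp` raises an exception here, the problem lies with the email account or the network rather than with your Open edX configuration, which narrows the search considerably.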
You can learn more about how to integrate an SMTP email service with Open edX by reading my blog post, “Setup SMTP Email on Open edX“.
7. Fault Tolerance & Recoverability
Though they’ll probably never ask you, your learners assume that their data is safe and that it is protected. Open edX persists data in three ways:
- Learners’ course data is stored in MySQL
- Course content is stored in MongoDB
- Session data and certain ancillary artifacts are stored on the Ubuntu file system
Maintaining the right level of fault tolerance and recoverability for each of these three classes of data is a topic that is both broad and deep, and it is the principal reason that the architects of Open edX publish a very frank disclaimer on their own official native installation instructions that “Setting up production configurations is beyond the scope of this wiki page“. Your backup and recovery requirements are unique to your project. But suffice it to say that these are a lot more important in your production environment than they are for your Sandbox. At a minimum, your production data should be backed up once per day. And for the backup to qualify as fully recoverable, it needs to be permanently archived to a remote storage location such as AWS S3.
You can learn how to implement a simple, effective remote backup system by reading my blog post, “Open edX Complete Backup Solution”.
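The daily archive step can be sketched as follows. This is a simplified illustration under stated assumptions: it presumes that your MySQL and MongoDB dumps have already been written into a single staging folder by some earlier step, the function names and the `openedx-backup` prefix are hypothetical, and the upload to remote storage is deliberately left out.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_name(prefix, when=None):
    """Build a sortable, timestamped file name such as prefix-20200102T030405Z.tar.gz."""
    when = when or datetime.now(timezone.utc)
    return f"{prefix}-{when:%Y%m%dT%H%M%SZ}.tar.gz"


def create_archive(source_dir, dest_dir, prefix="openedx-backup", when=None):
    """Compress source_dir into a timestamped .tar.gz file inside dest_dir."""
    dest = Path(dest_dir) / archive_name(prefix, when)
    with tarfile.open(dest, "w:gz") as tar:
        # Store paths relative to the folder name rather than the full server path.
        tar.add(source_dir, arcname=Path(source_dir).name)
    return dest
```

Putting a UTC timestamp in the file name means that archives sort chronologically and never overwrite one another, which matters once they start accumulating in remote storage.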
8. Upgradability
Upgrading the Open edX platform on a sandbox is easy. On the other hand, on a production platform your ability to upgrade the software is governed by how you’ve chosen to manage your configuration data and platform customization. I’m aware of multiple Open edX installations that are effectively permanently stuck on older versions of the software due to the extent of modifications that they’ve made to the code base.
To keep your production environment upgradable, two things are necessary:
- All configuration and customizations must be automatically installable
- Functional modifications (if any) can be ported to the next version
The first of these requirements is satisfied entirely by making disciplined use of a Git repository, whereas the second is a far more nuanced set of circumstances that must be individually evaluated. For that reason, it is in your interest to not make functional modifications to the Open edX code base if at all possible.
You can learn more about upgrading an Open edX platform by reading my blog post, “Upgrading Open edX”.
9. Service Uptime & Reliability
Service up-time (not to be confused with recoverability from section 7) depends entirely on your infrastructure. For example, if you are running your production Open edX environment on a single, standalone server then you must occasionally take the service offline for maintenance.
10. Continuous Training
A production Open edX platform is like a living, breathing organism. Teaching faculty come and go, and the new professors will need training on how to manage the existing course content on Open edX. You’ll occasionally hire new community managers and web designers who most likely will be unfamiliar with at least some of the web technologies that Open edX leverages. The point is, your training program is not a one-off event.
Your training program (hopefully) focuses on the subset of core functionality that your faculty uses in its courseware. Of course, you’ll add to this as exciting new features become available in subsequent releases of the software, and you’ll need to revisit your training program in order to keep your materials fresh. Additionally, your existing faculty will probably benefit from an occasional refresher course on the fundamentals of the Course Management Studio, or at least, on the new platform features as these become available.
I hope you found this helpful. Please help me improve this article by leaving a comment below. Thank you!
Hi Lawrence:
We have deployed two instances (a) BITNAMI and (b) Juniper latest release in two Virtual Machines.
(a) Using the Bitnami OVA file we could deploy the VM and started working on the OAuth2 3rd party Google Authentication (similar to what you have here in this site) but it did not work and gave Error 500. Query: While doing Add URI in the Google Developer Console, does the SERVER-IP have to be a Public IP? Here the LMS site opened. Hardware was 4-core/4GB RAM/100GB HDD with Ubuntu16.04
(b) Using the Juniper release I could deploy the native.sh file after 1 hour, but the top command shows too many processes, and I see most of them are Dead/Dormant and Sleeping. All of them are python processes running gunicorn. Moreover the LMS site is not opening. Hardware was 16-core/16GB RAM/100GB HDD with Ubuntu16.04
Hi Lawrence, pretty much long and insightful production checklist. Thank you for that. How can I do capacity planning for OpenEDX, especially concurrent users? Probably another separate topic from this one I guess 🙂
you can vertically scale the platform indefinitely in all but the most extreme cases (think University of Texas, Stanford, MIT et al). the R4.large EC2 instance that i currently recommend generally works fine for installations with up to say, a hundred or so concurrent users. you realize a lot of economy of scale, even with vertical scaling, and so the concurrent user trajectory is non-linear. separate but related, migrating MySQL and/or RabbitMQ to their own server/cluster significantly increases the capacity of your app server, and this is a more prudent growth plan at any rate.
* you’ll find some good guidelines and metrics in this presentation: https://docs.google.com/presentation/d/1bVAWunCX7wpe2-CUTExcBM_BlgwRJlV9MiGN8keUfU0/edit#slide=id.g4212b1096_10