Note: get the source code for this article at https://github.com/lpm0073/cookiecutter-openedx-devops. Follow the instructions in the README.
Open edX is a beast! How do you tame it?
In all fairness, that question is prone to coming up for any successful, modern web platform that goes through a growth spurt. In this article we’ll explore how I manage not just one, but several very large Open edX installations. Here are what I consider to be the key success factors:
- Infrastructure as code. I use Terraform, but there are other good alternatives. Terraform gives me the ability to version control my backend infrastructure service configurations so that I can safely fall back when I make a mistake, and it gives me complete automation of the entire life cycle of each service, which saves me lots of time.
- Dedicated VPC. I use a dedicated VPC for each Open edX installation, which helps to optimize the network for each installation and keeps systems from bleeding into each other. It also helps with tear-downs.
- Managed Services. All of my Open edX platforms run on AWS, and I’m biased towards using their managed services such as RDS for MySQL, DocumentDB for MongoDB, EKS for Kubernetes, and ElastiCache for Redis. This dramatically reduces the number of failure points for which you are directly responsible.
- Kubernetes. Paradoxically, adding Kubernetes simplifies most aspects of system management.
- Simple security policies. We’ll talk more below about firewall settings, user accounts, admin accounts, and exposing your backend services to the outside world.
Earlier this year I open-sourced my personal Terraform and GitHub Actions scripts in the form of a Cookiecutter template repository named Cookiecutter Open edX Devops. You can use this Cookiecutter to create your own Open edX devops repository, perfectly configured with your custom domain name and AWS account information. Cookiecutter Open edX Devops is a highly opinionated set of tools for creating and maintaining an AWS backend for Open edX that satisfies all five of these principles.
The Terraform modules of Cookiecutter Open edX Devops
Cookiecutter Open edX Devops leverages Terraform and GitHub Actions to provide 1-click backend solutions incorporating the current best practices for each service with regard to feature set, configuration, maintainability and security. This is mostly achieved by restricting the Terraform modules that it uses to those supported directly by HashiCorp and those published by Terraform AWS modules, a community of AWS service users spanning dozens of large organizations and thousands of individual contributors. For each backend service it:
- creates and configures the service
- stores admin account credentials in Kubernetes Secrets
- creates security groups, IAM policies and anything else that is necessary for the service to work correctly with the Open edX applications
- creates Route53 DNS subdomain records
- reconfigures the Open edX applications to use the new remote service
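As a rough illustration of the second step in the list above, a generated admin password can be written to a Kubernetes Secret with Terraform along these lines. This is a minimal sketch, not code taken from the repository: the resource names, the secret name, the key names and the `namespace` variable are all assumptions, and a configured `kubernetes` provider is assumed to exist elsewhere.

```hcl
# Hypothetical input: the Kubernetes namespace of the Open edX environment.
variable "namespace" {
  type = string
}

# Generate a random admin password (from the hashicorp/random provider).
resource "random_password" "mysql_root" {
  length  = 16
  special = false
}

# Store the credentials in a Kubernetes Secret inside the cluster.
# The secret name and key names below are illustrative only.
resource "kubernetes_secret" "mysql_root" {
  metadata {
    name      = "mysql-root"
    namespace = var.namespace
  }

  data = {
    MYSQL_ROOT_USERNAME = "root"
    MYSQL_ROOT_PASSWORD = random_password.mysql_root.result
  }
}
```

Because the password is a Terraform-managed resource, replacing it is a matter of recreating `random_password.mysql_root` and applying again.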
Fully integrated backend
- Kubernetes. Uses AWS Elastic Kubernetes Service to implement a Kubernetes cluster onto which all applications and scheduled jobs are deployed as pods. Tutor natively deploys Open edX applications as individual containers for LMS, CMS, Workers, Forum, etcetera. All backend service admin account credentials are automatically stored in Kubernetes Secrets. The Kubernetes configuration itself is intentionally as simple as possible. Simple is good.
- MySQL. Uses AWS RDS for all MySQL data, accessible inside the VPC as mysql.yourdomain.edu:3306. Instance size settings are located in the environment configuration file, and other common configuration settings are located here. Passwords are stored in Kubernetes Secrets accessible from the EKS cluster.
- MongoDB. Uses AWS DocumentDB for all MongoDB data, accessible inside the VPC as mongodb.master.yourdomain.edu:27017 and mongodb.reader.yourdomain.edu. Instance size settings are located in the environment configuration file, and other common configuration settings are located here. Passwords are stored in Kubernetes Secrets accessible from the EKS cluster.
- Redis. Uses AWS ElastiCache for all Django application caches, accessible inside the VPC as cache.yourdomain.edu. Instance size settings are located in the environment configuration file. A shared remote cache is necessary in order to make the Open edX application layer completely ephemeral. Most importantly, users’ login session tokens are persisted in Redis, so these need to be accessible to all app containers from a single Redis cache. Common configuration settings are located here. Passwords are stored in Kubernetes Secrets accessible from the EKS cluster.
- Container Registry. Uses an automated GitHub Actions workflow to build your Tutor Open edX container and register it in Amazon Elastic Container Registry (Amazon ECR), and a second automated GitHub Actions workflow to deploy your container to Amazon Elastic Kubernetes Service (EKS). EKS worker instance size settings are located in the environment configuration file. Note that Tutor provides out-of-the-box support for Kubernetes. Terraform leverages Elastic Kubernetes Service to create a Kubernetes cluster onto which all services are deployed. Common configuration settings are located here.
- User Data. Uses AWS S3 for storage of user data. This installation makes use of a Tutor plugin to offload object storage from the Ubuntu file system to AWS S3. It creates a public read-only bucket with write access provided to edxapp so that app-generated static content like user profile images, xblock-generated file content, application badges, e-commerce PDF receipts, instructor grades downloads and so on will be saved to this bucket. This is not only a necessary step for making your application layer ephemeral but it also facilitates the implementation of a CDN (which Terraform implements for you). Terraform additionally implements a completely separate, more secure S3 bucket for archiving your daily data backups of MySQL and MongoDB. Common configuration settings are located here.
- CDN. Uses AWS CloudFront as a CDN, publicly accessible as https://cdn.yourdomain.edu. Terraform creates CloudFront distributions for each of your environments. These are linked to the respective public-facing S3 bucket for each environment, and the requisite ACM-issued SSL/TLS certificate is attached. Terraform also automatically creates all Route53 DNS records of the form cdn.yourdomain.edu. Common configuration settings are located here.
- Password & Secrets Management. Uses Kubernetes Secrets in the EKS cluster. Open edX software relies on many passwords and keys, collectively referred to in this documentation simply as “secrets”. For all backend services, including all Open edX applications, system account and root passwords are randomly and strongly generated during automated deployment and then stored in the EKS cluster’s Kubernetes Secrets. This methodology facilitates routine updates to all of your passwords and other secrets, which is good practice these days. Common configuration settings are located here.
- SSL Certs. Uses AWS Certificate Manager (ACM) and Let’s Encrypt. A Kubernetes service manages all SSL/TLS certificates and renewal requests, using a combination of the two. Additionally, the ACM certificates are stored in two locations: your aws-region as well as us-east-1 (as is required by AWS CloudFront). Common configuration settings are located here.
- DNS Management uses AWS Route53 hosted zones for DNS management. Terraform expects to find your root domain already present in Route53 as a hosted zone. It will automatically create additional hosted zones, one per environment for production, dev, test and so on. It automatically adds NS records to your root domain hosted zone as necessary to link the zones together. Configuration data exists within several modules but the highest-level settings are located here.
- System Access uses AWS Identity and Access Management (IAM) to manage all system users and roles. Terraform will create several user accounts with custom roles, one or more per service.
- Network Design. Uses Amazon Virtual Private Cloud (Amazon VPC), within the AWS account specified in the global configuration file, to take a top-down approach to compartmentalizing all cloud resources and customizing the operating environment for your Open edX resources. Terraform will create a new virtual private cloud into which all resources will be provisioned. It creates a sensible arrangement of private and public subnets, network security settings and security groups. See additional VPC documentation here.
- Proxy Access to Backend Services. Uses an Amazon EC2 t2.micro Ubuntu instance, created by Terraform and publicly accessible via ssh as bastion.yourdomain.edu:22 using the ssh key specified in the global configuration file. For security as well as performance reasons all backend services like MySQL, Mongo, Redis and the Kubernetes cluster are deployed into their own private subnets, meaning that none of these are publicly accessible. Instead, you connect to the bastion via ssh and, in turn, connect to services like MySQL from there. See additional Bastion documentation here. Common configuration settings are located here. Note that if you are cost conscious then you could alternatively use AWS Cloud9 to gain access to all backend services.
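The DNS Management item in the list above (one hosted zone per environment, linked to the root domain's hosted zone by NS records) can be sketched in Terraform roughly as follows. This is an illustration under stated assumptions rather than the repository's code: the variable names and the TTL are hypothetical.

```hcl
# Hypothetical inputs.
variable "root_domain" {
  type        = string
  description = "Root domain that already exists in Route53, e.g. yourdomain.edu"
}

variable "environment" {
  type        = string
  description = "Environment name, e.g. dev, test or prod"
}

# Look up the existing hosted zone of the root domain (not managed here).
data "aws_route53_zone" "root" {
  name = var.root_domain
}

# Create a separate hosted zone for the environment.
resource "aws_route53_zone" "environment" {
  name = "${var.environment}.${var.root_domain}"
}

# Delegate the environment subdomain by adding its name servers
# as an NS record in the root domain's hosted zone.
resource "aws_route53_record" "environment_ns" {
  zone_id = data.aws_route53_zone.root.zone_id
  name    = aws_route53_zone.environment.name
  type    = "NS"
  ttl     = 300 # illustrative value
  records = aws_route53_zone.environment.name_servers
}
```

The root zone is read through a data source because Terraform expects it to exist already, while the environment zone is a resource whose lifecycle Terraform owns.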
Getting started with Terraform
What it is and how it works
Terraform is an open-source infrastructure as code software tool created by HashiCorp. Users define and provision data center infrastructure using a declarative configuration language known as HashiCorp Configuration Language (HCL). One of the important benefits of Terraform is the community-supported Terraform modules registered on the Terraform Registry. Cookiecutter Open edX Devops uses carefully vetted modules to implement all of the Open edX backend services like EKS, RDS, etcetera. This helps to ensure that each of these services conforms to current best practices with regard to feature set, maintainability and security considerations.
Terraform is a simple language that you can probably learn in a couple of hours or less. The challenges with Terraform are less about the language itself than about the minutiae of configuring the backend services that Terraform is managing for you. For example, if you aspire to use Terraform to manage Open edX’s MySQL service remotely via RDS then in addition to the Terraform module that does this you also need a decent understanding of MySQL database administration, AWS itself, RDS, IAM, VPC, how policies work, at least the basics about TCP/IP networks, security groups, and so on. Ditto with regard to every other service that Terraform can manage for you.
The Terraform interpreter compares your code to your AWS account, calculates the difference between one and the other, and then formulates a plan to make your AWS account match your code. It generally works well, but it’s not perfect. Terraform can easily become confused if, for example, you ever manually tinker with your AWS resources outside of Terraform. Simply put, there’s a learning curve. And the best way to learn is to use Terraform.
In this video I’ll navigate to the Terragrunt template for the VPC, initialize the environment, and create a plan. In this case, the VPC in my AWS account matches that of my source code, so the plan results in no actions.
The occasional Terraform hiccup aside, it’s a valuable tool. It certainly beats trying to build an entire AWS backend by hand.
Running Terraform modules
It’s important to remember that Terragrunt calls Terraform. You invoke modules from inside the Terragrunt modules folder, which after you have run the Cookiecutter is located at ./terraform/environments/prod/. Each Terragrunt module invokes a corresponding Terraform module located in ./terraform/modules/. Command forms are pretty simple:
# -------------------------------------
# to manage an individual resource
# -------------------------------------
cd ./terraform/environments/prod/vpc
terragrunt init
terragrunt plan
terragrunt apply
terragrunt destroy

# -------------------------------------
# Or, to build the entire backend
# -------------------------------------
cd ./terraform/environments/prod/
terragrunt run-all apply
Learn Terragrunt and Terraform in 10 minutes
Terragrunt is a thin wrapper around Terraform that enables you to parameterize Terraform modules in order to make them more re-usable. The most common use case is to manage environments for, say, dev, test, and prod. The Cookiecutter creates one environment named “prod” which you can copy in order to create additional environments if you’d like.
Terragrunt looks really similar to Terraform. In fact, they’re almost identical. The recurring coding pattern that you’ll see in each Terragrunt module is:
- locals: variable declarations that are local to the module. They are then addressable as “local.variable_name”
- dependency: other Terragrunt modules on which the module depends. This determines the execution order when using “run-all”
- terraform: a pointer to the Terraform module being templated
- inputs: variable declarations that are passed to the Terraform module. These are then addressable within the Terraform module as, “var.variable_name”
See these abbreviated snippets from the VPC module for example
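As a minimal sketch of this four-part pattern (not the repository's actual code), a Terragrunt module might look like the following. The folder names, local names and input names are illustrative assumptions, and the example is a hypothetical module that must run after the VPC module.

```hcl
# terragrunt.hcl for a hypothetical module that depends on the VPC module.

# locals: values that are only visible inside this Terragrunt module.
locals {
  environment = "prod"
  vpc_name    = "openedx-${local.environment}"
}

# dependency: forces the VPC module to run first under "run-all".
# skip_outputs avoids reading Terraform outputs from the dependency.
dependency "vpc" {
  config_path  = "../vpc"
  skip_outputs = true
}

# terraform: pointer to the Terraform module being templated.
terraform {
  source = "../../../modules/kubernetes"
}

# inputs: passed to the Terraform module, where they are read as
# var.environment and var.vpc_name.
inputs = {
  environment = local.environment
  vpc_name    = local.vpc_name
}
```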
Example Terraform module
In this Cookiecutter, Terraform modules are called exclusively from Terragrunt. You never call a Terraform module directly. The following Terraform module is an abbreviated copy of the module called by the Terragrunt module above. I should point out the following:
- module: declares a Terraform module which can contain a collection of other Terraform directives to create and relate resources.
- source: indicates where the source code for this module is located. In this case we’re using shorthand that refers to a registered module in the Terraform Registry located at https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest. Complete documentation for this module is located on this same web page.
- version: an optional version constraint for the module, written in semantic versioning notation. Terraform honors this argument for modules installed from a module registry such as the Terraform Registry. Also you should note that the source code itself is actually located here: https://github.com/terraform-aws-modules/terraform-aws-eks, which I was able to determine by reading the Terraform Registry page for this module.
- cluster_name: an example of an input for this module. You’ll find a directory of all required and optional inputs on the Registry web page for the module. Also note that this input references a variable whose value is passed in by the calling Terragrunt module. I know this because it is prefixed with “var.”
- node_security_group_additional_rules: an example of an input in the form of a Terraform map (akin to a dict), whose syntax closely resembles JSON.
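A module declaration with the elements described above might look roughly like this. It is a hedged sketch, not the Cookiecutter's actual module: the version constraint, the variable names other than `cluster_name`, and the contents of the security group rule are assumptions chosen for illustration.

```hcl
# Values passed in from the calling Terragrunt module's inputs.
variable "cluster_name" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

module "eks" {
  # Registry shorthand: <namespace>/<name>/<provider>
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0" # illustrative constraint, not the project's pin

  cluster_name = var.cluster_name
  vpc_id       = var.vpc_id
  subnet_ids   = var.private_subnet_ids

  # A map of maps: each key names one additional rule for the
  # security group shared by the worker nodes.
  node_security_group_additional_rules = {
    ingress_self_all = {
      description = "Node to node, all ports and protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }
  }
}
```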
Other Terraform language concepts to be aware of
- provider: the plugin through which Terraform manages a given platform’s resources, such as AWS, Azure, Google, IBM, Alibaba, and others.
- locals: an area where you can declare variables that are local to the Terraform module.
- resource: a provider resource whose lifecycle will be managed by Terraform.
- data: references to resources that were created outside of the Terraform module.
- variable: for parameterizing Terraform modules. For every Terragrunt “input” we need to declare a corresponding Terraform variable.
- output: for exposing module output values. Cookiecutter Open edX Devops does not use outputs.
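To make the variable, locals and output concepts from the list above concrete, here is a small sketch. The names are hypothetical, and the output block is shown only for completeness since the Cookiecutter itself does not use outputs.

```hcl
# variable: receives the value of the Terragrunt input of the same name.
variable "environment_domain" {
  type        = string
  description = "Domain of this environment, e.g. prod.yourdomain.edu"
}

# locals: values derived inside the module, addressed as local.<name>.
locals {
  cdn_domain = "cdn.${var.environment_domain}"
}

# output: exposes a value to whatever calls this module.
output "cdn_domain" {
  value = local.cdn_domain
}
```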
Following are examples of a data item and a resource declaration. A data reference addresses an existing AWS item whereas a resource declaration refers to an AWS item that will be created and whose lifecycle will be controlled by Terraform.
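A minimal sketch of such a pair, using the declaration names discussed in this section, could look as follows. Everything except those two names is an assumption made for illustration: the bucket name, the variables, the record type and the TTL.

```hcl
# Hypothetical inputs.
variable "environment_domain" {
  type = string
}

variable "cdn_target" {
  type        = string
  description = "Domain name of the CDN distribution to point at"
}

locals {
  s3_bucket_name = "example-environment-storage" # hypothetical bucket name
}

# data: looks up an existing bucket; Terraform does not manage it.
data "aws_s3_bucket" "environment_domain" {
  bucket = local.s3_bucket_name
}

# data: looks up the existing hosted zone of the environment.
data "aws_route53_zone" "environment_domain" {
  name = var.environment_domain
}

# resource: a DNS record that Terraform creates, updates and destroys.
resource "aws_route53_record" "cdn_environment_domain" {
  zone_id = data.aws_route53_zone.environment_domain.zone_id
  name    = "cdn.${var.environment_domain}"
  type    = "CNAME"
  ttl     = 300 # illustrative value
  records = [var.cdn_target]
}
```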
Of note regarding the data declaration, data “aws_s3_bucket” “environment_domain”
- aws_s3_bucket: refers to a type of resource that is understood by the AWS Provider which was declared elsewhere in the code. Documentation for this data declaration is located at: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/s3_bucket
- a Terraform data declaration gives you a way to retrieve attribute information about a resource, but it does not give you any ability to administer the resource itself. So in this case for example, we can use the data declaration to retrieve the ARN of the bucket as data.aws_s3_bucket.environment_domain.arn, or its name as data.aws_s3_bucket.environment_domain.id
- The AWS S3 bucket will be referred to in this source as, “data.aws_s3_bucket.environment_domain”, noting that the string value “environment_domain” is user defined.
- The data declaration is resolved as the AWS S3 bucket with the bucket name equating to the locally-declared Terraform variable, local.s3_bucket_name
Of note regarding the resource declaration, resource “aws_route53_record” “cdn_environment_domain”
- aws_route53_record: refers to a type of resource that is understood by the AWS Provider which was declared elsewhere. Documentation for the Terraform resource aws_route53_record is located at: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/route53_record
- note the example reference to aws_route53_record in the zone_id assignment
This next code snippet is a recurring pattern throughout the Terraform modules. These three declarations initialize a Terraform provider referencing the Kubernetes cluster. The certificate value is base64-encoded, which is why you see a call to base64decode(). Also note the usage of dot notation. Complete documentation for each of these three declarations is available at:
- data “aws_eks_cluster”: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster
- data “aws_eks_cluster_auth”: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth
- provider “kubernetes”: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs
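For reference, the three declarations typically take a shape like the one below. This is a sketch of the common pattern rather than a copy of the repository's code; the `cluster_name` variable and the `eks` labels are assumptions.

```hcl
# Hypothetical input: the name of the existing EKS cluster.
variable "cluster_name" {
  type = string
}

# Look up the cluster's endpoint and certificate authority data.
data "aws_eks_cluster" "eks" {
  name = var.cluster_name
}

# Obtain a short-lived authentication token for the cluster.
data "aws_eks_cluster_auth" "eks" {
  name = var.cluster_name
}

# Configure the Kubernetes provider from the two data sources.
provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks.token
}
```

The dot notation reads left to right: declaration kind, type, user-defined label, then the attribute being retrieved.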
Keeping your Terraform code updated
It’s important to keep in mind that Terraform code is modular and that the modules themselves have their own semantic versions, which results in a lot of version tracking on your part. This is a big part of what Cookiecutter Open edX Devops does for you, by the way. All told there are upwards of a couple of dozen versions of Terraform modules to track, in addition to Terraform itself and the things with which Terraform interacts, like the AWS CLI.
Bug fixes and security patches aside, Terraform is a rapidly evolving technology that incidentally sits atop lots of stacks of other technology that is sometimes also rapidly evolving. The Kubernetes project for example continues to add new features, many of which are adopted by AWS, who in turn update their CLI, which leads the community that supports the Terraform EKS module to update that module. So essentially, a change of any kind in your backend tends to result in updates to some Terraform module. To mitigate this you should occasionally re-run Cookiecutter Open edX Devops against your repository.
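The kind of version tracking described in this section is usually expressed in Terraform with version constraints like the ones below. The specific version numbers are illustrative assumptions, not the versions that the Cookiecutter pins.

```hcl
terraform {
  # Constrain the version of Terraform itself.
  required_version = "~> 1.3" # illustrative

  # Constrain the version of each provider that the module uses.
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0" # illustrative
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0" # illustrative
    }
  }
}
```

The `~>` operator allows only the rightmost version component to increase, so `~> 4.0` accepts any 4.x release but not 5.0.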
Good luck on next steps with adding a Kubernetes cluster to your Open edX installation!! I hope you found this helpful. Contributors are welcome. My contact information is on my web site. Please help me improve this article by leaving a comment below. Thank you!