Continuous Integration (CI) With Tutor Open edX - Part II

Tutor provides a powerful, easy-to-use set of tools for advanced configuration of your Open edX installation. Let’s take a closer look at how you can automate your entire Open edX deployment process using GitHub Actions workflows.

This is Part II of a two-part series on implementing CI/CD processes with Tutor Open edX. In Part I we learned how to automate the build process. In this article we’ll learn how to deploy the image that we built in Part I. First we’ll look at a working, fully automated Open edX deployment script using GitHub Actions, and then we’ll discuss how you can customize this workflow to suit your needs. We’ll be using this fully functional GitHub Actions deployment workflow, which comes from the same repository as the workflow from Part I.

Note

The code repository referenced in this article was generated with Cookiecutter OpenedX Devops, a completely free, open source tool that helps you create and maintain a robust, secure environment for your Open edX installation. The GitHub Actions workflow we review below is only one of the many devops tools that Cookiecutter OpenedX Devops provides. If you want, you can follow the README instructions in the Cookiecutter to create your own repository, pre-configured with your own AWS account information, your Open edX platform domain name and so on. It’s pretty easy.

Some Background About Tutor and GitHub Actions

Tutor provides two distinct means of modifying the default configuration of your Open edX instance. First, it gives you a way to modify any of the hundreds of Open edX application parameters found in the edx-platform environment configuration files such as edx-platform/lms/envs/common.py and production.py. Just follow these well-written instructions on how to use the Tutor command line to configure your Open edX platform. Second, it gives you a way to create your own custom Docker image containing, for example, additional XBlocks, a custom theme, an Open edX plugin, or even your own fork of the edx-platform source code repository. Creating a custom Docker image is easier than it might seem, and the procedure is well documented here. In this article we’ll look at some common use cases of both approaches to customizing your Open edX platform configuration, and we’ll use GitHub Actions to fully automate and properly document our steps.

Importantly, you’ll also see in this example that Tutor is a sophisticated deployment tool that provides out-of-the-box support for both Docker and Kubernetes, which is an amazing and under-hyped capability. For context, Tutor does not simply deploy a single Open edX container. Instead, it splits the Open edX application suite into separate pods for lms, cms, one or more workers for each application, e-commerce, the Discovery service, and so on. You can then individually administer and optimize the behavior and performance of these pods in your production cloud infrastructure environment. Furthermore, Tutor configures and deploys all of the backend services such as MySQL, MongoDB, Redis, and Nginx (or Caddy). All told, it potentially deploys upwards of a dozen different kinds of containers, either into Docker installed on an Ubuntu instance or into a Kubernetes cluster.

Now then, before we dive into our deployment, I want to digress for a moment on why we’re using GitHub Actions for this exercise. GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that allows you to automate your build, test, and deployment pipeline. A workflow can be triggered to run automatically upon, for example, every pull request to your repository. I became a fan of GitHub Actions about 18 months ago while working as part of a team on a large installation. It speeds up and simplifies the development pipeline for all team members by automating tasks such as kicking off unit tests each time code is pushed to a repository. Workflows are written in YAML, which is very easy to learn and to read. They are stored inside your repository, right alongside your code and configuration data. GitHub Actions provides consistency in the build and deployment pipelines, especially when there are many steps to your build, as in the example we’re going to review below. It provides granular role-based permissions for your team and your system user accounts, allowing you to harden security around your deployment workflows. It provides a great set of tools for managing passwords and other sensitive data. And finally, it generates logs of each of your deployments, which is enormously helpful when you need to troubleshoot something. So, in a few words, it’s valuable technology that you should consider adding to your repertoire.

GitHub Actions Workflow

The example GitHub Actions workflow uses Tutor to deploy a custom Open edX Docker image that was previously uploaded to AWS Elastic Container Registry (ECR). Additionally, our backend has been horizontally scaled and leverages several AWS managed services: AWS Relational Database Service (RDS), AWS DocumentDB for MongoDB, AWS ElastiCache for Redis, and AWS Simple Email Service (SES) for SMTP. We also use a Tutor plugin named hastexo/tutor-contrib-s3 that offloads file management from the Ubuntu file system to a secure AWS S3 bucket. Lastly, our example workflow deploys into AWS Elastic Kubernetes Service (EKS), which is also where all of our backend credentials are stored. Incidentally, for this example, all of these backend services were created using fully automated Terraform modules that are included in Cookiecutter OpenedX Devops.

Keep in mind that all of these backend services are already up and running. During deployment we simply need to configure our Open edX applications to connect to these existing remote services rather than to the default “locally” hosted services. Also, for the avoidance of doubt, the practical theory of how to scale Open edX’s backend services is substantially the same regardless of whether you’re running on Docker or using a native build.

We’re going to use this GitHub Actions workflow to automate the following operations:

set up our workflow environment: create a virtual instance of Ubuntu and then install Tutor, aws-cli, and kubectl
authenticate to the aws cli using a special AWS IAM user account named ci
authenticate to kubectl using credentials that we’ll retrieve with the aws cli
retrieve connection parameters and account credentials from Kubernetes Secrets for all of the remote backend services, and then format these into valid Tutor Open edX parameters
format and merge all of our custom lms.env.json and cms.env.json parameter values
configure hastexo/tutor-contrib-s3
set our Open edX custom theme
deploy our Open edX installation to a Kubernetes cluster

The workflow is pretty well documented, so I’m going to spend the remainder of this article explaining a few of the recurring patterns that you’ll encounter in this code.

Layout of a GitHub Actions workflow

The entire workflow is written in YAML using a limited set of commands that you can easily learn from this Getting Started guide. The workflow runs on a GitHub-hosted virtual server instance. GitHub gives you 2,000 minutes of server time for free each month, which should be more than you need in most cases. The example workflow in this article consumes between 4 and 9 minutes each time it runs, and I usually run the workflow a few dozen times a month at most. The server instances are ephemeral and are destroyed immediately upon completion of the workflow. You therefore have to build your entire deployment environment each time the workflow runs.

Per the screenshot below, this workflow runs on “workflow_dispatch” (row 18), which is a lofty way of saying that it runs when you click the “Run” button on the GitHub Actions console page of your repository on github.com. We define our workflow environment (on row 22) as an Ubuntu 20.04 server, on which we’ll need to install Tutor, the AWS CLI and kubectl at the beginning of the workflow. In this section we also define a few environment-wide variables that are referenced throughout the remaining code.
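
To make the layout concrete, here is a minimal sketch of how the top of such a workflow might look. It is not copied from the repository: the workflow name, the job id “deploy” and the NAMESPACE value are placeholders that I’ve assumed, while ENVIRONMENT_ID is the variable that later steps of the workflow reference.

```yaml
# Sketch of a workflow header; names marked "assumed" are illustrative only.
name: Deploy prod                # assumed workflow name

# Manual trigger: this is what makes the "Run workflow" button appear.
on: [workflow_dispatch]

jobs:
  deploy:                        # assumed job id
    # Ubuntu release named in this article; use a runner label that GitHub
    # currently supports.
    runs-on: ubuntu-20.04
    env:
      ENVIRONMENT_ID: prod       # referenced by later steps as $ENVIRONMENT_ID
      NAMESPACE: openedx         # assumed Kubernetes namespace
    steps:
      - name: Checkout
        uses: actions/checkout@v4
```

Variables declared under env are visible to every step in the job, which is why paths and names that recur throughout the workflow are defined here once.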

More about steps

I mostly learned about steps by looking at sample code. Fortunately, this example workflow contains a broad mixture of the kinds of things that you’ll want to do in your own workflow, so it should be a pretty good starting point. The screenshot below demonstrates the single-line and multi-line command formats. It also shows an example of how to incorporate community-supported code blocks. In this case, we’re installing kubectl, the Kubernetes command-line interface (CLI), with a code block that is written and supported by Microsoft Corp’s Azure team.
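
The three step formats mentioned above might look like the following sketch. The step names and the version-check commands are my own illustrations; azure/setup-kubectl is the Azure-maintained action for installing kubectl, and the release tag shown is an assumption that you should check against the action’s current releases.

```yaml
steps:
  # Single-line format: one shell command on the same line as "run".
  - name: Install Tutor
    run: pip install tutor

  # Multi-line format: "|-" starts a block in which each line is a shell command.
  - name: Show tool versions
    run: |-
      tutor --version
      aws --version

  # Community-supported action: "uses" pulls in code published by a third party.
  - name: Install kubectl
    uses: azure/setup-kubectl@v4   # assumed release tag
    with:
      version: latest
```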

tutor config from inside a GitHub Actions workflow

Here’s an example of the syntax for setting multiple Open edX configuration parameters using tutor config. Also note the syntax in this screenshot for referencing a GitHub Secret; in this case, an AWS IAM key and secret. And note how the environment variable “TUTOR_RUN_SMTP” is declared and then appended to $GITHUB_ENV. GitHub Actions itself sets $GITHUB_ENV to the path of a file; any NAME=value line appended to that file becomes an environment variable in every subsequent step, so in our case it accumulates all of the Ubuntu environment variables that are declared over the life of this workflow.
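
A pair of steps along these lines might look like the sketch below. The step names, the secret names AWS_SES_IAM_KEY and AWS_SES_IAM_SECRET, and the SMTP host and port are assumptions for illustration. RUN_SMTP, SMTP_HOST, SMTP_PORT, SMTP_USE_TLS, SMTP_USERNAME and SMTP_PASSWORD are Tutor settings, and Tutor treats an environment variable prefixed with TUTOR_ as a setting when the configuration is saved.

```yaml
- name: Disable the bundled SMTP container
  run: |-
    # Each NAME=value line appended to the $GITHUB_ENV file becomes an
    # environment variable in all subsequent steps.
    echo "TUTOR_RUN_SMTP=false" >> "$GITHUB_ENV"

- name: Configure SMTP
  run: |-
    # Set several Tutor parameters in one call. The secret names are assumed;
    # GitHub Actions injects the values and masks them in the workflow log.
    tutor config save \
      --set SMTP_HOST=email-smtp.us-east-2.amazonaws.com \
      --set SMTP_PORT=587 \
      --set SMTP_USE_TLS=true \
      --set SMTP_USERNAME="${{ secrets.AWS_SES_IAM_KEY }}" \
      --set SMTP_PASSWORD="${{ secrets.AWS_SES_IAM_SECRET }}"
```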

Managing passwords and other sensitive data

GitHub Actions provides an excellent way to integrate passwords and other sensitive data into your workflow without risk of them leaking into the public domain. See “Settings” -> “Secrets” -> “Actions” in your repository for the console screen where you can define and store all of the sensitive data that your workflow needs. In the example workflow we define two key pairs, one for the AWS CLI and another for the AWS SES SMTP email service. We additionally define a GitHub Personal Access Token (PAT) that determines the workflow’s permissions within GitHub Actions itself during execution. See the screenshot above for example syntax on how to reference this data from within the GitHub Actions workflow: “${{ secrets.THE_NAME_OF_YOUR_SECRET }}”. Note that secret names may contain only letters, digits and underscores.

For the example GitHub Actions workflow, most of the password data is stored in Kubernetes Secrets, so the only secrets that we need to manage in GitHub Secrets are the AWS IAM key pairs for the aws cli and for connecting to the AWS SES SMTP email service.
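
For example, the AWS CLI key pair could be handed to the workflow with a step like the one sketched here. It uses the aws-actions/configure-aws-credentials action, which is one common way to do this and not necessarily what the example workflow does. The secret names and the release tag are assumptions, and the region matches the one used in the kubeconfig command later in this article.

```yaml
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4   # assumed release tag
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}          # assumed secret name
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}  # assumed secret name
    aws-region: us-east-2
```

Every aws command in the steps that follow then runs with the permissions of that IAM user.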

Running the GitHub Actions workflow

Once you add a workflow to your repository, its GitHub Actions console page will magically reformat itself into something similar to the screenshot below. Note that the “Run Workflow” button appears because we explicitly included the command “on: [workflow_dispatch]” on row 18.

Each run of the workflow contains a detailed, timestamped log of the console output from the Ubuntu instance. It is neatly organized by the name of each step in the workflow, and you can click on any step to drill down to its detailed output.

Here’s the console output for the “Deploy Tutor” command on row 281:

Tutor Open edX Configuration Data

The Open edX configuration data is stored in a few locations depending on its size, its sensitivity, and how Tutor consumes it. Password data for Kubernetes is explained in detail below, in the Kubernetes Secrets section. You can also see from a cursory review of the deployment workflow itself that a considerable number of Tutor parameters are defined and set within this single piece of code. The remaining configuration data, which is the vast majority of it, is stored in the repository in ci/tutor-deploy/environments/prod/. In particular, the file settings_merge.json contains most of the Open edX application settings variables that you should at least be aware of. And remember, all of the files in this folder were created automatically by Cookiecutter OpenedX Devops.

Note that the entire contents of the two Tutor configuration files lms.env.json and cms.env.json are dumped to the console multiple times during the workflow using the built-in Linux command “cat”. See rows 163, 240, 254, 265 and 274.
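
As an illustration of the merge-then-inspect pattern, the sketch below merges settings_merge.json into Tutor’s generated lms.env.json with jq and then prints the result. The step name, the temporary file name and the location of lms.env.json under the Tutor root are assumptions (the path differs between Tutor releases), so treat it as a starting point rather than the workflow’s actual code.

```yaml
- name: Merge custom settings into lms.env.json
  run: |-
    # Assumed location of the generated file; verify it for your Tutor release.
    LMS_ENV="$(tutor config printroot)/env/apps/openedx/config/lms.env.json"
    MERGE="ci/tutor-deploy/environments/$ENVIRONMENT_ID/settings_merge.json"

    # "jq -s" slurps both files into one array; "*" merges the two objects
    # recursively, with values from the second file taking precedence.
    jq -s '.[0] * .[1]' "$LMS_ENV" "$MERGE" > lms.env.json.tmp
    mv lms.env.json.tmp "$LMS_ENV"

    # Dump the merged file to the workflow log for later troubleshooting.
    cat "$LMS_ENV"
```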

Kubernetes

The example GitHub Actions workflow deploys into a Kubernetes cluster which, while not required, certainly provides a lot of system management benefits. If you are considering deploying to a Kubernetes cluster, then you might find the following additional explanations helpful.
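
With Tutor, the deployment itself comes down to a couple of commands. The sketch below assumes that the kubeconfig and all Tutor parameters were set by earlier steps. The step name “Run Tutor init” is mine, and the actual workflow may pass additional options.

```yaml
- name: Deploy Tutor
  run: |-
    # Create or update the Kubernetes resources for the Open edX services.
    tutor k8s start

- name: Run Tutor init
  run: |-
    # Run the initialization jobs, such as database migrations.
    tutor k8s init
```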

kubectl

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs. The example GitHub Actions workflow uses this tool extensively, primarily for extracting password data from Kubernetes Secrets. For more information, including a complete list of kubectl operations, see the kubectl reference documentation.

To connect to a Kubernetes cluster, kubectl depends on a configuration file named kubeconfig, which in the example workflow is retrieved from AWS EKS via the aws cli on line 57.

[code lang="yml"]
- name: Get Kube config
  run: aws eks --region us-east-2 update-kubeconfig --name prod-stepwisemath-mexico --alias eks-prod
[/code]

This very important one-line command retrieves the connection data for your Kubernetes cluster, formats it into a valid kubeconfig file, and persists it on the ephemeral GitHub-hosted Ubuntu instance. Incidentally, the Kubernetes configuration data that is retrieved by this command is visible on the AWS EKS console page for your cluster. Note the three data fields outlined in red at the bottom of the screenshot below.

Kubernetes Secrets

For the example GitHub Actions workflow we store configuration data for Open edX in multiple locations depending on a few factors. Open edX passwords for backend services like MySQL, MongoDB, and SMTP email are stored inside of Kubernetes Secrets and then retrieved using commands like the following:

Keep in mind that you can run these commands from your own computer, assuming that you’ve installed and configured kubectl. Note that many of the Kubernetes commands are, in fact, several commands piped together. One of the more common patterns that you’ll find retrieves, decodes and reformats password data. (Kubernetes returns Secret values base64-encoded rather than encrypted, so this step is a decode.) The pattern is as follows:

Adding the next pipe, this base64-encoded data transforms into the following:

And then adding all of the pipes together, the decoded password data becomes a Tutor-compliant command line parameter:
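
Put together as a workflow step, the pattern might look like the sketch below. The secret name mysql-root, the NAMESPACE variable and the resulting variable names are assumptions for illustration, and the comments describe what each stage of the pipe contributes.

```yaml
- name: Fetch MySQL credentials from Kubernetes Secrets
  run: |-
    # Stage 1: kubectl returns the secret as JSON; every value under .data
    #          is base64-encoded.
    # Stage 2: jq decodes each value with @base64d (needs jq 1.6 or later).
    # Stage 3: jq prints one TUTOR_<KEY>=<value> line per entry, and the lines
    #          are appended to $GITHUB_ENV for use by later steps.
    kubectl get secret mysql-root -n "$NAMESPACE" -o json \
      | jq '.data | map_values(@base64d)' \
      | jq -r 'to_entries[] | "TUTOR_\(.key | ascii_upcase)=\(.value)"' \
      >> "$GITHUB_ENV"
```

With this sketch, a secret holding a hypothetical key named mysql_root_password would surface as TUTOR_MYSQL_ROOT_PASSWORD, which Tutor reads as its MYSQL_ROOT_PASSWORD setting.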

Kubernetes Ingresses

Kubernetes ingresses for Open edX are handled at deployment because some of the stateful data, like Let’s Encrypt SSL certificates for example, can only be created at the moment of deployment. Row 181, “Create Kubernetes add-on resources”, provides an entry point for kubectl to read a collection of Kubernetes manifests that we’ve stored in the repository in the folder ci/tutor-deploy/environments/prod/k8s/.

[code lang="yml"]
- name: Create Kubernetes add-on resources
  run: |-
    # Create kubernetes ingress and other environment resources
    kubectl apply -f "ci/tutor-deploy/environments/$ENVIRONMENT_ID/k8s"
[/code]

Good luck with automating your deployment! I hope you found this helpful. Contributors are welcome. My contact information is on my website. Please help me improve this article by leaving a comment below. Thank you!

Mobile:	+1 (617) 834-6172‬
Email:	lpm0073@gmail.com
Portfolio:	lawrencemcdaniel.com
Address:	Cambridge, MA
