There are many benefits to moving your user data from the Ubuntu file system to AWS S3 cloud storage. It’s also pretty easy to do since Django handles the technical details, and the guys at edX have parameterized everything so that you only need to make some adjustments to the files in /edx/etc/.

Summary

If you’re looking for a general overview of how to scale the Open edX platform, read “Scaling Open edX” first. If you’re planning a completely new Open edX installation that needs to run at scale and involves no data migration, take a look at the automated tools included in Cookiecutter Open edX Devops, as well as the article Managing Your Open edX Backend With Terraform. Note also that the GitHub Actions CI deployment workflow included in the Cookiecutter automatically adds this feature to your Open edX installation.

Getting your Open edX platform to save user data to AWS S3 is easier than it might seem. The Open edX functionality for managing AWS S3 storage comes from django-storages, a storage library for Django, and its documentation is quite simple to read and understand: see the “Amazon S3” page of the django-storages docs. Many different kinds of user data get stored, including user profile pics, e-commerce transaction receipts, instructor grade book downloads, and ancillary course content like PDF documents and video files. Each of these kinds of data can be configured individually, so using AWS S3 is by no means an all-or-none decision. You can also store different kinds of user data in different S3 buckets.

In the tutorial that follows we’ll set up AWS S3 storage for a hypothetical Open edX installation named www.surfschool.edu. We will store all user data in a single AWS S3 bucket of the same name, www.surfschool.edu.

All configuration is done in the YAML files in /edx/etc. These files are read once during platform startup, so changes only take effect after a restart.

I. /edx/etc/lms.yml

1. Basic AWS S3 configuration settings

Create your IAM key/secret here: https://console.aws.amazon.com/iam/home?region=us-east-1#/users. Grant full S3 permissions to your IAM user by attaching the “AmazonS3FullAccess” managed policy, which you’ll find in the “Attach existing policies directly” tab. Don’t confuse it with the similarly named “AmazonS3OutpostsFullAccess” policy: that one covers S3 on Outposts, not standard S3 buckets, so the two are not interchangeable.

#------------------------------------------------------------------------------
# Credentials for platform-wide Open edX access to your AWS S3 bucket.
#------------------------------------------------------------------------------
AWS_ACCESS_KEY_ID: AKIA123456789AbCdVWU
AWS_SECRET_ACCESS_KEY: OXFH123456789ABCDefSmWcxsvZXb/stuvwImB
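
The tutorial grants full S3 access for simplicity. If you would rather scope the IAM user to the one bucket it actually needs, the sketch below builds a narrower policy document that you could paste into the IAM console as a custom policy. This is only an illustration: the helper name `bucket_policy` is made up, and the action list is an assumption about what django-storages needs, so check it against the django-storages documentation and your own Open edX features before relying on it.

```python
import json


def bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document limited to a single S3 bucket.

    Hypothetical helper for illustration; the action list is an assumption.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: list the keys stored in the bucket.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions: read, write, delete, and manage ACLs.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:GetObjectAcl",
                    "s3:PutObjectAcl",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


policy = bucket_policy("www.surfschool.edu")
print(json.dumps(policy, indent=2))
```

Bucket-level and object-level actions go in separate statements because they apply to different resource ARNs: the bucket itself versus the objects under it.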

Create your AWS S3 bucket here: s3.console.aws.amazon.com/s3/. Accept all of the default values, which will create a bucket with no public access, no versioning and no encryption. This is probably what you want.

#------------------------------------------------------------------------------
# The name of your AWS S3 bucket.
#------------------------------------------------------------------------------
AWS_STORAGE_BUCKET_NAME: www.surfschool.edu
# mcdaniel: AWS_S3_CUSTOM_DOMAIN, when set as described in the documentation, throws an error:
#           "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mooc.academiacentral.org.s3.amazonaws.com'. (_ssl.c:1123)"
#           The django-storages docs say, "If you’re using S3 as a CDN (via CloudFront), you’ll probably want this storage to serve those files using that: AWS_S3_CUSTOM_DOMAIN = 'cdn.mydomain.com'"
#           https://django-storages.readthedocs.io/en/1.7/backends/amazon-S3.html
#
#           NOTE: this will default to edxuploads.s3.amazonaws.com unless you set it!!!
#
AWS_S3_CUSTOM_DOMAIN: 's3.amazonaws.com/www.surfschool.edu'
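
The hostname mismatch quoted in the comment above is what you would expect for a bucket name that contains dots. A wildcard in a TLS certificate stands for a single DNS label, so a host like www.surfschool.edu.s3.amazonaws.com cannot match a certificate issued for *.s3.amazonaws.com. Putting the bucket name in the path instead, as the setting above does, keeps the host at s3.amazonaws.com. The sketch below illustrates the idea with a simplified matcher. It is not the real TLS verification logic, the certificate pattern is an assumption, and the object key is made up.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified hostname check: '*' stands for exactly one DNS label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    return all(p == "*" or p == h for p, h in zip(pattern_labels, host_labels))


def virtual_hosted_host(bucket: str) -> str:
    """Host name when the bucket is part of the domain."""
    return f"{bucket}.s3.amazonaws.com"


def path_style_url(bucket: str, key: str) -> str:
    """URL when the bucket is part of the path, as in the setting above."""
    return f"https://s3.amazonaws.com/{bucket}/{key}"


CERT_PATTERN = "*.s3.amazonaws.com"  # assumed wildcard name on the S3 certificate
bucket = "www.surfschool.edu"

# Dotted bucket name: too many labels for the wildcard, so no match.
print(wildcard_matches(CERT_PATTERN, virtual_hosted_host(bucket)))
# A bucket name without dots would match.
print(wildcard_matches(CERT_PATTERN, virtual_hosted_host("surfschool")))
# Path-style URL for a hypothetical object key.
print(path_style_url(bucket, "profile-images/example.jpg"))
```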

# UserWarning: The default behavior of S3Boto3Storage is insecure and will change in django-storages 2.0. By default files and new buc