There are many benefits to moving your user data from the Ubuntu file system to AWS S3 cloud storage. It’s also pretty easy to do since Django handles the technical details, and the guys at edX have parameterized everything so that you only need to make some adjustments to the files in /edx/etc/.

Summary

If you’re looking for a general overview of how to scale the Open edX platform, you should read “Scaling Open edX” first. If you’re planning a completely new Open edX installation that needs to run at scale and you are not migrating any data, then take a look at the automated tools included in Cookiecutter Open edX Devops, as well as this article on Managing Your Open edX Backend With Terraform. Also note that if you use the GitHub Actions CI deployment workflow included in the Cookiecutter, it will add this feature to your Open edX installation automatically.

Getting your Open edX platform to save user data to AWS S3 is easier than it might seem. The Open edX functionality for managing AWS S3 storage comes from django-storages, a widely used Django package, and its documentation is quite easy to read and understand: see the “Amazon S3” page of the django-storages documentation. Many different kinds of user data get stored, including user profile pictures, e-commerce transaction receipts, instructor grade book downloads, and ancillary course content like PDF documents and video files. Each of these kinds of data can be configured individually, so using AWS S3 is by no means an all-or-nothing decision. You can even store different kinds of user data in different S3 buckets.

In the tutorial that follows, we’ll set up AWS S3 storage for a hypothetical Open edX installation named www.surfschool.edu. We will store all user data in a single AWS S3 bucket of the same name, www.surfschool.edu.

All configuration is done in the YAML files in /edx/etc/. These files are read only once, during platform startup, so your changes take effect only after you restart the platform.

I. /edx/etc/lms.yml

1. Basic AWS S3 configuration settings

Create an IAM user, along with its access key ID and secret access key, here: https://console.aws.amazon.com/iam/home?region=us-east-1#/users. Grant the user full S3 permissions by attaching the AWS managed policy “AmazonS3FullAccess”, which you’ll find in the “Attach existing policies directly” tab. Don’t confuse it with the similarly named “AmazonS3OutpostsFullAccess”: that policy applies only to Amazon S3 on Outposts and does not grant access to ordinary S3 buckets.

#------------------------------------------------------------------------------
# Credentials for platform-wide Open edX access to your AWS S3 bucket.
#------------------------------------------------------------------------------        
AWS_ACCESS_KEY_ID: AKIA123456789AbCdVWU
AWS_SECRET_ACCESS_KEY: OXFH123456789ABCDefSmWcxsvZXb/stuvwImB

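Create your AWS S3 bucket here: s3.console.aws.amazon.com/s3/. Accept all of the default values, which create a private bucket (all public access blocked) with versioning disabled. The only encryption is the default server-side encryption (SSE-S3) that AWS now applies to every new bucket, which is transparent to Open edX. This is probably what you want.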
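Because the bucket in this example is named after a domain, its name contains dots. The Python sketch below uses a hypothetical helper, `check_bucket_name`, to test a name against a subset of the AWS S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit; no adjacent dots; not shaped like an IP address). It also checks whether the default virtual-hosted hostname `<bucket>.s3.amazonaws.com` is covered by the `*.s3.amazonaws.com` wildcard certificate, where `*` matches exactly one DNS label. A dotted name fails that check, which is what produces the `CERTIFICATE_VERIFY_FAILED ... Hostname mismatch` error mentioned in the comments of the next block.

```python
import ipaddress
import re

# A subset of the AWS S3 bucket naming rules; not an exhaustive validator.
_BUCKET_CHARS = re.compile(r"^[a-z0-9][a-z0-9.-]*[a-z0-9]$")


def wildcard_matches(pattern: str, hostname: str) -> bool:
    """TLS-style wildcard match: '*' stands for exactly one DNS label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    return all(p == "*" or p == h for p, h in zip(p_labels, h_labels))


def check_bucket_name(name: str) -> dict:
    """Hypothetical helper: validate a bucket name and flag dotted-name TLS issues."""
    valid = (
        3 <= len(name) <= 63
        and bool(_BUCKET_CHARS.match(name))
        and ".." not in name
    )
    try:
        ipaddress.IPv4Address(name)
        valid = False  # names formatted like an IPv4 address are not allowed
    except ValueError:
        pass
    default_host = f"{name}.s3.amazonaws.com"
    return {
        "valid": valid,
        "default_host": default_host,
        # False means virtual-hosted URLs for this bucket fail certificate verification.
        "wildcard_cert_ok": wildcard_matches("*.s3.amazonaws.com", default_host),
    }


print(check_bucket_name("www.surfschool.edu"))
```

A name without dots, such as `surfschool-media`, would pass the wildcard check; a dotted name like www.surfschool.edu is still a valid bucket name, it just needs path-style URLs.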

#------------------------------------------------------------------------------
# The name of your AWS S3 bucket.
#------------------------------------------------------------------------------
AWS_STORAGE_BUCKET_NAME: www.surfschool.edu
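# mcdaniel: setting AWS_S3_CUSTOM_DOMAIN to the virtual-hosted form, <bucket>.s3.amazonaws.com, throws the error
#           "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mooc.academiacentral.org.s3.amazonaws.com'. (_ssl.c:1123)"
#           because AWS's *.s3.amazonaws.com wildcard certificate does not cover bucket names that contain dots.
#           The path-style value below, s3.amazonaws.com/<bucket>, avoids the problem.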
#           django-storages docs say, "If you’re using S3 as a CDN (via CloudFront), you’ll probably want this storage to serve those files using that: AWS_S3_CUSTOM_DOMAIN = 'cdn.mydomain.com'"
#           https://django-storages.readthedocs.io/en/1.7/backends/amazon-S3.html
#
#           NOTE: this will default to edxuploads.s3.amazonaws.com unless you set it!!!
#
AWS_S3_CUSTOM_DOMAIN: 's3.amazonaws.com/www.surfschool.edu'

# UserWarning: The default behavior of S3Boto3Storage is insecure and will change in django-storages 2.0. By default files and new buckets are saved with an ACL of 'public-read' (globally publicly readable). Version 2.0 will default to using the bucket's ACL. To opt into the new behavior set AWS_DEFAULT_ACL = None, otherwise to silence this warning explicitly set AWS_DEFAULT_ACL.
# canned ACLs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html#canned-acl
# -------------------------------
AWS_DEFAULT_ACL: null 
# -------------------------------
# mcdaniel: added these based on https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
# --------------------------------------
AWS_LOCATION: 'media'               # prefixes all paths with this value
#AWS_S3_REGION_NAME: us-east-1       # placeholder. this is us-east-1 by default
#AWS_S3_USE_SSL: false                # placeholder. this is true by default.
#AWS_QUERYSTRING_AUTH: true          # placeholder. default is true. Set AWS_QUERYSTRING_AUTH to False to remove query parameter authentication from generated URLs. This can be useful if your S3 buckets are public.
# --------------------------------------
#------------------------------------------------------------------------------
# This is the main configuration setting that switches Django from using the
# Ubuntu file system to instead using AWS S3.
#------------------------------------------------------------------------------
DEFAULT_FILE_STORAGE: storages.backends.s3boto3.S3Boto3Storage
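To see where a file ends up with these settings, here is a simplified Python model of how the S3 storage turns a storage name into an object key (by prefixing `AWS_LOCATION`) and, when `AWS_S3_CUSTOM_DOMAIN` is set, into a URL of the form `https://<custom domain>/<key>`. This is a sketch, not django-storages' actual code: real versions also normalize names and reject paths that escape the location. The helper `s3_key_and_url`, the HTTPS protocol, and the sample file name are assumptions for illustration.

```python
from __future__ import annotations

from posixpath import join as posix_join
from urllib.parse import quote


def s3_key_and_url(name: str, location: str = "media",
                   custom_domain: str = "s3.amazonaws.com/www.surfschool.edu",
                   protocol: str = "https") -> tuple[str, str]:
    """Hypothetical helper: approximate the object key and URL for a stored file."""
    clean = name.lstrip("/")
    # AWS_LOCATION is prepended to every name to form the object key.
    key = posix_join(location, clean) if location else clean
    # With a custom domain, the URL is the domain followed by the URL-quoted key.
    url = f"{protocol}://{custom_domain}/{quote(key)}"
    return key, url


key, url = s3_key_and_url("uploads/syllabus.pdf")
print(key)  # media/uploads/syllabus.pdf
print(url)  # https://s3.amazonaws.com/www.surfschool.edu/media/uploads/syllabus.pdf
```

Because the custom domain here is path-style, every URL uses the plain hostname `s3.amazonaws.com`, which AWS's certificate does cover even though the bucket name contains dots.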
2. To redirect financial reporting data
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
FINANCIAL_REPORTS:
    BUCKET: www.surfschool.edu
    ROOT_PATH: 'financial_reports'
    STORAGE_TYPE: 's3'
3. To redirect user profile images
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
PROFILE_IMAGE_BACKEND:
    class: storages.backends.s3boto3.S3Boto3Storage
    options:
        headers:
            Cache-Control: max-age={{ EDXAPP_PROFILE_IMAGE_MAX_AGE }}  # replace the placeholder with a number of seconds
        location: 'www.surfschool.edu/media/profile-images'
        base_url: /media/profile-images/
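The `Cache-Control` value is an ordinary HTTP header, and its max-age directive is written `max-age=<seconds>`. The Python sketch below, built around a hypothetical helper `max_age_seconds`, extracts the max-age from a header value and shows that a hyphenated `max-age-...` is not recognized as a max-age directive at all. The one-year value 31536000 is only an example, not a recommended setting.

```python
import re

# Matches a max-age directive such as "max-age=31536000" anywhere in a comma-separated header.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age=(\d+)\s*(?:,|$)", re.IGNORECASE)


def max_age_seconds(cache_control: str):
    """Hypothetical helper: return max-age in seconds, or None if no valid max-age is present."""
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


print(max_age_seconds("max-age=31536000"))     # 31536000
print(max_age_seconds("max-age-31536000"))     # None: '-' instead of '=' is not valid syntax
print(max_age_seconds("public, max-age=600"))  # 600
```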
4. To redirect video data
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
VIDEO_UPLOAD_PIPELINE:
    BUCKET: 'www.surfschool.edu'
    ROOT_PATH: 'video'
5. To redirect XBlock data
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
XBLOCK_FS_STORAGE_BUCKET: www.surfschool.edu
XBLOCK_FS_STORAGE_PREFIX: xblock
XBLOCK_SETTINGS: {}

II. /edx/etc/studio.yml

1. Basic AWS S3 configuration settings
#------------------------------------------------------------------------------
# Credentials for platform-wide Open edX access to your AWS S3 bucket.
#------------------------------------------------------------------------------        
AWS_ACCESS_KEY_ID: AKIA123456789AbCdVWU
AWS_SECRET_ACCESS_KEY: OXFH123456789ABCDefSmWcxsvZXb/stuvwImB
#------------------------------------------------------------------------------
# The name of your AWS S3 bucket.
#------------------------------------------------------------------------------
AWS_STORAGE_BUCKET_NAME: www.surfschool.edu
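# mcdaniel: setting AWS_S3_CUSTOM_DOMAIN to the virtual-hosted form, <bucket>.s3.amazonaws.com, throws the error
#           "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mooc.academiacentral.org.s3.amazonaws.com'. (_ssl.c:1123)"
#           because AWS's *.s3.amazonaws.com wildcard certificate does not cover bucket names that contain dots.
#           The path-style value below, s3.amazonaws.com/<bucket>, avoids the problem.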
#           django-storages docs say, "If you’re using S3 as a CDN (via CloudFront), you’ll probably want this storage to serve those files using that: AWS_S3_CUSTOM_DOMAIN = 'cdn.mydomain.com'"
#           https://django-storages.readthedocs.io/en/1.7/backends/amazon-S3.html
#
#           NOTE: this will default to edxuploads.s3.amazonaws.com unless you set it!!!
#
AWS_S3_CUSTOM_DOMAIN: 's3.amazonaws.com/www.surfschool.edu'

# UserWarning: The default behavior of S3Boto3Storage is insecure and will change in django-storages 2.0. By default files and new buckets are saved with an ACL of 'public-read' (globally publicly readable). Version 2.0 will default to using the bucket's ACL. To opt into the new behavior set AWS_DEFAULT_ACL = None, otherwise to silence this warning explicitly set AWS_DEFAULT_ACL.
# canned ACLs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html#canned-acl
# -------------------------------
AWS_DEFAULT_ACL: null 
# -------------------------------
# mcdaniel: added these based on https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
# --------------------------------------
AWS_LOCATION: 'media'               # prefixes all paths with this value
#AWS_S3_REGION_NAME: us-east-1       # placeholder. this is us-east-1 by default
#AWS_S3_USE_SSL: false                # placeholder. this is true by default.
#AWS_QUERYSTRING_AUTH: true          # placeholder. default is true. Set AWS_QUERYSTRING_AUTH to False to remove query parameter authentication from generated URLs. This can be useful if your S3 buckets are public.
# --------------------------------------
#------------------------------------------------------------------------------
# This is the main configuration setting that switches Django from using the
# Ubuntu file system to instead using AWS S3.
#------------------------------------------------------------------------------
DEFAULT_FILE_STORAGE: storages.backends.s3boto3.S3Boto3Storage
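Since lms.yml and studio.yml each carry their own copy of these basic settings, it is easy for the two files to drift apart. A small Python sketch, assuming both files have already been parsed into dictionaries (for example with a YAML library such as PyYAML, not shown), can list the basic keys whose values differ. The function name `diff_s3_settings` and the sample dictionaries are hypothetical; the key list is taken from the settings above.

```python
BASIC_S3_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_STORAGE_BUCKET_NAME",
    "AWS_S3_CUSTOM_DOMAIN",
    "AWS_DEFAULT_ACL",
    "AWS_LOCATION",
    "DEFAULT_FILE_STORAGE",
)

_MISSING = object()  # sentinel so a missing key differs from an explicit null


def diff_s3_settings(lms: dict, studio: dict) -> list:
    """Hypothetical helper: return the basic S3 keys whose values differ between the files."""
    return [
        key for key in BASIC_S3_KEYS
        if lms.get(key, _MISSING) != studio.get(key, _MISSING)
    ]


lms = {"AWS_STORAGE_BUCKET_NAME": "www.surfschool.edu", "AWS_LOCATION": "media",
       "AWS_DEFAULT_ACL": None}
studio = {"AWS_STORAGE_BUCKET_NAME": "www.surfschool.edu", "AWS_LOCATION": "media"}
print(diff_s3_settings(lms, studio))  # ['AWS_DEFAULT_ACL']
```

The sentinel matters for `AWS_DEFAULT_ACL`: an explicit `null` in one file and a missing key in the other are reported as a difference.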
2. To redirect financial reporting data
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
FINANCIAL_REPORTS:
    BUCKET: www.surfschool.edu
    ROOT_PATH: 'financial_reports'
    STORAGE_TYPE: 's3'
3. To redirect grades downloads
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
GRADES_DOWNLOAD:
    BUCKET: 'www.surfschool.edu'
    ROOT_PATH: 'grades'
    STORAGE_CLASS: storages.backends.s3boto3.S3Boto3Storage
    STORAGE_KWARGS:
        location: /tmp/edx-s3/grades
    STORAGE_TYPE: 's3'
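`STORAGE_CLASS` with `STORAGE_KWARGS` here, like `class` with `options` in the lms.yml profile image settings, follows a common Django pattern: a dotted class path plus keyword arguments for its constructor. The minimal, standard-library-only Python sketch below shows how such a pair can be turned into an object, similar in spirit to Django's `import_string`. The helper name `build_from_dotted_path` is hypothetical, and this is not claimed to be the exact code Open edX runs; how Open edX actually consumes these settings, including `STORAGE_TYPE`, may differ.

```python
import importlib


def build_from_dotted_path(class_path, kwargs=None):
    """Hypothetical helper: import a class by dotted path and call it with kwargs."""
    module_path, _, class_name = class_path.rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**(kwargs or {}))


# With django-storages installed, class_path could be
# "storages.backends.s3boto3.S3Boto3Storage" (which also needs Django settings).
# A standard-library class is used here so the sketch runs anywhere.
one_year = build_from_dotted_path("datetime.timedelta", {"days": 365})
print(one_year.total_seconds())  # 31536000.0
```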
4. To redirect video data
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
VIDEO_UPLOAD_PIPELINE:
    BUCKET: 'www.surfschool.edu'
    ROOT_PATH: 'video'
5. To redirect XBlock data
# See: https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html
XBLOCK_FS_STORAGE_BUCKET: www.surfschool.edu
XBLOCK_FS_STORAGE_PREFIX: xblock
XBLOCK_SETTINGS: {}

III. /edx/etc/analytics_api.yml

1. To redirect media storage
MEDIA_STORAGE_BACKEND:
    DEFAULT_FILE_STORAGE: storages.backends.s3boto3.S3Boto3Storage
    MEDIA_ROOT: /edx/var/analytics_api/media
    MEDIA_URL: /media/
2. To redirect report downloads
REPORT_DOWNLOAD_BACKEND:
    COURSE_REPORT_FILE_LOCATION_TEMPLATE: '{course_id}_{report_name}.csv'
    DEFAULT_FILE_STORAGE: storages.backends.s3boto3.S3Boto3Storage
    MEDIA_ROOT: /edx/var/analytics_api/static/reports
    MEDIA_URL: http://localhost:8100/static/reports/
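`COURSE_REPORT_FILE_LOCATION_TEMPLATE` uses Python `str.format`-style named fields. Assuming the Analytics API fills the template that way, the Python sketch below shows the file name it would yield. The helper `report_file_name` is hypothetical, and the course ID and report name are made-up, filename-safe values.

```python
TEMPLATE = "{course_id}_{report_name}.csv"  # value from analytics_api.yml above


def report_file_name(course_id: str, report_name: str) -> str:
    """Hypothetical helper: fill the report location template with named fields."""
    return TEMPLATE.format(course_id=course_id, report_name=report_name)


print(report_file_name("SurfSchool_SURF101_2021", "problem_response"))
# SurfSchool_SURF101_2021_problem_response.csv
```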

IV. /edx/etc/discovery.yml

1. To redirect media storage
MEDIA_STORAGE_BACKEND:
    DEFAULT_FILE_STORAGE: storages.backends.s3boto3.S3Boto3Storage
    MEDIA_ROOT: /edx/var/discovery/media
    MEDIA_URL: /media/
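`MEDIA_STORAGE_BACKEND` groups three ordinary Django settings: `DEFAULT_FILE_STORAGE`, `MEDIA_ROOT` and `MEDIA_URL`. The nesting suggests that the service copies each key of the group into its top-level settings. Under that assumption, the Python sketch below shows the effect, using the discovery.yml values above; `apply_backend_group` is a hypothetical name.

```python
def apply_backend_group(settings: dict, group_name: str = "MEDIA_STORAGE_BACKEND") -> dict:
    """Hypothetical helper: copy the keys of a nested settings group into the top level."""
    merged = dict(settings)
    merged.update(settings.get(group_name, {}))
    return merged


yaml_settings = {
    "MEDIA_STORAGE_BACKEND": {
        "DEFAULT_FILE_STORAGE": "storages.backends.s3boto3.S3Boto3Storage",
        "MEDIA_ROOT": "/edx/var/discovery/media",
        "MEDIA_URL": "/media/",
    },
}
merged_settings = apply_backend_group(yaml_settings)
print(merged_settings["DEFAULT_FILE_STORAGE"])  # storages.backends.s3boto3.S3Boto3Storage
```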

V. /edx/etc/ecommerce.yml

1. To redirect media storage
MEDIA_STORAGE_BACKEND:
    DEFAULT_FILE_STORAGE: storages.backends.s3boto3.S3Boto3Storage
    MEDIA_ROOT: /edx/var/ecommerce/media
    MEDIA_URL: /media/

VI. /edx/etc/xqueue.yml

UPLOAD_BUCKET: www.surfschool.edu
UPLOAD_PATH_PREFIX: xqueue
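Taken together, the settings in this tutorial place the platform's data in the single www.surfschool.edu bucket under a handful of prefixes. The Python sketch below collects the bucket and prefix pairs copied from the settings shown above, so you can compare them against what you see in the S3 console. The helper `bucket_layout` and the labels in the dictionary are illustrative; the key layout below each prefix is decided by Open edX itself.

```python
from collections import defaultdict

# Bucket/prefix pairs copied from the settings shown in this tutorial.
PREFIX_SETTINGS = {
    "DEFAULT_FILE_STORAGE (AWS_LOCATION)": ("www.surfschool.edu", "media"),
    "FINANCIAL_REPORTS": ("www.surfschool.edu", "financial_reports"),
    "VIDEO_UPLOAD_PIPELINE": ("www.surfschool.edu", "video"),
    "XBLOCK_FS_STORAGE": ("www.surfschool.edu", "xblock"),
    "xqueue UPLOAD": ("www.surfschool.edu", "xqueue"),
}


def bucket_layout(pairs: dict) -> dict:
    """Hypothetical helper: group s3:// prefixes by bucket, labeled with their setting."""
    layout = defaultdict(list)
    for setting, (bucket, prefix) in pairs.items():
        layout[bucket].append(f"s3://{bucket}/{prefix.strip('/')}/  <- {setting}")
    return dict(layout)


for bucket, lines in bucket_layout(PREFIX_SETTINGS).items():
    print(bucket)
    for line in sorted(lines):
        print("  " + line)
```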