There are a few details to successfully restoring an Open edX instance that are worth understanding well before the need arises. If you haven’t done so already, take a look at my blog post, “Complete Backup Solution for Open edX”. An Open edX platform holds several kinds of persistent data, and these rely on different storage technologies. Whether you’re migrating to a new server or restoring an environment that suffered a catastrophic failure, you’ll need to consider all of these:
| Asynchronous Task Data | RabbitMQ |
| Custom Theme | * wherever you’ve elected to host this |
| Passwords | * usually /home/ubuntu/my-passwords.yml |
| Course Videos | * hopefully youtube.com |
First, the Django framework relies on a process called “Database Migrations” to ensure that the physical database schema in MySQL is consistent with the Django objects described in the source code. That is, Django programmers do not directly modify the MySQL database schema, but instead rely on Database Migrations to handle this for them. When you restore an Open edX MySQL database, you have to consider the possibility that the physical schema of the backup differs from that of the Django codebase.
Second, Open edX relies extensively on a subsystem named RabbitMQ to manage tasks asynchronously. By “task” I’m referring to virtually every command button that a learner clicks while interacting with course data. RabbitMQ is invoked each time they provide a response to a problem, interact with the discussion forum, provide a comment, record data in Notes/Annotations, request a password reset, and so on. These tasks are queued and then run in first-in-first-out order as server and network resources become available. Depending on the circumstances surrounding your need to restore or migrate your Open edX instance, there might be hundreds, thousands or even millions of pending RabbitMQ tasks in the queue. If that’s the case then it would be prudent to at least attempt to migrate these tasks as well. Furthermore, there are some common problems with migrating and/or restoring RabbitMQ configuration settings that we’ll look at in more detail below.
1. Restore MySQL Databases
Nearly all of your users’ data is stored in MySQL, including usernames and passwords, course content responses, notes & annotations data, their profile and so on. If you followed my guidelines on Creating a Complete Backup Solution for Open edX then your MySQL dump contains all of the Open edX databases and none of the MySQL system databases, which is exactly what you want. Restoring from your MySQL dump will therefore be as simple as the following:
mysql -u root -p < db_backup.dump
That’s it. You do not need to restart MySQL, flush any caches or buffers, or do any other administrative tasks. MySQL is remarkably resilient in this respect. However, it is really important that you perform Database Migrations as described in the next section.
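If you’d like a small safety net around the restore command above, here is a minimal sketch. It assumes the dump file name db_backup.dump used in this section; the function name restore_mysql_dump is my own invention for illustration and is not part of MySQL or Open edX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to restore from a missing or empty dump file.
restore_mysql_dump() {
    local dump_file="${1:-db_backup.dump}"

    # -s is true only when the file exists and has a size greater than zero.
    if [ ! -s "$dump_file" ]; then
        echo "Dump file '$dump_file' is missing or empty" >&2
        return 1
    fi

    # Feed the dump to the mysql client on stdin; -p prompts for the root password.
    mysql -u root -p < "$dump_file"
}

# Usage:
#   restore_mysql_dump db_backup.dump
```

The check costs nothing and avoids launching the mysql client against a file name that was mistyped or a backup that never finished writing.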
2. Run Database Migrations
This process is simple to run and usually only takes a minute or so to complete. Running this procedure more than once will not harm your database. Make Migrations scans the Django objects in your Open edX application codebases to ensure that the physical database tables, fields and relationships are consistent. It automatically adds anything that is missing, and it keeps track of what it’s done.
sudo -H -u edxapp -s bash
cd ~
source /edx/app/edxapp/edxapp_env
python /edx/app/edxapp/edx-platform/manage.py lms makemigrations --settings=aws
python /edx/app/edxapp/edx-platform/manage.py lms migrate --settings=aws
python /edx/app/edxapp/edx-platform/manage.py cms makemigrations --settings=aws
python /edx/app/edxapp/edx-platform/manage.py cms migrate --settings=aws
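To confirm that nothing was left unapplied, Django’s showmigrations command lists every migration with “[X]” for applied and “[ ]” for pending. The following is a minimal sketch that filters that listing; the function name pending_migrations is hypothetical, and I’m assuming showmigrations is available in the Django version bundled with your release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the migrations Django reports as unapplied.
# Reads `showmigrations` output on stdin, where "[X]" marks applied and "[ ]" marks pending.
pending_migrations() {
    # -F treats "[ ]" as a literal string; "|| true" keeps the exit status 0 when nothing is pending.
    grep -F '[ ]' || true
}

# Usage (as the edxapp user, with edxapp_env sourced):
#   python /edx/app/edxapp/edx-platform/manage.py lms showmigrations --settings=aws | pending_migrations
#   python /edx/app/edxapp/edx-platform/manage.py cms showmigrations --settings=aws | pending_migrations
```

Empty output from both commands suggests the schema and the codebase agree.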
3. Restore MongoDB
Mongo is strangely simple to restore. Here’s the basic structure of the command:
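As a rough sketch only, assuming your backup was produced with mongodump into a directory, a restore could be wrapped like this. The function name restore_mongo_dump and the choice to pass --drop are my assumptions for illustration. If your MongoDB requires authentication you would also need to add mongorestore’s -u, -p and --authenticationDatabase options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restore every database found in a mongodump directory.
restore_mongo_dump() {
    local dump_dir="$1"

    if [ ! -d "$dump_dir" ]; then
        echo "Dump directory '$dump_dir' not found" >&2
        return 1
    fi

    # --drop removes each collection before restoring it, so the result matches the backup.
    mongorestore --drop "$dump_dir"
}

# Usage:
#   restore_mongo_dump /path/to/your/mongo-dump-directory
```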
You’ll find additional information in the Official MongoDB Documentation.
4. Restart Platform
Given that you just restored all of your MongoDB course data, plus multiple MySQL databases, and you potentially made schema modifications via Database Migrations, restarting the Open edX platform is a prudent idea. For most administrative tasks you only need to restart the LMS and CMS, but in this case it’s a good idea to restart everything.
# delete the current active logs (to simplify diagnostics in the next step)
sudo rm /edx/var/log/lms/edx.log
sudo rm /edx/var/log/cms/edx.log
# Option I: reboot the server
sudo reboot
# Option II: restart the Open edX services individually
sudo /edx/bin/supervisorctl restart lms
sudo /edx/bin/supervisorctl restart cms
sudo /edx/bin/supervisorctl restart edxapp_worker:
sudo /edx/bin/supervisorctl restart analytics_api
sudo /edx/bin/supervisorctl restart certs
sudo /edx/bin/supervisorctl restart discovery
sudo /edx/bin/supervisorctl restart ecommerce
sudo /edx/bin/supervisorctl restart ecomworker
sudo /edx/bin/supervisorctl restart forum
sudo /edx/bin/supervisorctl restart insights
sudo /edx/bin/supervisorctl restart notifier-celery-workers
sudo /edx/bin/supervisorctl restart notifier-scheduler
sudo /edx/bin/supervisorctl restart xqueue
sudo /edx/bin/supervisorctl restart xqueue_consumer
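After the restart it can help to see at a glance whether anything failed to come back up. Here is a minimal sketch that filters the output of supervisorctl status; the function name services_not_running is hypothetical, and it assumes the usual layout with the service name in the first column and its state in the second.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list supervisor-managed services that are not in the RUNNING state.
# Reads `supervisorctl status` output on stdin (name in column 1, state in column 2).
services_not_running() {
    awk '$2 != "RUNNING" { print $1 " " $2 }'
}

# Usage:
#   sudo /edx/bin/supervisorctl status | services_not_running
```

No output means every service supervisor knows about reports RUNNING.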
5. Perform Diagnostics
Hopefully your Open edX instance is running now. If so, then you should next review the application logs for both the LMS and CMS to look for errors.
tail -n 50 /edx/var/log/lms/edx.log
tail -n 50 /edx/var/log/cms/edx.log
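On a busy site the last 50 lines may be mostly routine traffic. A minimal sketch for narrowing the output to likely problems follows; the function name recent_errors is hypothetical, and matching on the strings “ERROR” and “Traceback” is my assumption about what is worth looking at first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent lines mentioning ERROR or Traceback in a log file.
recent_errors() {
    local log_file="$1"
    local max_lines="${2:-20}"   # default to the last 20 matching lines

    grep -E 'ERROR|Traceback' "$log_file" | tail -n "$max_lines"
}

# Usage:
#   recent_errors /edx/var/log/lms/edx.log
#   recent_errors /edx/var/log/cms/edx.log 50
```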
In particular, Celery, the task queue that Open edX runs on top of RabbitMQ, often presents some challenges after migrations and database restore operations. If Celery is not functioning correctly then you’ll find a lot of Celery-related errors in the LMS log.
6. Trouble-Shooting Celery / RabbitMQ
RabbitMQ (and Celery) were installed by Ansible when you performed your native build. While there are many steps to installing RabbitMQ, the configuration itself is relatively simple and therefore easy to trouble-shoot, since there is only a small set of configuration values to check. The configuration consists of the following:
- Two Celery configuration values located in /etc/rabbitmq/rabbitmq-env.conf
- Three Celery usernames with passwords, and assigned permissions
- One virtual host
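The users and virtual host in the list above can be inspected with rabbitmqctl. Here is a minimal sketch that checks for the three usernames used later in this article (celery, edx and admin); the function name missing_rabbitmq_users is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the expected RabbitMQ users are absent.
# Reads `rabbitmqctl list_users` output on stdin (user name in column 1).
missing_rabbitmq_users() {
    local listing user
    listing="$(awk '{ print $1 }')"

    for user in celery edx admin; do
        # -x requires the whole line to match, so "edx" does not match "edxapp".
        printf '%s\n' "$listing" | grep -qx "$user" || echo "$user"
    done
}

# Usage:
#   sudo rabbitmqctl list_users | missing_rabbitmq_users
#   sudo rabbitmqctl list_vhosts
#   sudo rabbitmqctl list_permissions -p /
```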
You can attempt any combination of the following trouble-shooting methods, testing your results after each adjustment by attempting any operation in your LMS such as providing a response to any problem, or by requesting a password reset email.
Celery Trouble-Shooting Tip I: Verify the IP address in /etc/rabbitmq/rabbitmq-env.conf
The correct internal IP address for RabbitMQ is 127.0.0.1. However, Ansible will sometimes incorrectly populate this value with the server’s actual internal IP address, for example 172.16.102.101. I often encounter this problem when I reinstall RabbitMQ during platform upgrades.
sudo vim /etc/rabbitmq/rabbitmq-env.conf
Edit this file if necessary, and then restart the RabbitMQ service.
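If you’d rather check the value without opening an editor, here is a minimal sketch. The function name rabbitmq_ip_is_loopback is hypothetical, and I’m assuming the setting appears as a line of the form NODE_IP_ADDRESS=<ip>, with or without a RABBITMQ_ prefix; adjust the pattern if your file looks different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the node IP setting in the given file is 127.0.0.1.
# Assumption: the setting is a line of the form NODE_IP_ADDRESS=<ip>, optionally prefixed with RABBITMQ_.
rabbitmq_ip_is_loopback() {
    local conf_file="${1:-/etc/rabbitmq/rabbitmq-env.conf}"

    grep -Eq '^(RABBITMQ_)?NODE_IP_ADDRESS=127\.0\.0\.1[[:space:]]*$' "$conf_file"
}

# Usage:
#   rabbitmq_ip_is_loopback || echo "Fix the IP address, then run: sudo service rabbitmq-server restart"
```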
Celery Trouble-Shooting Tip II: Set permissions of all Celery users
The following code block relaxes permissions for the username “celery”. This is roughly analogous to setting the permissions of a Linux file to “777”. If permissions are the source of your Celery problem then this will eliminate it. Afterwards, however, you should seek more information on the ramifications of relaxing Celery permissions in Open edX (sorry, but I’m no expert).
sudo rabbitmqctl set_permissions -p / celery ".*" ".*" ".*"
sudo service rabbitmq-server restart
Celery Trouble-Shooting Tip III: Reset Celery user passwords
If you followed my guidelines for a Native Build on Ubuntu 16.04 LTS then you (hopefully) have a file named my-passwords.yml located in /home/ubuntu. Per the illustration below, the passwords for the three Celery users are located at the bottom of this file, noting that in each case the value of the password is referenced from elsewhere in the same document. I’ve attempted to illustrate how this referencing scheme works by highlighting the appropriate row in the file for the “Admin” user’s password.
sudo rabbitmqctl change_password celery YourPasswordForTheCeleryUser
sudo rabbitmqctl change_password edx YourPasswordForTheEdxUser
sudo rabbitmqctl change_password admin YourPasswordForTheAdminUser
sudo service rabbitmq-server restart
Celery Trouble-Shooting Tip IV: Re-install Celery
Some combination of the previous trouble-shooting methods will very likely solve your problem. But if you’re still having problems then you can completely reinstall RabbitMQ by calling the appropriate Ansible playbook, as follows:
sudo bash
. /edx/app/edx_ansible/venvs/edx_ansible/bin/activate
cd /edx/app/edx_ansible/edx_ansible/playbooks/
ansible-playbook -c local -i 'localhost,' ./run_role.yml -e "role=rabbitmq"
# Use this command instead if you are using a server-vars.yml file
# ansible-playbook -c local -i 'localhost,' ./run_role.yml -e "role=rabbitmq" -e@/edx/app/edx_ansible/server-vars.yml
exit
sudo service rabbitmq-server restart
Last thing: I found the following two threads from the Open edX Devops Google Group very helpful when I first encountered problems with Celery:
7. Re-Installing A Custom Theme
If your site uses comprehensive theming and you’ve restored your custom theme from a backup then it’s probable that you also need to recompile your static assets with Paver. Take note that this process runs for around 15 minutes, and your Open edX platform will not be available until the process completes. Also be aware that if your theme contains any compilation errors then your Open edX platform will almost certainly break.
# update assets as edxapp user
sudo -H -u edxapp bash
source /edx/app/edxapp/edxapp_env
cd /edx/app/edxapp/edx-platform
paver update_assets lms --settings=aws
paver update_assets cms --settings=aws
exit
# restart edx instances
sudo /edx/bin/supervisorctl restart lms
sudo /edx/bin/supervisorctl restart cms
sudo /edx/bin/supervisorctl restart edxapp_worker:
I hope you found this helpful. Please help me improve this article by leaving a comment below. Thank you!