A common pattern that you’ll see in Python Django projects like Open edX is Celery + RabbitMQ + Redis. This trio of open source technologies provides a robust and scalable means for applications to communicate asynchronously with other back-end resources. The results are impressive: your application can interact with remote email systems, grader programs, MySQL, MongoDB and the file system on your Ubuntu server without the front-end freezing while it waits for responses, and the platform becomes considerably more resilient when a back-end service fails.
Celery is a distributed task queue written in Python, and is a common complement to Django applications. The execution units, called tasks, are executed concurrently on one or more worker servers. Tasks can execute asynchronously or synchronously. RabbitMQ, meanwhile, is a popular open source message broker. RabbitMQ is lightweight and easy to deploy on premises and in the cloud. It supports multiple messaging protocols, including the Advanced Message Queuing Protocol (AMQP) used by Open edX. RabbitMQ can be deployed in distributed and federated configurations to meet high-scale, high-availability requirements. It runs on many operating systems and cloud environments, and provides a wide range of developer tools for most popular languages.
Mahdi Yusuf created this great screencast that demonstrates how Celery + RabbitMQ + Redis work together in a Django app to generate an email during a new user signup operation. This is especially relevant since Open edX performs these exact operations in its new user Registration screen.
Your Open edX instance relies extensively on Celery + RabbitMQ for a host of common application operations:
- New user registration
- Drag & Drop UI functionality
- Uploading documents
- Sending email to users
- Grading individual course exercise problems
If you experience unusual behavior from any of these functions then the culprit is often a configuration problem with Celery.
1. Diagnosing Problems With Celery / RabbitMQ
Celery and RabbitMQ are both highly stable subsystems that generally work reliably without any administrative oversight whatsoever. If I encounter problems with either subsystem it is almost always following a software upgrade, a database restore, or a server migration. Furthermore, the culprit is almost always Celery. Following are some common symptoms of a configuration problem with Celery in an Open edX platform.
|Function|Symptom|
|---|---|
|New user registration|The new user registration screen appears to die, and becomes unresponsive after clicking the signup command button. The new user data is never saved into the system and the new user never receives an activation email.|
|Sending email|The screen appears to freeze or die after you click the password reset button. New users do not receive their new user activation email.|
|Drag & Drop|The drag & drop function appears to work, however the changed value is not recognized. Additionally you cannot save results.|
|Remote grading program|The screen appears to die after submitting a response to an exercise or quiz problem.|
|Uploading document|The screen appears to die, and the document is never uploaded. The system provides neither a success nor a failure message.|
If you’re experiencing any of these symptoms then you’ll next want to review the Open edX application logs for both the LMS and CMS to look for errors.
tail -n 50 /edx/var/log/lms/edx.log
tail -n 50 /edx/var/log/cms/edx.log
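The last 50 lines of a busy log can easily be unrelated to the problem, so it may help to filter a larger slice of each log for broker-related messages. The following is a minimal sketch, not an official diagnostic: the function name `broker_errors` and the keyword list (`celery`, `amqp`, `rabbitmq`, `kombu`) are my own assumptions about what a Celery failure is likely to mention.

```shell
# Print the lines among the last N lines of a log file that mention
# Celery or its message broker. The keyword list is an assumption,
# not an official error signature.
broker_errors() {
    # $1 = log file, $2 = number of trailing lines to scan (default 500)
    tail -n "${2:-500}" "$1" | grep -iE 'celery|amqp|rabbitmq|kombu'
}

# Scan both Open edX application logs if they are readable.
for log in /edx/var/log/lms/edx.log /edx/var/log/cms/edx.log; do
    if [ -r "$log" ]; then
        echo "== $log =="
        broker_errors "$log" 500 || echo "(no matching lines)"
    fi
done
```

If nothing matches, widen the line count or fall back to reading the raw log, since the real error may use wording that this keyword list does not cover.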
In particular, Celery often presents some challenges after migrations, upgrades and database restore operations. If Celery is not functioning correctly then you’ll see lots of errors in the LMS log of the following form:
2. Trouble-Shooting Celery
RabbitMQ (and Celery) were installed by Ansible when you performed your native build. While there are many steps to installing RabbitMQ, the configuration itself is relatively simple and therefore easy to trouble-shoot, since there is only a small set of configuration values to check. The configuration consists of the following:
- Two RabbitMQ configuration values located in /etc/rabbitmq/rabbitmq-env.conf
- Three usernames used by Celery (celery, admin and edx), each with a password and assigned permissions
- One virtual host
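Before changing anything, it can be useful to look at what is currently configured. The sketch below uses the standard `rabbitmqctl` subcommands `list_users`, `list_vhosts` and `list_permissions`. The helper `missing_users` is a hypothetical name of my own, and it assumes the three usernames are `celery`, `admin` and `edx` and that each user is listed at the start of a line followed by whitespace.

```shell
# Read `rabbitmqctl list_users` output on stdin and print any of the
# three expected usernames that do not appear in it.
# Assumption: each user is listed at the start of a line, followed by
# whitespace and its tags.
missing_users() {
    listing=$(cat)
    for user in celery admin edx; do
        printf '%s\n' "$listing" | grep -q "^${user}[[:space:]]" || echo "$user"
    done
}

# Only run the live checks on a machine that actually has RabbitMQ.
if command -v rabbitmqctl >/dev/null 2>&1; then
    echo "Missing users (blank if none):"
    sudo rabbitmqctl list_users | missing_users
    echo "Virtual hosts:"
    sudo rabbitmqctl list_vhosts
    echo "Permissions on the / virtual host:"
    sudo rabbitmqctl list_permissions -p /
fi
```

A username printed under "Missing users" would suggest that the account has to be recreated rather than merely having its password or permissions reset.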
You can attempt any combination of the following trouble-shooting methods, testing your results after each adjustment by attempting any operation in your LMS such as providing a response to any problem, or by requesting a password reset email.
Celery Trouble-Shooting Tip I: Verify the IP address in /etc/rabbitmq/rabbitmq-env.conf
The correct internal IP address for RabbitMQ is 127.0.0.1. However, sometimes Ansible will incorrectly populate this value with the server’s actual internal IP address, for example 172.16.102.101. I often encounter this problem when I reinstall RabbitMQ during platform upgrades.
sudo vim /etc/rabbitmq/rabbitmq-env.conf
Edit this file if necessary, and then restart the RabbitMQ service.
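If you would rather check the value without opening an editor, something like the following sketch may help. It assumes the address is stored on a `KEY=value` line whose key ends in `NODE_IP_ADDRESS` (with or without a `RABBITMQ_` prefix), so confirm that against your own file. The function name `node_ip` is hypothetical.

```shell
# Print the node IP address from a rabbitmq-env.conf style file.
# Assumption: the value lives on a KEY=value line whose key ends in
# NODE_IP_ADDRESS. If several lines match, the last one is printed.
node_ip() {
    sed -n 's/^[[:space:]]*[A-Z_]*NODE_IP_ADDRESS[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

conf=/etc/rabbitmq/rabbitmq-env.conf
if [ -r "$conf" ]; then
    ip=$(node_ip "$conf")
    if [ "$ip" = "127.0.0.1" ]; then
        echo "OK: RabbitMQ node IP is 127.0.0.1"
    else
        echo "Check $conf: node IP is '${ip:-unset}', expected 127.0.0.1"
    fi
fi
```

After correcting the file, restart the service with `sudo service rabbitmq-server restart`.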
Celery Trouble-Shooting Tip II: Set permissions of all Celery users
The following code block relaxes permissions for the usernames “celery”, “admin” and “edx”, granting each full configure, write and read permissions on the “/” virtual host. This is roughly analogous to setting the permissions of a Linux file to “777”. If the source of your Celery problem is permissions then this will eliminate the problem; afterwards, however, you should seek more information on the ramifications of relaxing these permissions in Open edX (sorry, but I’m no expert).
sudo rabbitmqctl set_permissions -p / celery ".*" ".*" ".*"
sudo rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
sudo rabbitmqctl set_permissions -p / edx ".*" ".*" ".*"
sudo service rabbitmq-server restart
Celery Trouble-Shooting Tip III: Reset Celery user passwords
If you followed my guidelines for a Native Build on Ubuntu 16.04 LTS then you (hopefully) have a file named my-passwords.yml located in /home/ubuntu. Per the illustration below, the passwords for the three Celery users are located at the bottom of this file, noting that in each case the value of the password is referenced from elsewhere in the same document. I’ve attempted to illustrate how this referencing scheme works by highlighting the appropriate row in the file for the “Admin” user’s password.
sudo rabbitmqctl change_password celery YourPasswordForTheCeleryUser
sudo rabbitmqctl change_password edx YourPasswordForTheEdxUser
sudo rabbitmqctl change_password admin YourPasswordForTheAdminUser
sudo service rabbitmq-server restart
Celery Trouble-Shooting Tip IV: Re-install Celery
Some combination of the previous trouble-shooting methods will very likely solve your problem. But if you’re still having problems then you can completely reinstall RabbitMQ by calling the appropriate Ansible playbook, as follows:
sudo bash
source /edx/app/edx_ansible/venvs/edx_ansible/bin/activate
cd /edx/app/edx_ansible/edx_ansible/playbooks/
ansible-playbook -c local -i 'localhost,' ./run_role.yml -e "role=rabbitmq"
#Use this command instead if you are using a server-vars.yml file
#ansible-playbook -c local -i 'localhost,' ./run_role.yml -e "role=rabbitmq" -e@/edx/app/edx_ansible/server-vars.yml
exit
sudo service rabbitmq-server restart
Last thing, I found the following two threads from the Open edX Devops Google Group very helpful the first time I encountered problems with Celery:
3. Restart Platform
If you performed any of the curative actions in the section above then you should restart your Open edX platform. For most administrative tasks you only need to restart the LMS and CMS, but in this case it’s a good idea to restart everything.
#delete the current active logs (to simplify diagnostics in the next step)
sudo rm /edx/var/log/lms/edx.log
sudo rm /edx/var/log/cms/edx.log
# Option I: reboot the server
sudo reboot
# Option II: restart the Open edX services individually
sudo /edx/bin/supervisorctl restart lms
sudo /edx/bin/supervisorctl restart cms
sudo /edx/bin/supervisorctl restart edxapp_worker:
sudo /edx/bin/supervisorctl restart analytics_api
sudo /edx/bin/supervisorctl restart certs
sudo /edx/bin/supervisorctl restart discovery
sudo /edx/bin/supervisorctl restart ecommerce
sudo /edx/bin/supervisorctl restart ecomworker
sudo /edx/bin/supervisorctl restart forum
sudo /edx/bin/supervisorctl restart insights
sudo /edx/bin/supervisorctl restart notifier-celery-workers
sudo /edx/bin/supervisorctl restart notifier-scheduler
sudo /edx/bin/supervisorctl restart xqueue
sudo /edx/bin/supervisorctl restart xqueue_consumer
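Once everything has restarted, you may want to confirm that the services actually came back up. This sketch filters the output of `supervisorctl status` for anything not in the `RUNNING` state. The helper name `not_running` is my own, and it assumes the usual status layout in which the process name is the first column and its state is the second.

```shell
# Read `supervisorctl status` output on stdin and report every process
# whose state (second column) is not RUNNING.
not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1 " is " $2 }'
}

# Only run the live check where the Open edX supervisor wrapper exists.
if [ -x /edx/bin/supervisorctl ]; then
    sudo /edx/bin/supervisorctl status | not_running
fi
```

No output means every listed process reports RUNNING. A service that stays in another state, such as the Celery workers under edxapp_worker, is a good place to resume looking in the logs.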
I hope you found this helpful. Please help me improve this article by leaving a comment below. Thank you!