Problems with Celery? Learn how to diagnose and correct configuration errors with Celery and RabbitMQ in your Open edX instance. You’ll get your Open edX instance back online in around 30 minutes or less with this how-to guide.
A common pattern that you’ll see in Python Django projects like Open edX is Celery + RabbitMQ + Redis. This trio of open source technology provides a robust and scalable means for applications to communicate asynchronously with other back-end resources. The results are impressive: your application can interact with remote email systems, grader programs, MySQL, MongoDB and the file system on your Ubuntu server in a sophisticated way that not only prevents the front-end from freezing while waiting for responses, but also makes the platform completely resilient in the event of catastrophic system failures.
Celery is a distributed task queue that works exclusively with Python, and is a common complement to Django applications. The execution units, called tasks, are executed concurrently on one or more worker servers. Tasks can execute asynchronously or synchronously. RabbitMQ meanwhile is a popular open source message broker. RabbitMQ is lightweight and easy to deploy on premises and in the cloud. It supports multiple messaging protocols, including the Advanced Message Queueing Protocol (AMQP) used by Open edX. RabbitMQ can be deployed in distributed and federated configurations to meet high-scale, high-availability requirements. It runs on many operating systems and cloud environments, and provides a wide range of developer tools for most popular languages.
Mahdi Yusuf created this great screencast that demonstrates how Celery + RabbitMQ + Redis work together in a Django app to generate an email during a new user signup operation. This is especially relevant since Open edX performs these exact operations in its new user Registration screen.
Your Open edX instance relies extensively on Celery + RabbitMQ for a host of common application operations:
If you experience unusual behavior from any of these functions then often the culprit is probably a configuration problem with Celery.
1. Diagnosing Problems With Celery / RabbitMQ
Celery and RabbitMQ are both highly stable subsystems that generally work reliably without any administrative oversight whatsoever. If I encounter problems with either subsystem it is almost always following a software upgrade, a database restore, or a server migration. Furthermore, the culprit is almost always Celery. Following are some common symptoms of a configuration problem with Celery in an Open edX platform.
|New user registration||The new user registration screen appears to die, and becomes unresponsive after clicking the signup command button. The new user data is never saved into the system and the new user never receives an activation email.|
|Sending email||The screen appears to freeze or die after you click the password reset button.
New users do not receive their new user activation email.
|Drag & Drop||The drag & drop function appears to work, however the changed value is not recognized. Additionally you cannot save results.|
|Remote grading program||The screen appears to die after submitting a response to an exercise or quiz problem.|
|Uploading document||The screen appears to die, and the document is never uploaded. The system provide neither a success nor a failure message.|
If you’re experiencing any of these symptoms then you’ll next want to review the Open edX application logs for both the LMS and CMS to look for errors.
cat /edx/var/log/lms/edx.log -n 50 cat /edx/var/log/cms/edx.log -n 50
In particular, Celery often presents some challenges after migrations, upgrades and database restore operations. If Celery is not functioning correctly then you’ll lots of errors in the LMS log of the following form:
2. Trouble-Shooting Celery
RabbitMQ (and Celery) was installed by Ansible when you performed your native build. While there are many steps to installing RabbitMQ, it turns out that the configuration itself is relatively simple and thus, easy to trouble-shoot since there are a finite and limited set of configuration values to check. The configuration consists of the following
- Two Celery configuration values located in /etc/rabbitmq/rabbitmq-env.conf
- Three Celery usernames with passwords, and assigned permissions
- One virtual host
You can attempt any combination of the following trouble-shooting methods, testing your results after each adjustment by attempting any operation in your LMS such as providing a response to any problem, or by requesting a password reset email.
Celery Trouble-Shooting Tip I: Verify the IP address in /etc/rabbitmq/rabbitmq-env.conf
The correct inter