============================
Frequently Asked Questions
============================

General
=======

What kinds of things should I use celery for?
---------------------------------------------

**Answer:** `Queue everything and delight everyone`_ is a good article
describing why you would use a queue in a web context.

.. _`Queue everything and delight everyone`:
    http://decafbad.com/blog/2008/07/04/queue-everything-and-delight-everyone

These are some common use cases:

* Running something in the background. For example, to finish the web request
  as soon as possible, then update the user's page incrementally.
  This gives the user the impression of good performance and "snappiness", even
  though the real work might actually take some time.

* Running something after the web request has finished.

* Making sure something is done, by executing it asynchronously and using
  retries.

* Scheduling periodic work.

And to some degree:

* Distributed computing.

* Parallel execution.

Misconceptions
==============

Is celery dependent on pickle?
------------------------------

**Answer:** No.

Celery can support any serialization scheme, and has built-in support for
JSON, YAML and pickle. You can even send one task using pickle and another
one using JSON seamlessly, because every task message is tagged with its
content-type. The default serialization scheme is pickle because it's the
most widely used, and it supports sending complex objects as task arguments.

You can set a global default serializer, the default serializer for a
particular Task, or even which serializer to use when sending a single task
instance.

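The content-type tagging described above can be illustrated with a small, stdlib-only sketch. It is a toy model, not Celery's or carrot's actual serializer registry. The ``SERIALIZERS`` table and the ``encode``/``decode`` helpers are hypothetical. The two content-type strings are, as far as we know, the ones carrot uses for JSON and pickle.

.. code-block:: python

    import json
    import pickle
    from typing import Any, Callable, Dict, Tuple

    # Hypothetical registry: content-type -> (encode, decode).
    SERIALIZERS: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {
        "application/json": (
            lambda obj: json.dumps(obj).encode("utf-8"),
            lambda data: json.loads(data.decode("utf-8")),
        ),
        "application/x-python-serialize": (pickle.dumps, pickle.loads),
    }


    def encode(body: Any, content_type: str) -> Dict[str, Any]:
        """Serialize ``body`` and record which serializer was used."""
        dumps, _ = SERIALIZERS[content_type]
        return {"content_type": content_type, "body": dumps(body)}


    def decode(message: Dict[str, Any]) -> Any:
        """Choose the decoder from the message's own content-type."""
        _, loads = SERIALIZERS[message["content_type"]]
        return loads(message["body"])


    if __name__ == "__main__":
        # Two messages, two serializers, one consumer.
        as_json = encode({"task": "tasks.add", "args": [2, 2]}, "application/json")
        as_pickle = encode({"task": "tasks.add", "args": (2, 2)},
                           "application/x-python-serialize")
        print(decode(as_json))    # {'task': 'tasks.add', 'args': [2, 2]}
        print(decode(as_pickle))  # {'task': 'tasks.add', 'args': (2, 2)}

Because the consumer reads the content-type from each message, it never has to guess. In Celery itself the serializer is chosen with the ``CELERY_TASK_SERIALIZER`` setting, a Task's ``serializer`` attribute, or the ``serializer`` argument to ``apply_async``. Check the documentation for your version.
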
Is celery for Django only?
--------------------------

**Answer:** No.

Celery does not depend on Django anymore. To use Celery with Django you have
to use the `django-celery`_ package.

.. _`django-celery`: http://pypi.python.org/pypi/django-celery

Do I have to use AMQP/RabbitMQ?
-------------------------------

**Answer**: No.

You can also use Redis or an SQL database, see `Using other
queues`_.

.. _`Using other queues`:
    http://ask.github.com/celery/tutorials/otherqueues.html

Redis or a database won't perform as well as
an AMQP broker. If you have strict reliability requirements you are
encouraged to use RabbitMQ or another AMQP broker. The Redis and database
transports also use polling, so they are likely to consume more resources.
However, if for some reason you are not able to use AMQP, feel free to use
these alternatives. They will probably work fine for most use cases. Note
that the above points are not specific to celery: if using Redis or a database
as a queue worked fine for you before, it probably will now. You can always
upgrade later if you need to.

Is celery multi-lingual?
------------------------

**Answer:** Yes.

``celeryd`` is an implementation of celery in Python. If the language has an
AMQP client, there shouldn't be much work to create a worker in your language.
A celery worker is just a program connecting to the broker to consume
messages. There's no other communication involved.

Also, there's another way to be language independent, and that is to use REST
tasks: instead of your tasks being functions, they're URLs. With this
information you can even create simple web servers that enable preloading of
code. See: `User Guide: Remote Tasks`_.

.. _`User Guide: Remote Tasks`:
    http://ask.github.com/celery/userguide/remote-tasks.html

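A worker only consumes messages, so a producer written in any language just has to publish a body the workers understand. The sketch below builds such a body as JSON. The field names (``id``, ``task``, ``args``, ``kwargs``, ``retries``, ``eta``) follow Celery's task message format as we understand it for this version of Celery. Check the task message documentation for your release before relying on them. ``make_task_message`` is a hypothetical helper. Publishing the body (with content-type ``application/json``, to the exchange and routing key your workers consume from) is left to whichever AMQP client you use.

.. code-block:: python

    import json
    import uuid
    from typing import Any, Dict, List, Optional


    def make_task_message(task_name: str, args: List[Any], kwargs: Dict[str, Any],
                          eta: Optional[str] = None) -> str:
        """Return a JSON task body that any AMQP client could publish."""
        body = {
            "id": str(uuid.uuid4()),  # unique task id, later used to fetch the result
            "task": task_name,        # the registered task name, e.g. "tasks.add"
            "args": args,             # positional arguments
            "kwargs": kwargs,         # keyword arguments
            "retries": 0,             # how many times the task has been retried
            "eta": eta,               # ISO 8601 timestamp, or None to run as soon as possible
        }
        return json.dumps(body)


    if __name__ == "__main__":
        print(make_task_message("tasks.add", [2, 2], {}))
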
Troubleshooting
===============

MySQL is throwing deadlock errors, what can I do?
-------------------------------------------------

**Answer:** MySQL's default isolation level is ``REPEATABLE-READ``;
if you don't really need that, set it to ``READ-COMMITTED``.
You can do that by adding the following to your ``my.cnf``::

    [mysqld]
    transaction-isolation = READ-COMMITTED

For more information about InnoDB's transaction model, see `MySQL - The InnoDB
Transaction Model and Locking`_ in the MySQL user manual.

(Thanks to Honza Kral and Anton Tsigularov for this solution)

.. _`MySQL - The InnoDB Transaction Model and Locking`: http://dev.mysql.com/doc/refman/5.1/en/innodb-transaction-model.html

celeryd is not doing anything, just hanging
--------------------------------------------

**Answer:** See `MySQL is throwing deadlock errors, what can I do?`_
or `Why is Task.delay/apply*/celeryd just hanging?`_.

Why is Task.delay/apply\*/celeryd just hanging?
-----------------------------------------------

**Answer:** There is a bug in some AMQP clients that will make them hang if
they're not able to authenticate the current user, if the password doesn't
match, or if the user does not have access to the specified virtual host.
Be sure to check your broker logs (for RabbitMQ that is
``/var/log/rabbitmq/rabbit.log`` on most systems); they usually contain a
message describing the reason.

Why won't celeryd run on FreeBSD?
---------------------------------

**Answer:** ``multiprocessing.Pool`` requires a working POSIX semaphore
implementation, which isn't enabled in FreeBSD by default. You have to enable
POSIX semaphores in the kernel and manually recompile multiprocessing.

Luckily, Viktor Petersson has written a tutorial to get you started with
Celery on FreeBSD here:
http://www.playingwithwire.com/2009/10/how-to-get-celeryd-to-work-on-freebsd/

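Before or after recompiling, you can check whether the Python you run ``celeryd`` with can create multiprocessing semaphores at all. This is a minimal sketch for a current CPython. It relies on the fact that importing ``multiprocessing.synchronize`` raises ``ImportError`` when the platform lacks a working ``sem_open()``. The helper name is our own.

.. code-block:: python

    import multiprocessing


    def has_working_semaphores() -> bool:
        """Return True if this Python can create multiprocessing semaphores."""
        try:
            # On CPython this import raises ImportError when the platform has
            # no working sem_open() implementation.
            from multiprocessing import synchronize  # noqa: F401
            # Allocating a lock also catches systems where the import succeeds
            # but the kernel refuses to create the semaphore.
            multiprocessing.Lock()
        except (ImportError, OSError):
            return False
        return True


    if __name__ == "__main__":
        if has_working_semaphores():
            print("POSIX semaphores are available")
        else:
            print("No working POSIX semaphores: multiprocessing.Pool will fail")
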
I'm having ``IntegrityError: Duplicate Key`` errors. Why?
---------------------------------------------------------

**Answer:** See `MySQL is throwing deadlock errors, what can I do?`_.
Thanks to howsthedotcom.

Why aren't my tasks processed?
------------------------------

**Answer:** With RabbitMQ you can see how many consumers are currently
receiving tasks by running the following command::

    $ rabbitmqctl list_queues -p <myvhost> name messages consumers
    Listing queues ...
    celery     2891    2

This shows that there are 2891 messages waiting to be processed in the task
queue, and there are two consumers processing them.

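If you check this often, you can scan the output from a script. The sketch below parses the three-column output of the command above and flags two symptoms. The first is messages waiting with no consumers at all. The second is more consumers than the number of workers you expect to be running, which can point at a stale process as described next. The function names and the ``expected_consumers`` parameter are hypothetical. The parser assumes the name/messages/consumers column order requested in the command above.

.. code-block:: python

    from typing import Dict, List, Tuple


    def parse_list_queues(output: str) -> Dict[str, Tuple[int, int]]:
        """Map queue name -> (messages, consumers) from rabbitmqctl output."""
        stats: Dict[str, Tuple[int, int]] = {}
        for line in output.splitlines():
            fields = line.split()
            # Data rows have exactly three fields with numeric counts; this skips
            # banner lines such as "Listing queues ..." and "...done.".
            if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
                stats[fields[0]] = (int(fields[1]), int(fields[2]))
        return stats


    def diagnose(stats: Dict[str, Tuple[int, int]], expected_consumers: int) -> List[str]:
        """Describe queues that look stuck."""
        problems = []
        for name, (messages, consumers) in sorted(stats.items()):
            if messages and consumers == 0:
                problems.append(f"{name}: {messages} messages but no consumers")
            elif consumers > expected_consumers:
                problems.append(f"{name}: {consumers} consumers, expected "
                                f"{expected_consumers} (stale worker?)")
        return problems


    if __name__ == "__main__":
        sample = "\n".join([
            "Listing queues ...",
            "celery     2891    2",
            "feeds       120    0",
            "...done.",
        ])
        for problem in diagnose(parse_list_queues(sample), expected_consumers=1):
            print(problem)
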
One reason that the queue is never emptied could be that you have a stale
celery process taking the messages hostage. This could happen if celeryd
wasn't properly shut down.

When a message is received by a worker, the broker waits for it to be
acknowledged before marking the message as processed. The broker will not
re-send that message to another consumer until the consumer is shut down
properly.

If you hit this problem you have to kill all workers manually and restart
them. The ``[c]eleryd`` pattern keeps ``grep`` from matching its own
process::

    ps auxww | grep '[c]eleryd' | awk '{print $2}' | xargs kill

You might have to wait a while until all workers have finished the work they're
doing. If it's still hanging after a long time you can kill them by force
with::

    ps auxww | grep '[c]eleryd' | awk '{print $2}' | xargs kill -9

Why won't my Task run?
----------------------

**Answer:** There might be syntax errors preventing the tasks module from
being imported.

You can find out if celery is able to run the task by executing the
task manually:

>>> from myapp.tasks import MyPeriodicTask
>>> MyPeriodicTask.delay()

Watch celeryd's log file to see if it's able to find the task, or if some
other error is happening.

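The most common cause is an error raised while the module is imported. It can help to import the tasks module yourself, outside of ``celeryd``, and print the full traceback. This is a minimal sketch using only the standard library. ``import_error`` is a hypothetical helper, and ``myapp.tasks`` is the example module name from above.

.. code-block:: python

    import importlib
    import traceback
    from typing import Optional


    def import_error(module_name: str) -> Optional[str]:
        """Return the traceback text if ``module_name`` fails to import, else None."""
        try:
            importlib.import_module(module_name)
        except Exception:  # SyntaxError, ImportError, or anything raised at import time
            return traceback.format_exc()
        return None


    if __name__ == "__main__":
        # "myapp.tasks" is the example module name used above.
        problem = import_error("myapp.tasks")
        print(problem or "myapp.tasks imports cleanly")
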
Why won't my Periodic Task run?
-------------------------------

**Answer:** See `Why won't my Task run?`_.

How do I discard all waiting tasks?
------------------------------------

**Answer:** Use ``celery.task.discard_all()``, like this:

>>> from celery.task import discard_all
>>> discard_all()
1753

The number ``1753`` is the number of messages deleted.

You can also start celeryd with the ``--discard`` argument, which will
accomplish the same thing.

I've discarded messages, but there are still messages left in the queue?
------------------------------------------------------------------------

**Answer:** Tasks are acknowledged (removed from the queue) as soon
as they are actually executed. After the worker has received a task, it will
take some time until it is actually executed, especially if there are a lot
of tasks already waiting for execution. Messages that are not acknowledged are
held on to by the worker until it closes the connection to the broker (AMQP
server). When that connection is closed (e.g. because the worker was stopped)
the tasks will be re-sent by the broker to the next available worker (or the
same worker when it has been restarted). To properly purge the queue of
waiting tasks you therefore have to stop all the workers, and then discard the
tasks using ``discard_all``.

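The acknowledgement rules above can be made concrete with a toy, in-memory model of a single queue. Messages that were delivered but not acknowledged stay with the consumer. A purge (like ``discard_all``) only removes messages that are still waiting. Closing the consumer's connection puts its unacknowledged messages back. ``ToyBroker`` and its methods are hypothetical; this is not how RabbitMQ or carrot are implemented. The ``prefetch`` limit is a simplifying assumption about how many messages a worker holds at once.

.. code-block:: python

    from collections import deque
    from typing import Deque, Dict, List


    class ToyBroker:
        """One queue plus the messages each consumer holds but has not acked."""

        def __init__(self) -> None:
            self.ready: Deque[str] = deque()        # waiting to be delivered
            self.unacked: Dict[str, List[str]] = {}  # consumer -> held messages

        def publish(self, message: str) -> None:
            self.ready.append(message)

        def deliver(self, consumer: str, prefetch: int) -> None:
            """Hand up to ``prefetch`` messages to ``consumer`` (not yet removed for good)."""
            held = self.unacked.setdefault(consumer, [])
            while self.ready and len(held) < prefetch:
                held.append(self.ready.popleft())

        def ack(self, consumer: str, message: str) -> None:
            """Acknowledging is what finally removes a message."""
            self.unacked[consumer].remove(message)

        def purge(self) -> int:
            """Like discard_all: drop only the messages still waiting."""
            count = len(self.ready)
            self.ready.clear()
            return count

        def disconnect(self, consumer: str) -> None:
            """A closed connection returns unacked messages to the front of the queue."""
            for message in reversed(self.unacked.pop(consumer, [])):
                self.ready.appendleft(message)


    if __name__ == "__main__":
        broker = ToyBroker()
        for i in range(10):
            broker.publish(f"task-{i}")
        broker.deliver("worker1", prefetch=4)  # worker1 now holds task-0 .. task-3
        print(broker.purge())                  # 6: only waiting messages are removed
        broker.disconnect("worker1")           # worker stopped before acking
        print(list(broker.ready))              # ['task-0', 'task-1', 'task-2', 'task-3']

The final purge is only effective once every consumer has disconnected, which is why the answer says to stop the workers first.
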
Windows: The ``-B`` / ``--beat`` option to celeryd doesn't work?
------------------------------------------------------------------

**Answer**: That's right. Run ``celerybeat`` and ``celeryd`` as separate
services instead.

Tasks
=====

How can I reuse the same connection when applying tasks?
--------------------------------------------------------

**Answer**: See :doc:`userguide/executing`.

Can I execute a task by name?
-----------------------------

**Answer**: Yes. Use :func:`celery.execute.send_task`.
You can also execute a task by name from any language
that has an AMQP client.

>>> from celery.execute import send_task
>>> send_task("tasks.add", args=[2, 2], kwargs={})
<AsyncResult: 373550e8-b9a0-4666-bc61-ace01fa4f91d>

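This works because the worker owns the code, not the sender. The message carries only the task's registered name and its arguments, and the worker looks the name up in its task registry. Below is a toy, stdlib-only model of that lookup. ``REGISTRY``, ``register`` and ``execute_message`` are hypothetical names, not Celery's registry API.

.. code-block:: python

    from typing import Any, Callable, Dict

    # Hypothetical stand-in for the worker's task registry: name -> callable.
    REGISTRY: Dict[str, Callable[..., Any]] = {}


    def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that records a function under a task name."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            REGISTRY[name] = func
            return func
        return decorator


    @register("tasks.add")
    def add(x: int, y: int) -> int:
        return x + y


    def execute_message(message: Dict[str, Any]) -> Any:
        """What a worker does with a message: resolve the name, then call it."""
        func = REGISTRY[message["task"]]  # KeyError here means "unknown task"
        return func(*message.get("args", []), **message.get("kwargs", {}))


    if __name__ == "__main__":
        print(execute_message({"task": "tasks.add", "args": [2, 2], "kwargs": {}}))  # 4
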
Results
=======

How do I get the result of a task if I have the ID that points there?
----------------------------------------------------------------------

**Answer**: Use ``Task.AsyncResult``::

    >>> result = MyTask.AsyncResult(task_id)
    >>> result.get()

This will give you a :class:`celery.result.BaseAsyncResult` instance
using the task's current result backend.

If you need to specify a custom result backend you should use
:class:`celery.result.BaseAsyncResult` directly::

    >>> from celery.result import BaseAsyncResult
    >>> result = BaseAsyncResult(task_id, backend=...)
    >>> result.get()

Brokers
=======

Why is RabbitMQ crashing?
-------------------------

RabbitMQ will crash if it runs out of memory. This will be fixed in a
future release of RabbitMQ. Please refer to the RabbitMQ FAQ:
http://www.rabbitmq.com/faq.html#node-runs-out-of-memory

Some common Celery misconfigurations can crash RabbitMQ:

* Events.

  Running ``celeryd`` with the ``-E``/``--events`` option will send messages
  for events happening inside of the worker. If these event messages
  are not consumed, you will eventually run out of memory.

  Events should only be enabled if you have an active monitor consuming them.

* AMQP backend results.

  When running with the AMQP result backend, every task result will be sent
  as a message. If you don't collect these results, they will build up and
  RabbitMQ will eventually run out of memory.

  If you don't use the results for a task, make sure you set the
  ``ignore_result`` option:

  .. code-block:: python

      from celery.decorators import task
      from celery.task import Task

      @task(ignore_result=True)
      def mytask():
          ...

      class MyTask(Task):
          ignore_result = True

  Results can also be disabled globally using the ``CELERY_IGNORE_RESULT``
  setting.

Can I use celery with ActiveMQ/STOMP?
-------------------------------------

**Answer**: Yes, but this is somewhat experimental for now.
It works in a test configuration, but it has not
been tested in production like RabbitMQ has. If you have any problems
using STOMP and celery, please report the bugs to the issue tracker:
http://github.com/ask/celery/issues/

First you have to use the ``master`` branch of ``celery``::

    $ git clone git://github.com/ask/celery.git
    $ cd celery
    $ sudo python setup.py install
    $ cd ..

Then you need to install the ``stompbackend`` branch of ``carrot``::

    $ git clone git://github.com/ask/carrot.git
    $ cd carrot
    $ git checkout stompbackend
    $ sudo python setup.py install
    $ cd ..

And my fork of ``python-stomp``, which adds non-blocking support::

    $ hg clone http://bitbucket.org/asksol/python-stomp/
    $ cd python-stomp
    $ sudo python setup.py install
    $ cd ..

In this example we will use a queue called ``celery``, which we created in
the ActiveMQ web admin interface.

**Note**: For ActiveMQ the queue name has to have ``"/queue/"`` prepended to
it; for example, the queue ``celery`` becomes ``/queue/celery``.

Since a STOMP queue is a single named entity and it doesn't have the
routing capabilities of AMQP, you need to set both the ``queue`` and
``exchange`` settings to your queue name. This is a minor inconvenience, since
carrot needs to maintain the same interface for both AMQP and STOMP (obviously
the one with the most capabilities won).

Use the following specific settings in your ``settings.py``:

.. code-block:: python

    # Makes python-stomp the default backend for carrot.
    CARROT_BACKEND = "stomp"

    # STOMP hostname and port settings.
    BROKER_HOST = "localhost"
    BROKER_PORT = 61613

    # The queue name to use (both queue and exchange must be set to the
    # same queue name when using STOMP)
    CELERY_DEFAULT_QUEUE = "/queue/celery"
    CELERY_DEFAULT_EXCHANGE = "/queue/celery"

    CELERY_QUEUES = {
        "/queue/celery": {"exchange": "/queue/celery"}
    }

Now you can go on reading the tutorial in the README, ignoring any AMQP
specific options.

What features are not supported when using STOMP?
--------------------------------------------------

This is a (possibly incomplete) list of features not available when
using the STOMP backend:

* routing keys

* exchange types (direct, topic, headers, etc)

* immediate

* mandatory

Features
========

How can I run a task once another task has finished?
----------------------------------------------------

**Answer**: You can safely launch a task inside a task.
Also, a common pattern is to use callback tasks:

.. code-block:: python

    from celery.decorators import task

    @task()
    def add(x, y, callback=None):
        result = x + y
        if callback:
            callback.delay(result)
        return result


    @task(ignore_result=True)
    def log_result(result, **kwargs):
        logger = log_result.get_logger(**kwargs)
        logger.info("log_result got: %s" % (result, ))

>>> add.delay(2, 2, callback=log_result)

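The flow of the callback pattern can be sketched without a broker. In this stdlib-only toy, ``delay`` just appends to an in-memory queue and ``run_worker`` drains it. This shows that ``log_result`` runs only after ``add`` has produced its value, as a separate queued call. ``delay``, ``run_worker``, ``queue`` and ``logged`` are hypothetical stand-ins, not Celery APIs.

.. code-block:: python

    from collections import deque
    from typing import Any, Callable, Deque, List, Tuple

    # Toy "broker": a FIFO of (function, args) pairs that run_worker() drains later.
    queue: Deque[Tuple[Callable[..., Any], Tuple[Any, ...]]] = deque()
    logged: List[str] = []


    def delay(func: Callable[..., Any], *args: Any) -> None:
        """Stand-in for Task.delay: enqueue the call instead of running it now."""
        queue.append((func, args))


    def add(x, y, callback=None):
        result = x + y
        if callback is not None:
            delay(callback, result)  # schedule the follow-up with our result
        return result


    def log_result(result):
        logged.append(f"log_result got: {result}")


    def run_worker() -> None:
        """Run queued calls in order, including any they enqueue themselves."""
        while queue:
            func, args = queue.popleft()
            func(*args)

For example, ``delay(add, 2, 2, log_result)`` followed by ``run_worker()`` leaves ``logged`` equal to ``["log_result got: 4"]``.
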
Can I cancel the execution of a task?
-------------------------------------

**Answer**: Yes. Use ``result.revoke``::

    >>> result = add.apply_async(args=[2, 2], countdown=120)
    >>> result.revoke()

or if you only have the task id::

    >>> from celery.task.control import revoke
    >>> revoke(task_id)

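Revoking works with nothing but the task id because the request is broadcast to the workers. Each worker remembers the id and discards the task if it arrives later. The toy model below illustrates that idea with a plain set. It is a sketch, not Celery's worker code, and it does not stop a task that has already started executing. ``ToyWorker`` and its methods are hypothetical.

.. code-block:: python

    from typing import Any, Callable, List, Set


    class ToyWorker:
        """Skips any task whose id was revoked before the task arrived."""

        def __init__(self) -> None:
            self.revoked: Set[str] = set()
            self.executed: List[str] = []

        def on_revoke(self, task_id: str) -> None:
            # Handle a broadcast "revoke" command: just remember the id.
            self.revoked.add(task_id)

        def on_message(self, task_id: str, func: Callable[..., Any], *args: Any) -> None:
            if task_id in self.revoked:
                return  # revoked: drop it without running
            func(*args)
            self.executed.append(task_id)


    if __name__ == "__main__":
        worker = ToyWorker()
        worker.on_revoke("task-2")
        for task_id in ("task-1", "task-2", "task-3"):
            worker.on_message(task_id, print, "running", task_id)
        print(worker.executed)  # ['task-1', 'task-3']
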
Why aren't my remote control commands received by all workers?
--------------------------------------------------------------

**Answer**: To receive broadcast remote control commands, every ``celeryd``
uses its hostname to create a unique queue name to listen to,
so if you have more than one worker with the same hostname, the
control commands will be received in round-robin between them.

To work around this you can explicitly set the hostname for every worker
using the ``--hostname`` argument to ``celeryd``::

    $ celeryd --hostname=$(hostname).1
    $ celeryd --hostname=$(hostname).2

etc, etc.

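The effect is easy to model. Each distinct hostname gets one queue, and every queue gets a copy of each broadcast command. Processes that share a queue take turns consuming from it. The stdlib-only simulation below counts how many commands each worker process sees. ``deliver_broadcast`` is a hypothetical helper. Strict round-robin is a simplifying assumption about how the broker spreads messages over consumers.

.. code-block:: python

    from itertools import cycle
    from typing import Dict, List


    def deliver_broadcast(worker_hostnames: List[str], n_commands: int) -> Dict[str, int]:
        """Count broadcast commands received per worker process.

        Keys are "<hostname>#<process index>".
        """
        # One queue per distinct hostname; processes with the same hostname share it.
        queues: Dict[str, List[int]] = {}
        for index, hostname in enumerate(worker_hostnames):
            queues.setdefault(hostname, []).append(index)

        received = {f"{hostname}#{i}": 0 for i, hostname in enumerate(worker_hostnames)}
        for hostname, members in queues.items():
            consumers = cycle(members)  # consumers of one queue take turns
            for _ in range(n_commands):
                received[f"{hostname}#{next(consumers)}"] += 1
        return received


    if __name__ == "__main__":
        print(deliver_broadcast(["web1", "web1"], 4))      # each process sees 2 of 4
        print(deliver_broadcast(["web1.1", "web1.2"], 4))  # each process sees all 4
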
Can I send some tasks to only some servers?
--------------------------------------------

**Answer:** Yes. You can route tasks to an arbitrary server using AMQP,
and a worker can bind to as many queues as it wants.

Say you have two servers, ``x`` and ``y``, that handle regular tasks,
and one server ``z`` that only handles feed-related tasks. You can use this
configuration:

* Servers ``x`` and ``y``: settings.py:

  .. code-block:: python

      CELERY_DEFAULT_QUEUE = "regular_tasks"
      CELERY_QUEUES = {
          "regular_tasks": {
              "binding_key": "task.#",
          },
      }
      CELERY_DEFAULT_EXCHANGE = "tasks"
      CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
      CELERY_DEFAULT_ROUTING_KEY = "task.regular"

* Server ``z``: settings.py:

  .. code-block:: python

      CELERY_DEFAULT_QUEUE = "feed_tasks"
      CELERY_QUEUES = {
          "feed_tasks": {
              "binding_key": "feed.#",
          },
      }
      CELERY_DEFAULT_EXCHANGE = "tasks"
      CELERY_DEFAULT_ROUTING_KEY = "task.regular"
      CELERY_DEFAULT_EXCHANGE_TYPE = "topic"

``CELERY_QUEUES`` is a map of queue names and their exchange/type/binding_key;
if you don't set exchange or exchange type, they will be taken from the
``CELERY_DEFAULT_EXCHANGE``/``CELERY_DEFAULT_EXCHANGE_TYPE`` settings.

Now to make a Task run on the ``z`` server you need to set its
``routing_key`` attribute so it starts with the word ``"feed."``, which
matches the ``feed.#`` binding key of server ``z``:

.. code-block:: python

    from feedaggregator.models import Feed
    from celery.decorators import task

    @task(routing_key="feed.importer")
    def import_feed(feed_url):
        Feed.objects.import_feed(feed_url)

or if subclassing the ``Task`` class directly:

.. code-block:: python

    from feedaggregator.models import Feed
    from celery.task import Task

    class FeedImportTask(Task):
        routing_key = "feed.importer"

        def run(self, feed_url):
            Feed.objects.import_feed(feed_url)

You can also override this using the ``routing_key`` argument to
:func:`celery.task.apply_async`:

>>> from myapp.tasks import RefreshFeedTask
>>> RefreshFeedTask.apply_async(args=["http://cnn.com/rss"],
...                             routing_key="feed.importer")

If you want, you can even have your feed processing worker handle regular
tasks as well, maybe in times when there's a lot of work to do.
Just add a new queue to server ``z``'s ``CELERY_QUEUES``:

.. code-block:: python

    CELERY_QUEUES = {
        "feed_tasks": {
            "binding_key": "feed.#",
        },
        "regular_tasks": {
            "binding_key": "task.#",
        },
    }

Since the default exchange is ``tasks``, they will both use the same
exchange.

If you want to add another queue on a different exchange,
just specify a custom exchange and exchange type:

.. code-block:: python

    CELERY_QUEUES = {
        "feed_tasks": {
            "binding_key": "feed.#",
        },
        "regular_tasks": {
            "binding_key": "task.#",
        },
        "image_tasks": {
            "binding_key": "image.compress",
            "exchange": "mediatasks",
            "exchange_type": "direct",
        },
    }

If you're confused about these terms, you should read up on AMQP and RabbitMQ.
`Rabbits and Warrens`_ is an excellent blog post describing queues and
exchanges. There's also *AMQP in 10 minutes*: `Flexible Routing Model`_,
and `Standard Exchange Types`_. For users of RabbitMQ the `RabbitMQ FAQ`_
could also be useful as a source of information.

.. _`Rabbits and Warrens`: http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/
.. _`Flexible Routing Model`: http://bit.ly/95XFO1
.. _`Standard Exchange Types`: http://bit.ly/EEWca
.. _`RabbitMQ FAQ`: http://www.rabbitmq.com/faq.html

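The ``#`` in a binding key like ``feed.#`` is AMQP topic-exchange syntax. Routing keys are dot-separated words, ``*`` matches exactly one word, and ``#`` matches zero or more words. The stdlib-only sketch below implements that matching rule and applies it to the ``CELERY_QUEUES`` example above. You can use it to check which queue a routing key would land in. It only models queues on the single topic exchange ``tasks``. Queues on other exchanges, such as the direct ``mediatasks`` exchange, are left out. ``topic_matches`` and ``route`` are hypothetical helpers, not Celery or RabbitMQ APIs.

.. code-block:: python

    from typing import Dict, List


    def topic_matches(binding_key: str, routing_key: str) -> bool:
        """AMQP topic matching: '*' is exactly one word, '#' is zero or more words."""
        return _match(binding_key.split("."), routing_key.split("."))


    def _match(pattern: List[str], words: List[str]) -> bool:
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # Either '#' matches nothing here, or it swallows one more word.
            return _match(rest, words) or (bool(words) and _match(pattern, words[1:]))
        if not words:
            return False
        return (head == "*" or head == words[0]) and _match(rest, words[1:])


    def route(queues: Dict[str, Dict[str, str]], routing_key: str) -> List[str]:
        """Names of queues whose binding_key matches (all on one topic exchange)."""
        return [name for name, options in queues.items()
                if topic_matches(options["binding_key"], routing_key)]


    CELERY_QUEUES = {
        "feed_tasks": {"binding_key": "feed.#"},
        "regular_tasks": {"binding_key": "task.#"},
    }

    if __name__ == "__main__":
        for key in ("feed.importer", "task.regular", "image.compress"):
            print(key, "->", route(CELERY_QUEUES, key))
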
Can I change the interval of a periodic task at runtime?
--------------------------------------------------------

**Answer**: Yes. You can override ``PeriodicTask.is_due`` or turn
``PeriodicTask.run_every`` into a property:

.. code-block:: python

    from celery.task import PeriodicTask

    class MyPeriodic(PeriodicTask):

        def run(self):
            ...  # the task's actual work goes here

        @property
        def run_every(self):
            # get_interval_from_database() stands for your own lookup
            # function, returning e.g. a datetime.timedelta.
            return get_interval_from_database(...)

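The property trick works because Python evaluates a property on every attribute access. Each time the scheduler asks for ``run_every``, it sees the current value. Below is a stdlib-only sketch of the idea. ``SETTINGS_DB``, ``DynamicIntervalTask`` and its ``is_due`` method are hypothetical. That ``is_due`` is simplified to return only a bool, so it is not Celery's ``PeriodicTask.is_due``, whose signature and return value differ.

.. code-block:: python

    from datetime import datetime, timedelta
    from typing import Dict

    # Hypothetical settings store that someone can change while the scheduler runs.
    SETTINGS_DB: Dict[str, int] = {"interval_seconds": 300}


    class DynamicIntervalTask:

        @property
        def run_every(self) -> timedelta:
            # Re-read on every access, so a change applies without a restart.
            return timedelta(seconds=SETTINGS_DB["interval_seconds"])

        def is_due(self, last_run_at: datetime, now: datetime) -> bool:
            return now - last_run_at >= self.run_every


    if __name__ == "__main__":
        task = DynamicIntervalTask()
        last_run = datetime(2010, 6, 1, 12, 0)
        now = last_run + timedelta(minutes=2)
        print(task.is_due(last_run, now))   # False: 2 minutes < 5 minutes
        SETTINGS_DB["interval_seconds"] = 60
        print(task.is_due(last_run, now))   # True: 2 minutes >= 1 minute
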
Does celery support task priorities?
------------------------------------

**Answer**: No. In theory, yes, as AMQP supports priorities. However,
RabbitMQ doesn't implement them yet.

The usual way to prioritize work in celery is to route high priority tasks
to different servers. In the real world this may actually work better than
per-message priorities. You can use this in combination with rate limiting to
achieve a highly performant system.

Should I use retry or acks_late?
--------------------------------

**Answer**: Depends. It's not necessarily one or the other; you may want
to use both.

``Task.retry`` is used to retry tasks, notably for expected errors that
are catchable with a ``try:`` block. The AMQP transaction is not used
for these errors: **if the task raises an exception it is still acked!**

The ``acks_late`` setting would be used when you need the task to be
executed again if the worker (for some reason) crashes mid-execution.
It's important to note that the worker is not known to crash, and if
it does it is usually an unrecoverable error that requires human
intervention (a bug in the worker, or in the task code).

In an ideal world you could safely retry any task that has failed, but
this is rarely the case. Imagine the following task:

.. code-block:: python

    from celery.decorators import task

    @task()
    def process_upload(filename, tmpfile):
        # Increment a file count stored in a database
        increment_file_counter()
        add_file_metadata_to_db(filename, tmpfile)
        copy_file_to_destination(filename, tmpfile)

If this crashed in the middle of copying the file to its destination,
the world would contain incomplete state. This is not a critical
scenario, of course, but you can probably imagine something far more
sinister. So for ease of programming we have less reliability by default.
It's a good default: users who require more and know what they
are doing can still enable ``acks_late`` (and in the future hopefully
use manual acknowledgement).

In addition, ``Task.retry`` has features not available in AMQP
transactions: delay between retries, max retries, etc.

So use retry for Python errors, and if your task is reentrant,
combine that with ``acks_late`` if that level of reliability
is required.

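One way to make a task like ``process_upload`` reentrant is to record completion under a unique key and skip work that is already done. A redelivery after a crash under ``acks_late`` then does no harm. The stdlib-only sketch below uses in-memory dicts in place of the database and file storage. The ``upload_id`` parameter is a hypothetical addition. In a real system the counter update and the completion marker should be committed in the same database transaction, which this toy cannot show.

.. code-block:: python

    from typing import Dict, Set

    # Hypothetical in-memory stand-ins for the database and the file storage.
    file_counter: Dict[str, int] = {"count": 0}
    metadata: Dict[str, str] = {}
    destination: Dict[str, bytes] = {}
    completed_uploads: Set[str] = set()


    def process_upload(upload_id: str, filename: str, data: bytes) -> None:
        """Safe to run more than once for the same upload."""
        if upload_id in completed_uploads:
            return  # an earlier delivery already finished this upload
        # Writes keyed by filename overwrite instead of duplicating,
        # so repeating them after a crash is harmless.
        metadata[filename] = upload_id
        destination[filename] = data
        # The one non-repeatable step goes last, next to the completion marker.
        file_counter["count"] += 1
        completed_uploads.add(upload_id)


    if __name__ == "__main__":
        process_upload("u1", "photo.jpg", b"...")
        process_upload("u1", "photo.jpg", b"...")  # simulated redelivery
        print(file_counter["count"])  # 1
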
Can I schedule tasks to execute at a specific time?
---------------------------------------------------

.. module:: celery.task.base

**Answer**: Yes. You can use the ``eta`` argument of :meth:`Task.apply_async`.

Or to schedule a periodic task at a specific time, use the
:class:`celery.task.schedules.crontab` schedule behavior:

.. code-block:: python

    from celery.task.schedules import crontab
    from celery.decorators import periodic_task

    @periodic_task(run_every=crontab(hour=7, minute=30, day_of_week="mon"))
    def every_monday_morning():
        print("This is run every Monday morning at 7:30")

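For a one-off run at a specific time, you compute the ``eta`` datetime yourself. The stdlib-only sketch below finds the next occurrence of a weekly slot such as Monday 07:30, the same slot the ``crontab`` example describes. ``next_weekly_run`` is a hypothetical helper. It uses naive datetimes and ignores time zones and DST transitions.

.. code-block:: python

    from datetime import datetime, timedelta


    def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int) -> datetime:
        """Next datetime after ``now`` on ``weekday`` (Monday=0) at hour:minute."""
        candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        candidate += timedelta(days=(weekday - now.weekday()) % 7)
        if candidate <= now:
            # Today is the right weekday but the time has passed: use next week.
            candidate += timedelta(days=7)
        return candidate


    if __name__ == "__main__":
        wednesday = datetime(2010, 6, 2, 10, 0)
        print(next_weekly_run(wednesday, weekday=0, hour=7, minute=30))
        # 2010-06-07 07:30:00

You could then pass the result as the ``eta`` argument, e.g. ``mytask.apply_async(eta=next_weekly_run(datetime.now(), 0, 7, 30))``, where ``mytask`` stands for one of your own tasks.
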
How do I shut down ``celeryd`` safely?
--------------------------------------

**Answer**: Use the ``TERM`` signal, and celery will finish all currently
executing jobs and shut down as soon as possible. No tasks should be lost.

You should never stop ``celeryd`` with the ``KILL`` signal (``-9``),
unless you've tried ``TERM`` a few times and waited a few minutes to let it
get a chance to shut down. If you do, tasks may be terminated mid-execution,
and they will not be re-run unless you have the ``acks_late`` option set
(``Task.acks_late`` / ``CELERY_ACKS_LATE``).

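If you manage the worker process from your own Python code, you can apply the TERM-first, KILL-last policy above with the standard library's ``subprocess`` module. This is a minimal sketch. ``stop_gracefully`` is a hypothetical helper, the grace period is an arbitrary example value, and the demo uses a sleeping Python process in place of a real ``celeryd``. On POSIX, ``Popen.terminate()`` sends ``SIGTERM`` and ``Popen.kill()`` sends ``SIGKILL``.

.. code-block:: python

    import subprocess
    import sys


    def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 60.0) -> int:
        """Send TERM, wait up to ``grace_seconds``, and only then fall back to KILL.

        Returns the exit status (a negative signal number on POSIX).
        """
        proc.terminate()  # SIGTERM: lets the worker finish its current tasks
        try:
            return proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()   # SIGKILL: last resort, running tasks may be lost
            return proc.wait()


    if __name__ == "__main__":
        # Stand-in for a worker process; replace with however you start celeryd.
        worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(600)"])
        print("exit status:", stop_gracefully(worker, grace_seconds=5))
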
How do I run celeryd in the background on [platform]?
-----------------------------------------------------

**Answer**: Please see :doc:`cookbook/daemonizing`.