
Django celery memory not released - Stack Overflow


In my Django project I have the following dependencies:



In dev_settings.py:


DEBUG = False
BROKER_URL = "django://"
import djcelery
djcelery.setup_loader()
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
CELERYD_CONCURRENCY = 2
# CELERYD_TASK_TIME_LIMIT = 10

CELERYD_TASK_TIME_LIMIT is commented out as suggested here http://stackoverflow.com/a/17561747/1452356, and debug_toolbar is removed as suggested by http://stackoverflow.com/a/19931261/1452356.


I start my worker in a shell with:


./manage.py celeryd --settings=dev_settings

Then I send a task:


from celery.task import Task   # old-style task base class (celery 3.x / django-celery)


class ExempleTask(Task):

    def run(self, piProjectId):
        table = []
        for i in range(50000000):   # builds up a ~50M-element list inside the worker process
            table.append(1)
        return None

Using a django command:


from django.core.management.base import BaseCommand
# ExempleTask is imported from the project's tasks module


class Command(BaseCommand):

    def handle(self, *plArgs, **pdKwargs):
        loResult = ExempleTask.delay(1)   # queue the task on the worker
        loResult.get()                    # block until the worker has finished it
        return None

With:


./manage.py purge_and_delete_test --settings=dev_settings

I monitor the memory usage with:


watch -n 1 'ps ax  -o rss,user,command | sort -nr | grep celery |head -n 5'
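For reference, the same check can be done from Python instead of the shell pipeline. A minimal sketch (Linux only, it reads /proc; the helper name celery_rss_kb is made up for illustration) that prints the resident set size of every process whose command line mentions celery:

import os

def celery_rss_kb():
    """Return {pid: RSS in kB} for every process whose cmdline mentions celery."""
    usage = {}
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/cmdline' % pid) as f:
                cmdline = f.read().replace('\0', ' ')
            if 'celery' not in cmdline:
                continue
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        usage[int(pid)] = int(line.split()[1])   # VmRSS is reported in kB
        except IOError:
            continue   # the process exited between listdir() and open()
    return usage

if __name__ == '__main__':
    for pid, rss in sorted(celery_rss_kb().items()):
        print("%d\t%d kB" % (pid, rss))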

Every time I call the task, it increases the memory consumption of the celeryd/worker process, proportionally to the amount of data allocated in it...


It seems like a common issue (cf. the other Stack Overflow links above), however I couldn't fix it, even with the latest dependencies.


Thanks.




This is a Python and OS issue, not really a Django or Celery issue. Without getting too deep:


1) A process will never free memory address space once it has requested it from the OS. It never says "hey, I'm done here, you can have it back". In the example you've given, I'd expect the process size to grow for a while and then stabilize, possibly at a high baseline. After your example allocation, you might call the gc interface to force a garbage collection and see how much of that memory is actually still in use (see the sketch after this list).


2) This isn't usually a problem, because unused pages are paged out by the OS once your process stops accessing the address space it has deallocated.


3) It is a problem if your process is leaking object references, preventing Python from garbage collecting and reusing that space, and forcing your process to ask the OS for more address space. At some point the OS cries uncle and will (probably) kill your process with its OOM killer or a similar mechanism.


4) If you are leaking, either fix the leak or set CELERYD_MAX_TASKS_PER_CHILD, and your child processes will (probably) exit and be replaced before they upset the OS.
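Point 1 can be checked from inside the task itself. A minimal sketch, assuming the same old-style Task base class used in the question (ExempleTaskWithGc and its print output are illustrative additions, not part of the original code):

from celery.task import Task
import gc
import resource


class ExempleTaskWithGc(Task):

    def run(self, piProjectId):
        table = [1] * 50000000           # same allocation as the original task
        del table                        # drop the only reference to the list
        unreachable = gc.collect()       # force a full collection (point 1)
        # ru_maxrss is the peak RSS of this worker child (kB on Linux), so it shows how
        # high the task pushed the process, not how much was handed back to the OS.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print("gc found %d unreachable objects, peak RSS %d kB" % (unreachable, peak_kb))
        return None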


This is a good general discussion on Python's memory management: CPython memory allocation


And a few minor things: use xrange, not range - range will generate all the values and then iterate over that list, while xrange is just a generator (Python 2). Have you set Django DEBUG=False?
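A sketch of the two concrete suggestions above (CELERYD_MAX_TASKS_PER_CHILD from point 4, plus the xrange change) applied to the question's code; the value 100 is illustrative, not tuned:

# dev_settings.py -- recycle each worker child after 100 tasks, so any memory it has
# grown (or leaked) is returned to the OS when the child exits and is replaced.
CELERYD_MAX_TASKS_PER_CHILD = 100

# tasks.py -- xrange yields the loop counter lazily (Python 2); range would first build
# a 50,000,000-element list just to drive the loop. On Python 3, range is already lazy.
from celery.task import Task


class ExempleTask(Task):

    def run(self, piProjectId):
        table = []
        for i in xrange(50000000):
            table.append(1)
        return None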


