Wednesday, August 13, 2014

python - Django DB filter optimization: hits by IP with a 1-hour difference - Stack Overflow


Hi, I need to get the number of visits from my DB. A visit is a hit from a distinct IP, and hits from the same IP only count as separate visits if they are at least 1 hour apart. Right now I get it with this code:


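from datetime import timedelta

def get_visits_number(hits):
    # Assumes hits arrive ordered by hit.created: the first hit of an IP
    # counts as a visit, and a later hit counts only if it is more than one
    # hour after the last counted visit for that IP.
    if hits:
        ips = {}   # remote_addr -> number of visits
        date = {}  # remote_addr -> time of the last counted visit
        for hit in hits:
            if hit.hit.remote_addr in ips:
                if hit.hit.created > date[hit.hit.remote_addr] + timedelta(hours=1):
                    ips[hit.hit.remote_addr] = ips[hit.hit.remote_addr] + 1
                    date[hit.hit.remote_addr] = hit.hit.created
            else:
                ips[hit.hit.remote_addr] = 1
                date[hit.hit.remote_addr] = hit.hit.created

        total = 0
        for ip in ips:
            total = total + ips[ip]

        return total

    return 0

visits = get_visits_number(App_hit.objects.filter(app_id__campaign_id__agency_id=agency).all())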

But with more than 4,000 hits this takes a long time, over 30 seconds. Is there any suggestion for making this code better and faster?




I have tried some workarounds but haven't found any significantly better solution. The point is that, regardless of your query, get_visits_number is basically linear in the number of hits. There is certainly a neater way to write the code, but it will not improve the performance.
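A neater version of the same loop might look like the sketch below. It keeps the question's rule but works on plain (remote_addr, created) pairs instead of model instances, so it can be tried without Django; the function name count_visits and its gap parameter are hypothetical. Like the original, it assumes the pairs are already sorted by created, and it is still a single linear pass.

```python
from datetime import datetime, timedelta


def count_visits(hits, gap=timedelta(hours=1)):
    """Count visits from (remote_addr, created) pairs sorted by created.

    A hit is a new visit if its IP has not been seen yet, or if it comes
    strictly more than `gap` after the last visit counted for that IP.
    """
    last_visit = {}  # remote_addr -> created of the last counted visit
    total = 0
    for ip, created in hits:
        previous = last_visit.get(ip)
        if previous is None or created > previous + gap:
            last_visit[ip] = created
            total += 1
    return total


if __name__ == "__main__":
    hits = [
        ("10.0.0.1", datetime(2014, 8, 13, 10, 0)),
        ("10.0.0.1", datetime(2014, 8, 13, 10, 30)),  # within the hour: ignored
        ("10.0.0.2", datetime(2014, 8, 13, 10, 45)),
        ("10.0.0.1", datetime(2014, 8, 13, 11, 0)),   # exactly one hour: ignored
        ("10.0.0.1", datetime(2014, 8, 13, 11, 1)),
    ]
    print(count_visits(hits))  # 3
```

In Django, the pairs could come from something like App_hit.objects.filter(...).order_by('hit__created').values_list('hit__remote_addr', 'hit__created'), assuming the field names implied by hit.hit.remote_addr and hit.hit.created. values_list() follows the foreign key with a join and returns tuples instead of model instances.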


However, what really bothers me is hit.hit.remote_addr. The first 'hit' should be an instance of App_hit, right? Then what is the second 'hit'? Another instance? A foreign key on App_hit? A potential issue is that hit.hit (if it is not a typo) triggers an extra database query for every hit in the loop, just to load the related object. You may want to use select_related in your Django query (for example App_hit.objects.filter(app_id__campaign_id__agency_id=agency).select_related('hit')), which retrieves all the data at once.
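To illustrate the difference, here is a self-contained sketch using Python's built-in sqlite3 module rather than Django. The tables app_hit and hit and the helpers make_db, fetch_lazy, fetch_joined and count_queries are hypothetical stand-ins for the question's models. Touching hit.hit without select_related behaves roughly like fetch_lazy (one query for the list, then one per row), while select_related('hit') makes Django issue a single JOIN, similar to fetch_joined. The trace callback counts the SQL statements SQLite actually executes.

```python
import sqlite3


def make_db(n_hits):
    """Build a hypothetical two-table schema mirroring App_hit -> hit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hit (id INTEGER PRIMARY KEY, remote_addr TEXT, created TEXT);
        CREATE TABLE app_hit (id INTEGER PRIMARY KEY,
                              hit_id INTEGER REFERENCES hit(id));
    """)
    for i in range(n_hits):
        cur = conn.execute(
            "INSERT INTO hit (remote_addr, created) VALUES (?, ?)",
            (f"10.0.0.{i % 3}", f"2014-08-13 10:{i % 60:02d}:00"),
        )
        conn.execute("INSERT INTO app_hit (hit_id) VALUES (?)", (cur.lastrowid,))
    conn.commit()
    return conn


def fetch_lazy(conn):
    """One query for the list, then one query per row (the N+1 pattern)."""
    rows = []
    for (hit_id,) in conn.execute("SELECT hit_id FROM app_hit ORDER BY id").fetchall():
        row = conn.execute(
            "SELECT remote_addr, created FROM hit WHERE id = ?", (hit_id,)
        ).fetchone()
        rows.append(row)
    return rows


def fetch_joined(conn):
    """A single query with a JOIN, the kind select_related() produces."""
    return conn.execute(
        "SELECT h.remote_addr, h.created FROM app_hit AS a "
        "JOIN hit AS h ON h.id = a.hit_id ORDER BY a.id"
    ).fetchall()


def count_queries(conn, fetch):
    """Run fetch(conn) and count the SQL statements SQLite executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fetch(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


if __name__ == "__main__":
    conn = make_db(4000)
    lazy, n_lazy = count_queries(conn, fetch_lazy)
    joined, n_joined = count_queries(conn, fetch_joined)
    assert lazy == joined
    print(n_lazy, n_joined)
```

Both functions return the same rows, but the lazy version should report one statement per hit plus one, so with 4,000 hits the round trips alone could account for a good share of the slowdown.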


A second possible solution, which you may not like: as Robert Rozas said, you should try to move as much of the data manipulation as possible into the query (though that may not always be possible). The database can do this kind of work much faster than a Python loop. If the standard Django ORM cannot express what you need, you can try a raw SQL query, which is harder to write and maintain but will most likely boost your running speed, if you really care that much.
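As an illustration of pushing the whole calculation into the database, here is a sketch of a raw SQL query using a recursive common table expression, run through Python's sqlite3 so it is self-contained. The flattened hit table, its columns, and count_visits_sql are hypothetical. The query reproduces the question's rule: each IP's first hit is a visit, and the next visit is the earliest hit strictly more than one hour after the previous visit. It relies on SQLite's datetime() function and on timestamps stored as 'YYYY-MM-DD HH:MM:SS' text, so the interval syntax would need adapting for MySQL or PostgreSQL.

```python
import sqlite3

# Hypothetical flattened table with the two columns the question reads
# through hit.hit.remote_addr and hit.hit.created.
SCHEMA = "CREATE TABLE hit (remote_addr TEXT, created TEXT)"

VISITS_SQL = """
WITH RECURSIVE visits(ip, created) AS (
    -- Base case: the first hit of every IP is a visit.
    SELECT remote_addr, MIN(created) FROM hit GROUP BY remote_addr
    UNION ALL
    -- Step: the next visit is the earliest hit strictly more than one hour
    -- after the previous visit; NULL when there is none, which ends the chain.
    SELECT v.ip,
           (SELECT h.created FROM hit AS h
             WHERE h.remote_addr = v.ip
               AND h.created > datetime(v.created, '+1 hour')
             ORDER BY h.created
             LIMIT 1)
      FROM visits AS v
     WHERE v.created IS NOT NULL
)
SELECT COUNT(created) FROM visits  -- COUNT(column) skips the NULL terminators
"""


def count_visits_sql(conn):
    """Return the number of visits, computed entirely inside SQLite."""
    return conn.execute(VISITS_SQL).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO hit VALUES (?, ?)", [
        ("10.0.0.1", "2014-08-13 10:00:00"),
        ("10.0.0.1", "2014-08-13 10:30:00"),  # within the hour: ignored
        ("10.0.0.2", "2014-08-13 10:45:00"),
        ("10.0.0.1", "2014-08-13 11:00:00"),  # exactly one hour: ignored
        ("10.0.0.1", "2014-08-13 11:01:00"),
    ])
    print(count_visits_sql(conn))  # 3
```

In Django, the same SQL could be run through django.db.connection.cursor(). Whether it actually beats the Python loop depends on the database and on having an index on (remote_addr, created), so it is worth measuring before switching.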


