Hi, I need to get the count of visits from my DB. A visit is a hit from a distinct IP, with at least one hour between counted hits from the same IP. Right now I get it with this code:
from datetime import timedelta

visits = get_visits_number(App_hit.objects.filter(app_id__campaign_id__agency_id=agency).all())

def get_visits_number(hits):
    if hits:
        ips = {}   # remote_addr -> number of visits counted for that IP
        date = {}  # remote_addr -> time of the last counted visit
        for hit in hits:
            if hit.hit.remote_addr in ips:
                # Known IP: count again only after more than one hour
                if hit.hit.created > date[hit.hit.remote_addr] + timedelta(hours=1):
                    ips[hit.hit.remote_addr] = ips[hit.hit.remote_addr] + 1
                    date[hit.hit.remote_addr] = hit.hit.created
            else:
                # First hit from this IP
                ips[hit.hit.remote_addr] = 1
                date[hit.hit.remote_addr] = hit.hit.created
        total = 0
        for ip in ips:
            total = total + ips[ip]
        return total
    return 0
But with more than 4,000 hits this takes a long time, over 30 seconds. Are there any suggestions for making the code better and faster?
I tried a few workarounds and did not find a significantly better solution. The point is that, regardless of your query statement, your get_visits_number is basically linear in the number of hits. There is certainly a neater way to write the code, but it will not improve the performance.
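For what it's worth, here is a rough sketch of what the neater version could look like. It keeps the same rule and is still one linear pass, so don't expect a speedup from it. The Hit namedtuple and the name count_visits are made up for the example (stand-ins for whatever hit.hit gives you), and the sort by created is my own addition, since the one-hour rule only makes sense in chronological order.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical stand-in for the (remote_addr, created) pair read from hit.hit.
Hit = namedtuple("Hit", ["remote_addr", "created"])


def count_visits(hits, gap=timedelta(hours=1)):
    """Count visits: a hit starts a new visit when its IP is new, or when it
    arrives more than `gap` after that IP's last counted visit."""
    last_counted = {}  # remote_addr -> time of the last counted visit
    total = 0
    # Assumption: process hits in chronological order (the original does not sort).
    for hit in sorted(hits, key=lambda h: h.created):
        previous = last_counted.get(hit.remote_addr)
        if previous is None or hit.created > previous + gap:
            last_counted[hit.remote_addr] = hit.created
            total += 1
    return total


start = datetime(2024, 1, 1, 12, 0)
sample = [
    Hit("10.0.0.1", start),
    Hit("10.0.0.1", start + timedelta(minutes=30)),  # within the hour: same visit
    Hit("10.0.0.1", start + timedelta(hours=2)),     # more than an hour later: new visit
    Hit("10.0.0.2", start + timedelta(minutes=5)),   # new IP: new visit
]
print(count_visits(sample))  # prints 3
```

A single dict and a running total replace the two dicts and the final summing loop.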
However, what is really bothering me is hit.hit.remote_addr. The first 'hit' should be an instance of App_hit, right? Then what is the second 'hit'? Another instance? A foreign key on App_hit? A potential issue is that hit.hit (if it is not a typo) will cause one extra database query per App_hit row to retrieve the related data. With 4,000 hits that is thousands of extra queries, which would explain the slowness far better than the loop itself. You may want to use select_related in your Django query, e.g. .select_related('hit') if 'hit' is indeed a foreign key, which retrieves all the data at once in a single joined query.
A second possible solution, which you may not like: as Robert Rozas said, you should try to push as much of the data manipulation as possible into the query (which may not be possible, though). The database is much faster at this kind of work than Python code. If the standard Django ORM cannot do what you need, you can try a raw SQL query. It is harder to write and maintain, but it will usually boost your running speed, if you really care that much.
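To give an idea of the raw SQL route, here is a small sketch using Python's built-in sqlite3 module as a stand-in for your real database. The table and column names (app_hit, hit, hit_id, remote_addr, created) are guesses at your schema, not something I know, and the agency filter is left out to keep it short. The point is only that one joined, ordered query returns just the two columns the counting needs, instead of one lookup per row.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite database standing in for the real one (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hit (id INTEGER PRIMARY KEY, remote_addr TEXT, created TEXT);
    CREATE TABLE app_hit (id INTEGER PRIMARY KEY, hit_id INTEGER REFERENCES hit(id));
""")
rows = [
    (1, "10.0.0.1", "2024-01-01 12:00:00"),
    (2, "10.0.0.1", "2024-01-01 12:30:00"),  # within the hour: same visit
    (3, "10.0.0.1", "2024-01-01 14:00:00"),  # more than an hour later: new visit
    (4, "10.0.0.2", "2024-01-01 12:05:00"),  # new IP: new visit
]
conn.executemany("INSERT INTO hit VALUES (?, ?, ?)", rows)
conn.executemany("INSERT INTO app_hit (hit_id) VALUES (?)", [(r[0],) for r in rows])

# One joined query: only the two needed columns, already in chronological order.
query = """
    SELECT h.remote_addr, h.created
    FROM app_hit AS a
    JOIN hit AS h ON h.id = a.hit_id
    ORDER BY h.created
"""

gap = timedelta(hours=1)
last_counted = {}  # remote_addr -> time of the last counted visit
visits = 0
for remote_addr, created_text in conn.execute(query):
    created = datetime.strptime(created_text, "%Y-%m-%d %H:%M:%S")
    previous = last_counted.get(remote_addr)
    if previous is None or created > previous + gap:
        last_counted[remote_addr] = created
        visits += 1

print(visits)  # prints 3
```

The counting itself still happens in Python here, because the one-hour rule depends on the last counted hit and is awkward to express in plain SQL.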