Wednesday, April 9, 2014

Apache Hive vs. Normal MapReduce - Stack Overflow


I had a strange experience while running a Hive query (a simple count of the entries in an external table) alongside a normal MapReduce job (a word count program). The word count job was started first and the Hive query second. The Hive query finished quickly, while the MapReduce job that was started first got stuck. Is there any case where a Hive MapReduce job blocks all other MapReduce jobs running alongside it?


I would appreciate your views on this question.




I assume this does not happen consistently. Hive does not block other jobs on the cluster. Cluster load and network latency can affect which job finishes first. If you want to compare two jobs to see which is faster, submit them at the same time, run the test at least 5-10 times, and compare the average times.
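As a rough sketch of that kind of repeated comparison, the helper below starts two jobs together on each round and averages their wall-clock times. The names `timed` and `compare_jobs` are made up for illustration, and each job is assumed to be a plain Python callable that blocks until the underlying work finishes (for example, a function wrapping a `subprocess.run` call to your own job submission command).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed(job):
    """Run job() and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    job()
    return time.perf_counter() - start


def compare_jobs(job_a, job_b, rounds=5):
    """Start both jobs together on each round and return their mean durations.

    job_a and job_b are hypothetical callables that block until the
    corresponding cluster job has finished.
    """
    times_a, times_b = [], []
    for _ in range(rounds):
        # Two worker threads so both jobs are submitted at (nearly) the same time.
        with ThreadPoolExecutor(max_workers=2) as pool:
            future_a = pool.submit(timed, job_a)
            future_b = pool.submit(timed, job_b)
            times_a.append(future_a.result())
            times_b.append(future_b.result())
    return statistics.mean(times_a), statistics.mean(times_b)
```

Averaging over several rounds smooths out one-off effects such as a temporarily busy cluster, which is the point of the advice above; it does not remove systematic effects caused by the scheduler.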




The order in which jobs complete depends on the number of map and reduce tasks each job requests, as well as on the cluster's scheduler configuration.


If a job requests more reduce tasks than the cluster has slots available, other jobs are forced to wait until a reduce task completes. The scheduler can then assign the freed reduce slot to a waiting job (again, depending on the scheduler configuration).
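To make the waiting effect concrete, here is a toy simulation of a strictly first-in, first-out scheduler over a fixed pool of reduce slots. It is only a sketch under simplifying assumptions: every task of a job takes the same time, map tasks are ignored, and the function name `fifo_finish_times` and the job figures are invented for illustration. A real cluster with a fair or capacity scheduler would behave differently.

```python
import heapq


def fifo_finish_times(jobs, slots):
    """Simulate a FIFO scheduler over a fixed pool of reduce slots.

    jobs: list of (name, num_tasks, task_seconds) in submission order.
    Returns {name: time at which the job's last task finishes}.
    """
    # Each heap entry is the time at which one slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    finish = {}
    for name, num_tasks, task_seconds in jobs:
        last = 0.0
        for _ in range(num_tasks):
            start = heapq.heappop(free_at)  # earliest slot to become free
            end = start + task_seconds
            heapq.heappush(free_at, end)
            last = max(last, end)
        finish[name] = last
    return finish


# Hypothetical workload: 4 reduce slots, a large job submitted before a tiny one.
result = fifo_finish_times(
    [("wordcount", 8, 10.0), ("hive_count", 1, 2.0)],
    slots=4,
)
print(result)  # {'wordcount': 20.0, 'hive_count': 22.0}
```

In this model the small job needs only 2 seconds of work but finishes at 22 seconds, because every slot is held by the earlier job until its tasks complete. Which job ends up waiting on a real cluster depends on the scheduler in use.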



