Wednesday, May 21, 2014

Hadoop - "Connection Refused" lors de l'exécution d'une ruche-Query dans un script Python avec Thrift - Stack Overflow


All,


I am trying to run a Hive query within a Python script using the Thrift library for Python. I am able to run queries that don't launch a MapReduce job, such as CREATE TABLE and SELECT * FROM table. But when I execute a query that does launch a MapReduce job (like SELECT * FROM table WHERE ...), I get the following exception.


starting hive server...

Hive history file=/tmp/root/hive_job_log_root_201212171354_275968533.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
java.net.ConnectException: Call to sp-rhel6-01/172.22.193.79:54311 failed on connection exception: java.net.ConnectException: Connection refused

Job Submission failed with exception 'java.net.ConnectException(Call to sp-rhel6-01/172.22.193.79:54311 failed on connection exception: java.net.ConnectException: Connection refused)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask



I have a multi-node Hadoop cluster, and Hive is installed on the namenode; I am running the Python script on that same namenode.


The Python script is:


from hive_service import ThriftHive
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

# Connect to the Hive Thrift service (HiveServer) on port 10000
transport = TSocket.TSocket('172.22.193.79', 10000)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)

client = ThriftHive.Client(protocol)
transport.open()

# This query compiles to a MapReduce job; the exception above is raised here
client.execute("select count(*) from example")
print client.fetchAll()
transport.close()

Can anyone help me understand what is wrong?


-Sushant
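
A note on the trace: the script's own connection to the Hive Thrift service on port 10000 succeeds, since the query is accepted and compiled. The connection that is refused is Hive's attempt to reach the JobTracker RPC endpoint at sp-rhel6-01:54311 when it submits the MapReduce job. A small socket probe (a sketch, reusing the host and ports from the trace) can confirm which of the two services is actually unreachable:

import socket

def port_open(host, port, timeout=3):
    # Return True if a TCP connection to host:port can be established.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return True
    except socket.error:
        return False
    finally:
        sock.close()

# 10000 = HiveServer Thrift port, 54311 = JobTracker RPC port (both taken from the trace above)
for port in (10000, 54311):
    print '172.22.193.79:%d reachable: %s' % (port, port_open('172.22.193.79', port))

If port 10000 answers but 54311 does not, the JobTracker itself is down or not listening, which matches the exception above.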




I was having trouble completing SELECT queries, though I could complete SHOW and DESCRIBE queries. The way I fixed this was by restarting the services on my cluster. I am using Cloudera to manage my cluster, so the command I ran was $ sudo /etc/init.d/cloudera-scm-agent hard_restart. I did not spend too much time debugging, but I am guessing the NN or JT crashed. The interesting part is that I could still complete queries on the metadata. My best guess is that those queries went straight to the metastore and did not have to touch HDFS. I need someone to confirm that, though.
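
The guess above can be checked directly through the same Thrift client: statements that only touch the metastore (SHOW TABLES, DESCRIBE) should keep working while the JobTracker is down, and only statements that compile to a MapReduce job should fail. A minimal sketch, assuming the same HiveServer host and port as in the question:

from hive_service import ThriftHive
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

transport = TTransport.TBufferedTransport(TSocket.TSocket('172.22.193.79', 10000))
client = ThriftHive.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

# Metastore-only statement: no MapReduce job is launched, so this should still work
client.execute("show tables")
print client.fetchAll()

# Compiles to a MapReduce job: fails with "Connection refused" while the JobTracker is down
client.execute("select count(*) from example")
print client.fetchAll()

transport.close()

If the SHOW TABLES call succeeds and only the count query fails, that supports the idea that the MapReduce side (the JobTracker) is what needed restarting, rather than HiveServer or the metastore.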


