Thursday, May 29, 2014

Cassandra - DataStax 4.0 error: Does not contain a valid host:port authority: ${dse.job.tracker} - Stack Overflow


Just got a single-node cluster up and running with the new DataStax 4.0, and it works great. We use Hive to build and query our data. On the server itself, I can start Hive with $> dse hive and query tables just fine. When I try to use the newest Hive ODBC driver to run the same query, I see the error below. It connects just fine, and I can query the keyspace and see the tables, but when I try to run the query, the map/reduce job looks like it gets queued and then errors out with the following.


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: ${dse.job.tracker}
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:147)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2584)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:402)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:646)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:630)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:225)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Job Submission failed with exception 'java.lang.IllegalArgumentException(Does not contain a valid host:port authority: ${dse.job.tracker})'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Any thoughts on what I should try? Thanks ahead of time for any suggestions or assistance you can provide. Cheers, Eric




I solved the issue by manually configuring the job tracker host:port in the mapred-site.xml configuration file.


Just add the following lines:


<property>
  <name>mapred.job.tracker</name>
  <value>host:port</value>
</property>

replacing host:port with the IP address of your Hive server and the port it uses (usually 8012).
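
If you are not sure which address to use, the dsetool utility that ships with DSE reports the current job tracker (the bin/dse excerpt in the answer below relies on the same command):

# prints the host:port of the cluster's current job tracker
dsetool jobtracker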


This will override the default placeholder ${dse.job.tracker} present in the dse-mapred-default.xml configuration file.
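
For reference, the entry being overridden in dse-mapred-default.xml presumably looks like this (reconstructed from the error message rather than copied from an actual install):

<property>
  <name>mapred.job.tracker</name>
  <value>${dse.job.tracker}</value>
</property>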




The dse.job.tracker property needs to be set in the system properties of the JVM that starts Hadoop jobs. Hadoop will substitute the ${dse.job.tracker} placeholder with the value of that system property if it is defined. Otherwise the placeholder is left as is, hence the error you see.


For Hive, Pig, and Mahout, the dse.job.tracker property is set in the bin/dse script as follows:


if [ -z "$HADOOP_JT" ]; then
HADOOP_JT=`$BIN/dsetool jobtracker --use-hadoop-config`
fi
if [ -z "$HADOOP_JT" ]; then
echo "Unable to run $HADOOP_CMD: jobtracker not found"
exit 2
fi

#set the JT param as a JVM arg
export HADOOP_OPTS="$HADOOP_OPTS -Ddse.job.tracker=$HADOOP_JT"

So you should do the same for the JVM that serves your Hive ODBC queries, and it should be fine.
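
For example, here is a sketch of that approach for the Hive Thrift server the ODBC driver connects to (the stack trace above shows the query failing inside HiveServer). This assumes the server is started with dse hive --service hiveserver, as in DSE 4.0; adjust for however you actually launch it:

# mirror what bin/dse does: look up the current job tracker and hand it
# to the server JVM as a system property before starting HiveServer
HADOOP_JT=`dsetool jobtracker`
export HADOOP_OPTS="$HADOOP_OPTS -Ddse.job.tracker=$HADOOP_JT"
dse hive --service hiveserver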


By hardcoding the Hadoop JT location you make it harder to move the JT to another node, because you would then have to update the config file manually. Moreover, DSE's automatic JT failover won't work properly if your primary JT goes down, because your program would still try to connect to the old one.


