mercredi 21 mai 2014

Hadoop - « pas pu obtenir les fractionnements d'entrée » erreur, avec ruche-Cassandra-CqlStorageHandler - Stack Overflow


Im trying to read data from cassandra using Hive with CqlStorageHandler.


The versions:


Hive 0.11.0
Hadoop 1.2.1
Cassandra 1.2.6


Im able to create EXTERNAL table with the following HIVE Query


CREATE EXTERNAL TABLE input(number string,name string,address string) STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key, name, address", "cassandra.ks.name" ="cassandradb", "cassandra.host" = "localhost" ,"cassandra.port" = "9160") TBLPROPERTIES ("cassandra.input.split.size" = "64000","cassandra.range.size" = "1000","cassandra.slice.predicate.size" = "1000");


(The table "input" is already existing and containing some data in cassandra created with CQL3)


However, When I try to read data with the following query


select * from input where number="1";


Im facing the folowing issue:


Total MapReduce jobs = 1
Launching Job 1 out of 1


Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Could not get input splits at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:189) at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:213) at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:169) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:297) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073) at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:144) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043" at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:185) ... 31 more Caused by: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:444) at java.lang.Long.valueOf(Long.java:540) at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:188) at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:239) at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:207) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Job Submission failed with exception 'java.io.IOException(Could not get input splits)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask


Am I missing anything? Kindly advise.



Im trying to read data from cassandra using Hive with CqlStorageHandler.


The versions:


Hive 0.11.0
Hadoop 1.2.1
Cassandra 1.2.6


Im able to create EXTERNAL table with the following HIVE Query


CREATE EXTERNAL TABLE input(number string,name string,address string) STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key, name, address", "cassandra.ks.name" ="cassandradb", "cassandra.host" = "localhost" ,"cassandra.port" = "9160") TBLPROPERTIES ("cassandra.input.split.size" = "64000","cassandra.range.size" = "1000","cassandra.slice.predicate.size" = "1000");


(The table "input" is already existing and containing some data in cassandra created with CQL3)


However, When I try to read data with the following query


select * from input where number="1";


Im facing the folowing issue:


Total MapReduce jobs = 1
Launching Job 1 out of 1


Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Could not get input splits at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:189) at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:213) at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:169) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:292) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:297) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073) at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447) at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:144) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043" at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:185) ... 31 more Caused by: java.lang.NumberFormatException: For input string: "143514173170822869679056708180186660043" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:444) at java.lang.Long.valueOf(Long.java:540) at org.apache.cassandra.dht.Murmur3Partitioner$1.fromString(Murmur3Partitioner.java:188) at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:239) at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:207) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Job Submission failed with exception 'java.io.IOException(Could not get input splits)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask


Am I missing anything? Kindly advise.


0 commentaires:

Enregistrer un commentaire