lundi 19 mai 2014

Hadoop - copier des données d'une hbase table à l'autre - Stack Overflow


I have created one table hivetest which also create the table in hbase with name of 'hbasetest'. Now I want to copy 'hbasetest' data into another hbase table(say logdata) with the same schema. So, can anyone help me how do copy the data from 'hbasetest' to 'logdata' without using the hive.


CREATE TABLE hivetest(cookie string, timespent string, pageviews string, visit string, logdate string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "m:timespent, m:pageviews, m:visit, m:logdate")
TBLPROPERTIES ("hbase.table.name" = "hbasetest");

Updated question :


I have created the table logdata like this. But, I am getting the following error.


create 'logdata', {NAME => ' m', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>'0', TTL => '2147483647', BLOCKSIZE=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

13/09/23 12:57:19 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_0, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:29 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_1, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:38 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_2, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:53 INFO mapred.JobClient: Job complete: job_201309231115_0025
13/09/23 12:57:53 INFO mapred.JobClient: Counters: 7
13/09/23 12:57:53 INFO mapred.JobClient: Job Counters
13/09/23 12:57:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34605
13/09/23 12:57:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient: Rack-local map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient: Launched map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/09/23 12:57:53 INFO mapred.JobClient: Failed map tasks=1



Actually i am using hive-0.9.0. Which has a bug


https://issues.apache.org/jira/browse/HIVE-3243.

So, while creating the table SerDe of HBaseStorageHandler doesn't ignore white space between comma and column family. Hence you need to remove the white spaces. Then it will work fine.




Use the copyTable command. Example :


$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=logdata hbasetest


I have created one table hivetest which also create the table in hbase with name of 'hbasetest'. Now I want to copy 'hbasetest' data into another hbase table(say logdata) with the same schema. So, can anyone help me how do copy the data from 'hbasetest' to 'logdata' without using the hive.


CREATE TABLE hivetest(cookie string, timespent string, pageviews string, visit string, logdate string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "m:timespent, m:pageviews, m:visit, m:logdate")
TBLPROPERTIES ("hbase.table.name" = "hbasetest");

Updated question :


I have created the table logdata like this. But, I am getting the following error.


create 'logdata', {NAME => ' m', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>'0', TTL => '2147483647', BLOCKSIZE=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

13/09/23 12:57:19 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_0, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:29 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_1, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:38 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_2, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:53 INFO mapred.JobClient: Job complete: job_201309231115_0025
13/09/23 12:57:53 INFO mapred.JobClient: Counters: 7
13/09/23 12:57:53 INFO mapred.JobClient: Job Counters
13/09/23 12:57:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34605
13/09/23 12:57:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient: Rack-local map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient: Launched map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/09/23 12:57:53 INFO mapred.JobClient: Failed map tasks=1


Actually i am using hive-0.9.0. Which has a bug


https://issues.apache.org/jira/browse/HIVE-3243.

So, while creating the table SerDe of HBaseStorageHandler doesn't ignore white space between comma and column family. Hence you need to remove the white spaces. Then it will work fine.



Use the copyTable command. Example :


$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=logdata hbasetest

0 commentaires:

Enregistrer un commentaire