I'm exporting processed data stored in Hive format on HDFS into a MySQL server using Sqoop. The configuration is simple and straightforward, yet no matter what I do, Sqoop doesn't recognize the field delimiter correctly. What could the issue be?
This is my table definition in Hive
hive> show create table database.weblog_ag;
OK
CREATE TABLE database.weblog_ag(
visitor_id string,
time array<string>,
url array<string>,
client_time array<string>,
resolution array<string>,
browser array<string>,
os array<string>,
devicetype array<string>,
devicemodel array<string>,
ipinfo array<string>)
CLUSTERED BY (
visitor_id)
SORTED BY (
time ASC)
INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://poc/apps/hive/warehouse/database.db/weblog_ag'
TBLPROPERTIES (
'numPartitions'='0',
'numFiles'='96',
'transient_lastDdlTime'='1390411893',
'totalSize'='59633487',
'numRows'='0',
'rawDataSize'='0')
Time taken: 1.871 seconds, Fetched: 31 row(s)
When I check the files in HDFS, the fields are correctly separated by a \t
(tab) character. This is sample data I grabbed from HDFS:
101009a36b3113fa 2014-01-06 08:59:58 http://someurl 2014-01-06 08:56:53 1280x800 Chrome Windows XP General_Desktop Other 115.74.215.116
This is my Sqoop options file configuration
export
--connect
jdbc:mysql://webserver/fprofile_db
--username
username
--password
password
--table
weblog
--direct
--export-dir
/apps/hive/warehouse/database.db/weblog_ag
--input-fields-terminated-by
'\011'
--columns
visitor_id, time, url, client_time, resolution, browser, os, devicetype, devicemodel, ipinfo
I tried both '\011' and \t for the --input-fields-terminated-by
parameter, but neither of them works. The exported result in MySQL is as follows:
What could be the problem here?
So at the end of the day, the culprit was the --direct
option: I removed it and everything works fine.
Even though you're exporting, you actually need to use --fields-terminated-by '\t'
rather than --input-fields-terminated-by.
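Combining both answers, a working options file would look roughly like this (a sketch using the same placeholder host, credentials, and paths as in the question): --direct is removed, the delimiter flag is --fields-terminated-by, and the --columns list has no spaces, since each line of a Sqoop options file is taken as a literal argument.

```
export
--connect
jdbc:mysql://webserver/fprofile_db
--username
username
--password
password
--table
weblog
--export-dir
/apps/hive/warehouse/database.db/weblog_ag
--fields-terminated-by
'\t'
--columns
visitor_id,time,url,client_time,resolution,browser,os,devicetype,devicemodel,ipinfo
```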