I'm exporting processed data stored in Hive format on HDFS into a MySQL server using Sqoop. The configuration is simple and straightforward, yet no matter what I do, Sqoop doesn't recognize the field delimiter correctly. What could the issue be?
This is my table definition in Hive
hive> show create table database.weblog_ag;
OK
CREATE TABLE database.weblog_ag(
visitor_id string,
time array<string>,
url array<string>,
client_time array<string>,
resolution array<string>,
browser array<string>,
os array<string>,
devicetype array<string>,
devicemodel array<string>,
ipinfo array<string>)
CLUSTERED BY (
visitor_id)
SORTED BY (
time ASC)
INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://poc/apps/hive/warehouse/database.db/weblog_ag'
TBLPROPERTIES (
'numPartitions'='0',
'numFiles'='96',
'transient_lastDdlTime'='1390411893',
'totalSize'='59633487',
'numRows'='0',
'rawDataSize'='0')
Time taken: 1.871 seconds, Fetched: 31 row(s)
When I check the files in HDFS, the fields are correctly separated by a \t
(tab) character. This is sample data I grabbed from HDFS:
101009a36b3113fa 2014-01-06 08:59:58 http://someurl 2014-01-06 08:56:53 1280x800 Chrome Windows XP General_Desktop Other 115.74.215.116
This is my Sqoop options file configuration
export
--connect
jdbc:mysql://webserver/fprofile_db
--username
username
--password
password
--table
weblog
--direct
--export-dir
/apps/hive/warehouse/database.db/weblog_ag
--input-fields-terminated-by
'\011'
--columns
visitor_id, time, url, client_time, resolution, browser, os, devicetype, devicemodel, ipinfo
I tried both '\011' and \t for the --input-fields-terminated-by
parameter, but neither of them works. The exported result in MySQL is as follows:
What could be the problem here?
So at the end of the day, the culprit was the --direct
option: I removed it and everything works fine.
Even though you're exporting, you actually need to use --fields-terminated-by '\t'
rather than --input-fields-terminated-by.
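Combining both answers, a working options file would look roughly like this (a sketch using the same placeholder host, credentials, and paths as in the question): --direct is removed, the delimiter flag is --fields-terminated-by, and the --columns list has no spaces, since each line of a Sqoop options file is taken as a literal argument.

```
export
--connect
jdbc:mysql://webserver/fprofile_db
--username
username
--password
password
--table
weblog
--export-dir
/apps/hive/warehouse/database.db/weblog_ag
--fields-terminated-by
'\t'
--columns
visitor_id,time,url,client_time,resolution,browser,os,devicetype,devicemodel,ipinfo
```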