Wednesday, May 21, 2014

Amazon Web Services - create Hive table for tab-separated files in S3, interactive mode - Stack Overflow


I've loaded tab-separated files into S3 with this folder structure under the bucket: bucket --> se --> y=2013 --> m=07 --> d=14 --> h=00


Each subfolder has one file that represents one hour of my traffic.


I then created an EMR workflow to run Hive in interactive mode.


When I log in to the master node and start Hive, I run this command:


CREATE EXTERNAL TABLE se (
id bigint,
oc_date timestamp)
PARTITIONED BY (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';

I get this error message:



FAILED: Error in metadata: java.lang.IllegalArgumentException: The bucket name parameter must be specified when listing objects in a bucket


FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask



Can anybody help?


UPDATE: Even if I use string fields only, I get the same error. The table created with strings:


CREATE EXTERNAL TABLE se (
id string,
oc_date string)
PARTITIONED BY (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';

Hive version 0.8.1.8




So, the solution: I had made two mistakes.



  1. When specifying only the bucket name, the S3 path needs a trailing slash (reference here).


  2. The underscore is also a problem: the bucket name must be DNS-compliant.
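
Putting both fixes together, the DDL might look like this sketch. The bucket name `bi-data` is hypothetical, standing in for a DNS-compliant rename of `bi_data`:

CREATE EXTERNAL TABLE se (
id bigint,
oc_date string)
PARTITIONED BY (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi-data/';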



Hope this helps someone.
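
One more thing worth noting: an external table declared with PARTITIONED BY starts with no partitions registered, so queries return nothing until the hourly folders are added to the metastore. A sketch, assuming the same hypothetical `bi-data` bucket and the `se` folder layout from the question:

-- Register one hourly folder explicitly...
ALTER TABLE se ADD PARTITION (y='2013', m='07', d='14', h='00')
LOCATION 's3://bi-data/se/y=2013/m=07/d=14/h=00/';

-- ...or, on Amazon EMR's Hive, scan S3 and add all matching folders:
ALTER TABLE se RECOVER PARTITIONS;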




DATE, DATETIME, and TIMESTAMP types aren't supported yet. Please use STRING instead, or kindly provide your Hive version. Thanks.
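
If the Hive build in use really does lack TIMESTAMP support, the date column can stay a STRING and be parsed at query time. A sketch, assuming the strings follow a 'yyyy-MM-dd HH:mm:ss' format (not stated in the question):

-- Parse the string date at read time instead of declaring it TIMESTAMP.
SELECT id, unix_timestamp(oc_date, 'yyyy-MM-dd HH:mm:ss') AS oc_ts
FROM se
WHERE y = '2013' AND m = '07';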


