I am running DSE 3.2.4 with analytics enabled. I am attempting to unload one of my tables into S3 for long-term storage. I have created the following table in Hive:
CREATE EXTERNAL TABLE events_archive (
event_id string,
time string,
type string,
source string,
value string
)
PARTITIONED BY (year string, month string, day string, hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://com.mydomain.events/';
I then try to use this query to load some sample data into it:
CREATE TEMPORARY FUNCTION c_to_string AS 'org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString';
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE events_archive
PARTITION (year, month, day, hour)
SELECT c_to_string(column4, 'uuid') AS event_id,
from_unixtime(CAST(column3/1000 AS int)) AS time,
CASE column1
WHEN 'pageviews-push' THEN 'page_view'
WHEN 'score_realtime-internal' THEN 'realtime_score'
ELSE 'social_data'
END AS type,
CASE column1
WHEN 'pageviews-push' THEN 'internal'
WHEN 'score_realtime-internal' THEN 'internal'
ELSE split(column1, '-')[0]
END AS source,
value,
year(from_unixtime(CAST(column3/1000 AS int))) AS year,
month(from_unixtime(CAST(column3/1000 AS int))) AS month,
day(from_unixtime(CAST(column3/1000 AS int))) AS day,
hour(from_unixtime(CAST(column3/1000 AS int))) AS hour,
c_to_string(key2, 'blob') AS content_id
FROM events
WHERE column2 = 'data'
AND value IS NOT NULL
AND value != ''
LIMIT 10;
I end up getting this exception:
2014-02-11 20:23:55,810 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: org.apache.hadoop.fs.s3.S3Exception(org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>)
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:156)
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy14.retrieveINode(Unknown Source)
at org.apache.hadoop.fs.s3.S3FileSystem.mkdir(S3FileSystem.java:148)
at org.apache.hadoop.fs.s3.S3FileSystem.mkdirs(S3FileSystem.java:141)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:165)
at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:222)
at org.apache.hadoop.hive.ql.Context.getExternalTmpFileURI(Context.java:315)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4049)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6205)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6136)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6762)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:416)
at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:752)
at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1601)
at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1544)
at org.jets3t.service.S3Service.getObject(S3Service.java:2072)
at org.jets3t.service.S3Service.getObject(S3Service.java:1310)
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:144)
... 33 more
Is the Hive S3 connector supported in the latest DSE, or am I doing something wrong?
Try the following in your Hive installation:
hive-site.xml
<property>
<name>fs.default.name</name>
<value>s3n://your-bucket</value>
</property>
core-site.xml
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>Your AWS Key</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>Your AWS Secret Key</value>
</property>
This is per the 3.1 docs, under "Using an external file system in Hive": http://www.datastax.com/docs/datastax_enterprise3.1/solutions/about_hive
I didn't see it in the 3.2 docs. Not sure why it was omitted, if it was, but it looks like something essential for running Hive against S3.
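If you'd rather not edit core-site.xml right away, a minimal sketch (not verified on DSE 3.2; these are the standard Hadoop s3n property names, and the key values and bucket below are placeholders) is to set the credentials per-session from the Hive CLI and sanity-check that the bucket resolves before running the INSERT:

-- Hedged sketch: supply the s3n credentials for this Hive session only.
SET fs.s3n.awsAccessKeyId=YOUR_AWS_KEY;
SET fs.s3n.awsSecretAccessKey=YOUR_AWS_SECRET_KEY;

-- Quick check that the external table's LOCATION points at a reachable bucket.
dfs -ls s3n://com.mydomain.events/;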
The Hadoop implementation of the S3 file system is out of date, so writing data to S3 from Hive doesn't work well. We fixed the issue for reading, so DSE can now read S3 files, but writing still has issues. We will look into whether we can fix it soon.
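Until writing is fixed, one possible workaround (a sketch only, not verified on DSE 3.2.4; the staging path is illustrative) is to stage the extract on CFS, where Hive writes reliably, and then copy the files to S3 out of band:

-- Stage the rows on CFS first (same SELECT idea as in the question, shortened here;
-- assumes the c_to_string UDF has been registered as above). Note that
-- INSERT OVERWRITE DIRECTORY uses Hive's default field delimiter, not tabs.
INSERT OVERWRITE DIRECTORY 'cfs:///tmp/events_archive_staging'
SELECT c_to_string(column4, 'uuid'),
       from_unixtime(CAST(column3/1000 AS int)),
       column1,
       value
FROM events
WHERE column2 = 'data'
AND value IS NOT NULL
LIMIT 10;

-- Then, from a shell on an analytics node, copy the staged files to S3 out of band,
-- for example with "dse hadoop distcp cfs:///tmp/events_archive_staging s3n://com.mydomain.events/"
-- or with a tool such as s3cmd.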