Thursday, May 22, 2014

Hadoop - Create a table by joining two existing tables in Hive / Shark - Stack Overflow


I have two tables, oldTable and newTable, with the following contents:


oldTable:

key    value    volume
======================
1      abc      10000
2      def      5000

newTable:

key    value    volume
======================
1      abc      2000
2      def      3000
3      xyz      7000

I want to create a new table that sums the volumes from both tables, i.e., the new table should contain the following:


joined_table:

key    value    volume
======================
1      abc      12000
2      def      8000
3      xyz      7000

I tried the following statement, but it did not work:


CREATE TABLE joined_table AS
SELECT key, value, volume
FROM (
SELECT IF(oldTable.key != NULL, oldTable.key, newTable.key) AS key,
IF(oldTable.value != NULL, oldTable.value, newTable.value) AS value,
IF(oldTable.volume AND newTable.volume, oldTable.volume + newTable.volume,
IF(oldTable.volume != NULL, oldTable.volume, newTable.volume)) AS volume
FROM(
SELECT oldTable.key, oldTable.value, oldTable.volume, newTable.key, newTable.value, newTable.volume
FROM newTable FULL OUTER JOIN oldTable ON newTable.key = oldTable.key
)alias
)anotherAlias;

But this throws an error: Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Ambiguous column reference key.


I tried changing the column names in joined_table in the above query, but I get the same error. How can I achieve this?


Also, is there a way to write the result into an existing table, say oldTable, instead of creating a new one?




The word key that you are using in your query may be treated as a reserved keyword. This might be the cause of the ambiguity error thrown by the parser. You can wrap it in backticks so that the parser does not read it as a reserved word.


CREATE TABLE joined_table AS
SELECT `key`, value, volume
FROM (
SELECT IF(oldTable.`key` != NULL, oldTable.`key`, newTable.`key`) AS `key`,
IF(oldTable.value != NULL, oldTable.value, newTable.value) AS value,
IF(oldTable.volume AND newTable.volume, oldTable.volume + newTable.volume,
IF(oldTable.volume != NULL, oldTable.volume, newTable.volume)) AS volume
FROM(
SELECT oldTable.`key`, oldTable.value, oldTable.volume, newTable.`key`, newTable.value, newTable.volume
FROM newTable FULL OUTER JOIN oldTable ON newTable.`key` = oldTable.`key`
)alias
)anotherAlias;



OK, I managed to get this working with the following:


CREATE TABLE joined_table AS SELECT 
IF (newTable.key IS NULL, oldTable.key, newTable.key) as key,
IF (newTable.value IS NULL, oldTable.value, newTable.value) as value,
IF(newTable.volume IS NULL, oldTable.volume,
IF(oldTable.volume IS NULL, newTable.volume, oldTable.volume + newTable.volume)) as volume
FROM newTable FULL OUTER JOIN oldTable ON newTable.key = oldTable.key;
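
The nested IF calls above can probably be written more compactly with COALESCE, which Hive provides and which returns its first non-NULL argument. This is only a sketch, assuming the same oldTable and newTable schemas as in the question; the table name joined_table_coalesce is hypothetical, chosen just to avoid clashing with joined_table.

```sql
-- Sketch: same FULL OUTER JOIN as above, with COALESCE replacing the nested IFs.
-- joined_table_coalesce is a hypothetical name used for illustration only.
CREATE TABLE joined_table_coalesce AS
SELECT
  COALESCE(newTable.key, oldTable.key)     AS key,
  COALESCE(newTable.value, oldTable.value) AS value,
  -- A side with no matching row contributes 0, so unmatched rows keep their own volume.
  COALESCE(oldTable.volume, 0) + COALESCE(newTable.volume, 0) AS volume
FROM newTable
FULL OUTER JOIN oldTable ON newTable.key = oldTable.key;
```

One behavioral difference: if volume is NULL on both sides of a matched row, this version yields 0, whereas the nested IF version yields NULL.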

I still have to figure out how to update the existing table without creating a new one.


UPDATE


INSERT OVERWRITE TABLE oldTable SELECT ... updates the existing table.
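
Spelled out, the statement might look like the following. This is a sketch that assumes the elided SELECT is the same query used to build joined_table earlier in this post; the post itself does not show the full statement.

```sql
-- Sketch: replace the contents of oldTable with the merged result.
-- Assumption: the SELECT is the same as in the CREATE TABLE ... AS SELECT above.
INSERT OVERWRITE TABLE oldTable
SELECT
  IF(newTable.key IS NULL, oldTable.key, newTable.key) AS key,
  IF(newTable.value IS NULL, oldTable.value, newTable.value) AS value,
  IF(newTable.volume IS NULL, oldTable.volume,
     IF(oldTable.volume IS NULL, newTable.volume,
        oldTable.volume + newTable.volume)) AS volume
FROM newTable
FULL OUTER JOIN oldTable ON newTable.key = oldTable.key;
```

Note that oldTable is both read and overwritten by this one statement, and that the selected columns must line up with oldTable's column order.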



