Wednesday, 21 May 2014

Hadoop - create JSON from a table - Stack Overflow


I have a table like:


CREATE EXTERNAL TABLE IF NOT EXISTS test_to_json
(
field1 string,
field2 string,
field3 string,
field4 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/path';

I would like to generate JSON from the above table. What is the best way to do it?


Expected output:


CREATE EXTERNAL TABLE IF NOT EXISTS json_table
(
field1 string,
json_field json -- contains field2, field3, field4 as JSON
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/path';

Either a Pig or a Hive solution would be great. I can only find examples of the opposite (reading data from JSON).


Thanks in advance for the replies.




You can use the 'to_json' UDF contained in Brickhouse ( http://github.com/klout/brickhouse ) to create well-formed JSON output. Use it along with the 'named_struct' UDF to define the schema of your output JSON. More info at http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/ . In your example, you would create a query of the form:


-- Hive's CTAS has historically rejected EXTERNAL targets, so this creates a managed table.
CREATE TABLE json_table
AS
SELECT field1, to_json( named_struct(
'field2' , field2,
'field3' , field3,
'field4' , field4 ) ) AS json_field
FROM test_to_json;
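
Note that 'to_json' is not built into Hive, so it has to be registered in the session before the query above will run. A minimal sketch, assuming you have already built the Brickhouse jar: the jar path below is hypothetical, and the class name is the one I believe Brickhouse's own registration script uses, so check it against the version you build.

```sql
-- Hypothetical path: point this at wherever your Brickhouse jar actually lives.
ADD JAR /path/to/brickhouse.jar;

-- Register the UDF under the name used in the query above.
-- Assumed class name; verify it against your Brickhouse version.
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';

-- Try the expression on a few rows before writing a whole table.
SELECT field1,
       to_json(named_struct('field2', field2,
                            'field3', field3,
                            'field4', field4)) AS json_field
FROM test_to_json
LIMIT 5;
```

'named_struct' ships with Hive, so only 'to_json' needs registering. Running the SELECT with a LIMIT first is a cheap way to confirm the JSON looks right before creating the table.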



A UDF is the better approach for this.




