Tuesday, 27 May 2014

Hadoop - creating tables in Hive using log files and HUE - Stack Overflow


I'm trying to import my log files into a Hadoop cluster using Hive through the HUE web interface. The format of the log files is:


"/log/apache/apache91" "10.93.123.135" "8081" "12.93.145.7" "12.93.123.7" "/index.html" ""  "114" "111211" "21111" "200" "200" "[14/Mar/2013:23:00:15 -0400]" "-" "-" "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)" "-" "-" "-" "-" 

I tried the automatic table creation in HUE with the quotation mark as the delimiter, but this gives me a null column for every second column. I understand that this happens because of the delimiter. Is there a way to import the data without the null columns? Alternatively, can I delete the null columns, or create a new table from the existing one and extract only the data I want?


I have a lot of data to import. If anyone has a better solution, I would be open to it.




Hive's default delimited text format only supports a single character as the field separator, so you would indeed need a single field separator or a TSV/CSV format.


Maybe you can configure the separator used by the logger (switching to TAB or comma instead of space), and then you won't need the preprocessing step.
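If the logger can't be reconfigured, the preprocessing step could look roughly like the sketch below. It is only an illustration, not something from the original setup: it assumes every field is wrapped in double quotes, that no field contains an embedded double quote or TAB, and the function name `quoted_fields_to_tsv` is made up for this example. It pulls out the content of each quoted field and joins the fields with a single TAB, so empty fields such as `""` are kept as empty columns.

```python
import re
import sys

# Matches one double-quoted field and captures its content (which may be empty).
# Assumption: fields never contain an embedded double quote.
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def quoted_fields_to_tsv(line):
    """Turn one log line of space-separated, double-quoted fields into a TAB-separated line."""
    fields = QUOTED_FIELD.findall(line)
    # Assumption: fields never contain a TAB, so TAB is safe as the single separator.
    return "\t".join(fields)


if __name__ == "__main__":
    # Read log lines from stdin and write TSV lines to stdout, skipping blank lines.
    for raw_line in sys.stdin:
        if raw_line.strip():
            print(quoted_fields_to_tsv(raw_line))
```

You could run it as something like `python quoted_to_tsv.py < apache91.log > apache91.tsv` (both file names are placeholders) and then create the table from the resulting file with TAB as the delimiter.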




