Saturday, 12 April 2014

hiveql - Avoiding a self join in Hive - Stack Overflow


I am using Hive's built-in collect_set function. The table looks like this:


cookie  events  keywords  pages
1234    1       'dress'   10
1234    1       'dress'   10
1235    2       'shoes'   14
1234    5       'socks'   22

Using collect_set, I can get the following structure:


select cookie, collect_set(events) as ev, collect_set(keywords) as kwords,
collect_set(pages)
from table1
group by cookie

What I need to do is search the collected arrays multiple times. An example would be something like:


select cookie
,array_contains(collect_set(events),2) as has_2
,array_contains(collect_set(keywords),1) as has_4
from table1
group by cookie

As I understand it, I cannot project a field more than once, so I end up having to do something like:


select a.cookie, a.has_2, b.has_4 from (
select cookie
,array_contains(collect_set(events),2) as has_2
from table1 group by cookie ) A
inner join (
select cookie
,array_contains(collect_set(events),4) as has_4
from table1 group by cookie ) B
on A.cookie = B.cookie

The final result looks like:


cookie  has_2  has_4
1234    F      F
1235    T      T

Is there any way to do this without the self join? Currently I would have to self join something like 30 times to get the format I need.


Thanks




select S.cookie, array_contains(S.events_set,2), array_contains(S.events_set,4) 
from
(select cookie, collect_set(events) as events_set
from table1 group by cookie ) S
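The query above works because the subquery builds each cookie's array once, and the outer query can then reference that array as many times as needed. Below is a minimal sketch of the same pattern with column aliases and a second collected array. The names keywords_set and has_shoes and the 'shoes' check are illustrative assumptions rather than part of the question, and the sketch assumes events is an integer column and keywords a string column:

```sql
-- Aggregate once per cookie, then probe the arrays repeatedly in the outer query.
select
  S.cookie,
  array_contains(S.events_set, 2)         as has_2,
  array_contains(S.events_set, 4)         as has_4,
  array_contains(S.keywords_set, 'shoes') as has_shoes  -- hypothetical extra check
from (
  select
    cookie,
    collect_set(events)   as events_set,
    collect_set(keywords) as keywords_set
  from table1
  group by cookie
) S;
```

Each additional check is just one more array_contains line in the outer select, so 30 checks would still need only a single pass over table1.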



You should introduce a GROUP BY to your SQL.


For example:


select
cookie,
array_contains(collect_set(events),2) as has_2,
array_contains(collect_set(keywords),1) as has_4
from
table1
group by
cookie;
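If the arrays themselves are not needed in the output, a different technique, conditional aggregation, may give the same flags without collect_set at all. This is a sketch of that alternative, not something proposed in the thread, and it assumes events is a numeric column:

```sql
-- For each cookie, emit 1 for rows matching the value and 0 otherwise;
-- the max is 1 exactly when at least one row matched.
select
  cookie,
  max(if(events = 2, 1, 0)) = 1 as has_2,
  max(if(events = 4, 1, 0)) = 1 as has_4
from table1
group by cookie;
```

This also runs in a single grouped pass and avoids materializing an array per cookie.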


