Tuesday, 12 August 2014

python - difference between dict(groupby) and groupby - Stack Overflow


I have a list like this


[u'201003', u'200403', u'200803', u'200503', u'201303',
u'200903', u'200603', u'201203', u'200303', u'200703', u'201103']

Let's call this list 'years_list'.


When I grouped it by year,


from itertools import groupby

group_by_yrs_list = groupby(years_list, key=lambda year_month: year_month[:-2])
for k, v in group_by_yrs_list:
    print k, list(v)

I got the desired output:


2010 [u'201003']
2004 [u'200403']
2008 [u'200803']
2005 [u'200503']
2013 [u'201303']
2009 [u'200903']
2006 [u'200603']
2012 [u'201203']
2003 [u'200303']
2007 [u'200703']
2011 [u'201103']

Then, I slightly changed my implementation like this,


group_by_yrs_list = dict(groupby(years_list, key=lambda year_month: year_month[:-2]))
for k, v in group_by_yrs_list.items():
    print k, list(v)

I only added a dict() call, but the output is different:


2003 []
2006 []
2007 []
2004 []
2005 []
2008 []
2009 []
2011 [u'201103']
2010 []
2013 []
2012 []

I couldn't figure out why. Please help me understand what dict is actually doing here.


(Python 2.7)




groupby yields (key, group-iterator) pairs. By the time you reach the second pair, the first pair's group iterator has already been consumed, so you get an empty list from it.


Try the following code:


group_by_yrs_list = {year: list(grp) for year, grp in groupby(years_list, key=lambda year_month: year_month[:-2])}
for k, v in group_by_yrs_list.items():
    print k, v



The problem here is that groupby yields, in sequence, each key and a sub-iterator:


>>> for k, v in groupby(years_list, key=lambda year_month: year_month[:-2]):
...     print k, v
...
2010 <itertools._grouper object at 0x801c68950>
2004 <itertools._grouper object at 0x801bb3a90>
2008 <itertools._grouper object at 0x801c68950>
2005 <itertools._grouper object at 0x801bb3a90>
2013 <itertools._grouper object at 0x801c68950>
2009 <itertools._grouper object at 0x801bb3a90>
2006 <itertools._grouper object at 0x801c68950>
2012 <itertools._grouper object at 0x801bb3a90>
2003 <itertools._grouper object at 0x801c68950>
2007 <itertools._grouper object at 0x801bb3a90>
2011 <itertools._grouper object at 0x801c68950>

You need to turn each <itertools._grouper object ...> into an actual list before storing it away, because every group iterator reads from the same underlying input: once groupby advances to the next group, the previous group's iterator has nothing left to yield. If you don't, only the last iterator is still usable, so when you print the contents of the dictionary you get the one non-empty list, and printing it uses up that iterator too. Print it a second time and you'll get all-empty lists.


The key is to list-ify the iterators while they are still good. (I see several others beat me to example code; I prefer falsetru's variant.)
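The invalidation described above can be seen with just two groups. This is a minimal sketch, written in Python 3 syntax as an assumption (the thread itself uses Python 2.7, where the same effect occurs); the names stale and fresh are only for illustration.

```python
from itertools import groupby

years_list = ['201003', '200403', '200803', '200503', '201303',
              '200903', '200603', '201203', '200303', '200703', '201103']

groups = groupby(years_list, key=lambda year_month: year_month[:-2])

# Take the first group, but do not read its items yet.
first_key, first_group = next(groups)

# Advancing the outer iterator moves the shared input past the first group.
second_key, second_group = next(groups)

stale = list(first_group)   # read too late: nothing left to yield
fresh = list(second_group)  # read while it is still the current group

print(first_key, stale)
print(second_key, fresh)
```

Reading first_group before the second next(groups) call would have returned its one item instead.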




According to this answer, you can do this to convert it into a dict:


group_by_yrs_list = dict((k,list(v)) for k,v in groupby(years_list, key=lambda x: x[:4]))

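It's because groupby returns an itertools.groupby object, a lazy iterator whose groups all share the same underlying input. The dict constructor accepts it, but it consumes the whole iterator up front, so the group iterators it stores as values are already used up by the time you read them.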
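For what dict() actually ends up holding, here is a small sketch. It assumes Python 3 syntax rather than the thread's 2.7, and get_year, as_dict and contents are illustrative names. The last group is left out of the check on purpose: on Python 2.7 it still holds its one item, as in the question's output, while newer Python 3 releases may return it empty as well.

```python
from itertools import groupby

years_list = ['201003', '200403', '200803', '200503', '201303',
              '200903', '200603', '201203', '200303', '200703', '201103']


def get_year(year_month):
    """Strip the two-digit month suffix, e.g. '201003' -> '2010'."""
    return year_month[:-2]


# dict() pulls every (key, group) pair out of groupby before it returns,
# so groupby has been advanced past every group by this point.
as_dict = dict(groupby(years_list, key=get_year))

# All the keys survive, because keys are plain strings.
keys = list(as_dict)

# The values are group iterators; reading them now is too late.
contents = {k: list(v) for k, v in as_dict.items()}

# Every group other than the final one ('2011') comes back empty.
earlier = [contents[k] for k in keys if k != '2011']
print(earlier)
```

Wrapping each group in list() inside a generator expression or dict comprehension, as in the answers above, avoids this because each group is read before groupby moves on.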




Try the non-streaming groupby operation from toolz


$ pip install toolz
$ ipython

In [1]: from toolz import groupby

In [2]: years_list = [u'201003', u'200403', u'200803', u'200503', u'201303',
...: u'200903', u'200603', u'201203', u'200303', u'200703', u'201103']

In [3]: get_year = lambda year_month: year_month[:-2]

In [4]: groupby(get_year, years_list)
Out[4]:
{u'2003': [u'200303'],
u'2004': [u'200403'],
u'2005': [u'200503'],
u'2006': [u'200603'],
u'2007': [u'200703'],
u'2008': [u'200803'],
u'2009': [u'200903'],
u'2010': [u'201003'],
u'2011': [u'201103'],
u'2012': [u'201203'],
u'2013': [u'201303']}
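If adding a dependency is not an option, a non-streaming grouping can be approximated with the standard library. This is only a sketch of the idea, not toolz's implementation; group_eagerly is a hypothetical helper name, and the code is written in Python 3 syntax as an assumption.

```python
from collections import defaultdict


def group_eagerly(key, seq):
    """Hypothetical helper: collect every item of seq under key(item).

    The whole input is read at once and a plain dict of lists is returned,
    so there are no shared iterators to go stale.
    """
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)


years_list = ['201003', '200403', '200803', '200503', '201303',
              '200903', '200603', '201203', '200303', '200703', '201103']

by_year = group_eagerly(lambda year_month: year_month[:-2], years_list)
print(by_year['2010'])
```

Unlike itertools.groupby, this approach also merges items that share a key but are not adjacent in the input, so no prior sorting is needed.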


