Wednesday, 9 April 2014

python - apply a hierarchy or multi-index to pandas columns - Stack Overflow


I have seen lots of examples of how to arrange DataFrame row indexes hierarchically, but I am trying to do the same for columns and do not understand the syntax.


I am reading the contents of a CSV file as follows:


import pandas
df = pandas.read_csv("data.csv")

and data.csv contains something like:


rno,mark1,lab1,mark2,lab2
1,78,45,34,54
2,23,54,87,46

so In [1]: df gives:


   rno  mark1  lab1  mark2  lab2
0    1     78    45     34    54
1    2     23    54     87    46

What I would like to do is add a hierarchical index, or even something akin to a tag, to the columns, so that they look something like this:


        Subject1        Subject2
   rno     mark1  lab1     mark2  lab2
0    1        78    45        34    54
1    2        23    54        87    46



Here is a quick-fix solution for you:


>>> import pandas as pd
>>> data = pd.read_csv('data.csv')
>>> # first list: top-level label per column ('' leaves rno ungrouped)
>>> arrays = [['', 'Subject1', 'Subject1', 'Subject2', 'Subject2'], data.columns]
>>> df = pd.DataFrame(data.values, columns=arrays)
>>> print(df)
        Subject1        Subject2
   rno     mark1  lab1     mark2  lab2
0    1        78    45        34    54
1    2        23    54        87    46

[2 rows x 5 columns]
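
The same grouping can also be attached to the existing frame rather than building a new one from data.values. The following is a minimal sketch, assuming the column names shown in the printed output (rno, mark1, lab1, mark2, lab2); the inline csv_text and the groups list are illustrative names that stand in for data.csv and for the first list in arrays.

```python
import io

import pandas as pd

# Inline copy of the sample data so the sketch runs without data.csv.
csv_text = """rno,mark1,lab1,mark2,lab2
1,78,45,34,54
2,23,54,87,46
"""
data = pd.read_csv(io.StringIO(csv_text))

# Top level: one group label per existing column ('' leaves rno ungrouped).
groups = ['', 'Subject1', 'Subject1', 'Subject2', 'Subject2']

# Build a two-level column index and assign it to the existing frame.
data.columns = pd.MultiIndex.from_arrays([groups, data.columns])

print(data)
```

Assigning to data.columns keeps the original column dtypes, which may matter when the CSV mixes numbers and text.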

Another way to do the same thing:


>>> import pandas as pd
>>> data = pd.read_csv('data.csv')
>>> # .iloc replaces .ix, which was removed in pandas 1.0
>>> data_pieces = [data.iloc[:, [0]], data.iloc[:, [1, 2]], data.iloc[:, [3, 4]]]
>>> data = pd.concat(data_pieces, axis=1, keys=['', 'Subject1', 'Subject2'])
>>> print(data)
        Subject1        Subject2
   rno     mark1  lab1     mark2  lab2
0    1        78    45        34    54
1    2        23    54        87    46

[2 rows x 5 columns]
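
Once the columns have two levels, the top-level label can be used to pull out a whole subject at once. This is a small sketch of that, assuming the column names shown in the printed output; csv_text, flat, subject1 and lab2 are illustrative names, not part of the original answer.

```python
import io

import pandas as pd

# Inline copy of the sample data so the sketch runs without data.csv.
csv_text = """rno,mark1,lab1,mark2,lab2
1,78,45,34,54
2,23,54,87,46
"""
flat = pd.read_csv(io.StringIO(csv_text))

# Group the columns by position, then label each group with a key.
pieces = [flat.iloc[:, [0]], flat.iloc[:, [1, 2]], flat.iloc[:, [3, 4]]]
data = pd.concat(pieces, axis=1, keys=['', 'Subject1', 'Subject2'])

# Selecting by the top-level label returns that group's columns.
subject1 = data['Subject1']

# A (top, bottom) tuple selects a single column as a Series.
lab2 = data[('Subject2', 'lab2')]

print(subject1)
print(lab2)
```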

