python - append columns of a csv file to lists - Stack Overflow ~ Stackoverflow Blog

I have a dataset that is formatted as a tab-separated file. What I want to accomplish is to append some of the columns of that file to different lists.

The file I am reading looks something like this:

   temperature  station.id  latitude    longtitude  sea.distance    altitude

1               S7          0            4          0               75
2               S8          1            5          3               400
3               S8          1.5          2          4               80

Notice that the first column is the index value, with no header, while the second column, temperature, has no values.

Right now I am using csv.reader(infile, delimiter="\t") to read the file and append to a columns list, which, as it turns out, is utterly wrong.

columns = []

for column in csv.reader(infile, delimiter="\t"):
    columns.append(column)

I have searched a bit and found several functions and approaches that might (or might not) do the trick, but I am not sure which one I should use. Any suggestions? Thanks in advance.

Edit: I think the result should look like this:

lat = [0,1,1.5]

That is, a list of the latitude values.

Code so far:

#!/usr/bin/env python

import csv

columns = []

with open("/path/to/file/file.txt") as infile:    

    for row in csv.reader(infile, delimiter="\t"):
        columns.append(row[1])
        print columns

Edit2: print row gives this:

['', 'temperature', 'station.id', 'latitude', 'longtitude', 'sea.distance', 'altitude']
[]
['1', '', '', '', 'S7', '0', '', '4', '', '0', '', '75']
['2', '', '', '', 'S8', '1', '', '5', '', '3', '', '400']
['3', '', '', '', 'S8', '1.5', '', '2', '', '4', '', '80']

Try the following:

>>> with open("test.csv", "rb") as f:
...     latitudes = [x[5] for x in csv.reader(f, delimiter="\t") if x]
...     
... 
>>> latitudes
['0', '1', '1.5']

csv.reader iterates over the rows of your CSV file. The code grabs the sixth item, at index 5 (remember, indexing begins at 0), from each row, provided the row is non-empty (an empty list evaluates to False). It does so using a list comprehension. You could write that list comprehension as a regular for loop:

>>> latitudes = []
>>> for row in csv.reader(f, delimiter="\t"):
...     if row:
...         latitudes.append(row[5])
...
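The loop above leaves the values as strings, while the question asks for numbers. Below is a minimal Python 3 sketch of the same idea that also skips the header row and converts each value to float. The sample text is an assumption that mirrors the rows printed in Edit2, and read_latitudes is a hypothetical helper name, not part of the csv module.

```python
import csv
import io

# Sample text mirroring the rows shown in the question's Edit2 output,
# including the stray extra tabs (an assumption about the real file).
SAMPLE = (
    "\ttemperature\tstation.id\tlatitude\tlongtitude\tsea.distance\taltitude\n"
    "\n"
    "1\t\t\t\tS7\t0\t\t4\t\t0\t\t75\n"
    "2\t\t\t\tS8\t1\t\t5\t\t3\t\t400\n"
    "3\t\t\t\tS8\t1.5\t\t2\t\t4\t\t80\n"
)


def read_latitudes(infile, column_index=5):
    """Return the values at column_index as floats, skipping the header."""
    reader = csv.reader(infile, delimiter="\t")
    next(reader, None)  # discard the header row
    latitudes = []
    for row in reader:
        if row:  # blank lines come back as empty lists
            latitudes.append(float(row[column_index]))
    return latitudes


latitudes = read_latitudes(io.StringIO(SAMPLE))
print(latitudes)  # [0.0, 1.0, 1.5]
```

With a real file, you would pass the object returned by open(path, newline="") instead of the io.StringIO wrapper.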

EDIT: Your example data seems to contain a bunch of extra tabs. I've updated the answer to take this into account. You should fix your input file, though, unless you want to run into more problems.
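One possible way to work around the extra tabs, sketched here in Python 3, is to drop the empty fields from each row before indexing. This rests on the assumption that no column you care about is ever legitimately empty: a column with no values at all, such as temperature, disappears and every later position shifts left. clean_rows is a hypothetical helper name.

```python
import csv
import io


def clean_rows(infile):
    """Yield each non-blank row with its empty fields removed."""
    for row in csv.reader(infile, delimiter="\t"):
        cells = [cell for cell in row if cell != ""]
        if cells:  # skip rows that were blank or held only tabs
            yield cells


# Two data rows in the shape printed in Edit2, separated by a blank line.
raw = io.StringIO(
    "1\t\t\t\tS7\t0\t\t4\t\t0\t\t75\n"
    "\n"
    "2\t\t\t\tS8\t1\t\t5\t\t3\t\t400\n"
)
cleaned = list(clean_rows(raw))
print(cleaned)
# [['1', 'S7', '0', '4', '0', '75'], ['2', 'S8', '1', '5', '3', '400']]
```

After this cleanup the latitude sits at index 2 rather than index 5, so any hard-coded index needs to be adjusted to match.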

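If you sanitize your input file, you could turn the data into a pandas.DataFrame. This allows easy manipulation of and access to the CSV data. Here's an example (it uses DataFrame.from_csv, which was available in pandas at the time; later pandas versions removed it in favor of pandas.read_csv):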

>>> data = pandas.DataFrame.from_csv("/tmp/test.csv", sep="\t")
>>> print data
index  temperature station.id  latitude  longtitude  sea.distance   altitude

NaN            NaN        NaN       NaN         NaN            NaN       NaN
 1             NaN         S7       0.0           4              0        75
 2             NaN         S8       1.0           5              3       400
 3             NaN         S8       1.5           2              4        80

[4 rows x 6 columns]

>>> data['latitude']
index
NaN      NaN
 1       0.0
 2       1.0
 3       1.5
Name: latitude, dtype: float64
>>>

columns = []
for row in csv.reader(infile, delimiter="\t"):
    columns.append(row[1])   # here row[1] is the second column
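If every row has the same number of fields, the row-by-row loop can be replaced by transposing the rows with zip, which yields one list per column in a single pass. The following Python 3 sketch assumes a cleaned-up file with exactly one tab between fields and no header row; the sample data and the variable names are illustrative.

```python
import csv
import io

# Hypothetical cleaned-up data: one tab between fields, no header row.
clean = io.StringIO(
    "1\tS7\t0\t4\t0\t75\n"
    "2\tS8\t1\t5\t3\t400\n"
    "3\tS8\t1.5\t2\t4\t80\n"
)

rows = [row for row in csv.reader(clean, delimiter="\t") if row]

# zip(*rows) regroups the i-th field of every row into one tuple per column.
index, station_id, latitude, longitude, sea_distance, altitude = (
    list(column) for column in zip(*rows)
)

print(station_id)  # ['S7', 'S8', 'S8']
print(latitude)    # ['0', '1', '1.5']
```

Note that zip stops at the shortest row, so a ragged file would silently lose trailing fields; that is one more reason to clean the input first.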

Source

Stackoverflow Blog

Tuesday, 8 April 2014
