Tuesday, 12 August 2014

python - Distribute a pandas DataFrame feature at random - Stack Overflow


I am reading in a data set with pandas and plotting it with matplotlib. One column is a "category", e.g. "Sports" or "Entertainment", but some rows are marked "Random", which means I need to redistribute those rows by assigning each one at random to one of the other categories. Ideally I would like to do this in the DataFrame itself, so that every row ends up with a real category.


My basic graph code is as follows:


import matplotlib.pyplot as plt

df.category.value_counts().plot(kind="barh", alpha=a_bar)  # a_bar: bar opacity, set elsewhere
plt.title("Category Distribution")

The behaviour I would like is:


if category == "Random":
    assign this row to one of the other categories at random

How can I accomplish this?




Possibly:


import numpy as np
import pandas as pd

# take the original value_counts, then drop 'Random'
ts1 = df.category.value_counts()
rand_cnt = ts1['Random']  # number of rows marked 'Random'
ts1 = ts1.drop('Random')

# randomly choose from the other categories, once per 'Random' row
ts2 = pd.Series(np.random.choice(ts1.index, rand_cnt)).value_counts()

# align the two series, and add them up
ts2 = ts2.reindex_like(ts1).fillna(0)
(ts1 + ts2).plot(kind='barh')
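
To try the counts-only approach without the original data, here is a minimal sketch. The sample DataFrame, the seeded generator and the names `counts`, `others`, `extra` and `redistributed` are assumptions made for illustration; they are not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"category": ["Sports", "Random", "Entertainment",
                                "Sports", "Random", "News", "Random"]})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

counts = df["category"].value_counts()
n_random = int(counts.get("Random", 0))  # 0 if no row is marked 'Random'
others = counts.drop("Random", errors="ignore")

# Draw one real category per 'Random' row and count the draws.
extra = pd.Series(rng.choice(others.index.to_numpy(), size=n_random)).value_counts()

# fill_value=0 covers categories that received no draws.
redistributed = others.add(extra, fill_value=0).astype(int)
print(redistributed)
```

The redistributed counts should sum to the number of rows in `df`, which is a quick sanity check before plotting with `redistributed.plot(kind="barh")`.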

If you want to modify the original DataFrame instead:


idx = df.category == 'Random'
xs = df.category[~idx].unique()  # all other categories

# randomly assign a real category to each row marked 'Random'
# (.loc avoids chained assignment, which may silently fail to update df)
df.loc[idx, 'category'] = np.random.choice(xs, idx.sum())
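
The in-place variant can be checked end to end with a small self-contained sketch. The sample data, the seed and the names `is_random` and `real_categories` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"category": ["Sports", "Random", "Entertainment",
                                "Sports", "Random", "News", "Random"]})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

is_random = df["category"] == "Random"                     # boolean row mask
real_categories = df.loc[~is_random, "category"].unique()  # candidate targets

# One uniform draw per 'Random' row, written back through .loc.
df.loc[is_random, "category"] = rng.choice(real_categories, size=is_random.sum())

print(df["category"].value_counts())
```

Each draw here is uniform over the remaining categories. If the replacements should instead follow the existing category proportions, `choice` accepts a `p` argument giving one probability per candidate.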


