I'm looking for a clean way to add a column to a multiindex dataframe, where the value is only repeated once per level=0.
For example,
I want to add a column to this:
| Index level=0 | Index level=1 | Value (x) |
|---|---|---|
| A | 1 | 300 |
| 2 | 850 | |
| 3 | 2000 | |
| B | 1 | 100 |
| 2 | 70 | |
| 3 | 400 |
In order to get to this:
| Index level=0 | Index level=1 | Value (x) | Value (y) |
|---|---|---|---|
| A | 1 | 300 | Yellow |
| 2 | 850 | ||
| 3 | 2000 | ||
| B | 1 | 100 | Red |
| 2 | 70 | ||
| 3 | 400 |
I do NOT want this:
| Index level=0 | Index level=1 | Value (x) | Value (y) |
|---|---|---|---|
| A | 1 | 300 | Yellow |
| 2 | 850 | Yellow | |
| 3 | 2000 | Yellow | |
| B | 1 | 100 | Red |
| 2 | 70 | Red | |
| 3 | 400 | Red |
I'm not sure how best to create a table here that shows what I'm hoping for, but the important part to me is that y corresponds to all rows of index level=0, but is not repeated for every increment of index level=1. I'm sure that I could the additional rows in the y column with null values but I thought there might be a more elegant way.
CodePudding user response:
Use pd.IndexSlice:
idx = pd.IndexSlice
df.loc[idx[:, 1], 'Color'] = ['Yellow', 'Red']
print(df)
# Output
Value Color
A 1 300 Yellow
2 850 NaN
3 2000 NaN
B 1 100 Red
2 70 NaN
3 400 NaN
Or only with slice:
df.loc[(slice(None), 1), 'Color'] = ['Yellow', 'Red']
print(df)
# Output
Value Color
A 1 300 Yellow
2 850 NaN
3 2000 NaN
B 1 100 Red
2 70 NaN
3 400 NaN
CodePudding user response:
You haven't specified how you High / Low are determined so I'm hard coding them in a dictionary:
# Create the data frame
index = pd.MultiIndex.from_product([['A', 'B'], [1,2,3]])
df = pd.DataFrame({
'x': [300, 850, 2000, 100, 70, 400]
}, index=index)
# The transformation
def f(col):
y_values = {
'A': 'High',
'B': 'Low'
}
level0 = col.index[0][0]
return [y_values[level0]] [''] * (len(col) - 1)
df['y'] = df.groupby(level=0).transform(f)
CodePudding user response:
iterate thru the index0 and assign value to new column for every new type of data in index0:
c=0
for i in df.index:
if i[0] != c:
#put your code here to add values to new column
c = i[0]
