I am still a newbie with Python and have the following problem:
I would like to replace the Value in the first and the last row of each Category with 0. Here's my data (stored in a DataFrame):
| Category | Value |
|---|---|
| 2000-1 | 500 |
| 2000-1 | 200 |
| 2000-1 | 20 |
| 2000-1 | 350 |
| 2000-2 | 250 |
| 2000-2 | 210 |
| 2000-2 | 320 |
| 3000-1 | 330 |
| 3000-1 | 370 |
| 3000-1 | 85 |
| 4000-3 | 91 |
| 4000-3 | 92 |
| 4000-3 | 88 |
| 4000-5 | 87 |
| 4000-5 | 77 |
| 4000-5 | 76 |
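To make the examples below reproducible, the sample table can be rebuilt as a DataFrame. This is a minimal sketch; the column names come from the table, and the row order is assumed to be exactly as shown:

```python
import pandas as pd

# Sample data from the table above, one list entry per row
df = pd.DataFrame({
    "Category": ["2000-1"] * 4 + ["2000-2"] * 3 + ["3000-1"] * 3
                + ["4000-3"] * 3 + ["4000-5"] * 3,
    "Value": [500, 200, 20, 350, 250, 210, 320, 330, 370, 85,
              91, 92, 88, 87, 77, 76],
})
print(df)
```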
And here's the expected output:
| Category | Value |
|---|---|
| 2000-1 | 0 |
| 2000-1 | 200 |
| 2000-1 | 20 |
| 2000-1 | 0 |
| 2000-2 | 0 |
| 2000-2 | 210 |
| 2000-2 | 0 |
| 3000-1 | 0 |
| 3000-1 | 370 |
| 3000-1 | 0 |
| 4000-3 | 0 |
| 4000-3 | 92 |
| 4000-3 | 0 |
| 4000-5 | 0 |
| 4000-5 | 77 |
| 4000-5 | 0 |
Thanks in advance for your help.
Answer:
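Via `transform()`:

```python
def first_last_0(x):
    # transform() passes each group as a Series; work on a copy,
    # because pandas does not support mutating the object it passes in
    x = x.copy()
    x.iloc[0] = 0   # first row of the group
    x.iloc[-1] = 0  # last row of the group
    return x

df.groupby('Category')['Value'].transform(first_last_0)
```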
```
0       0
1     200
2      20
3       0
4       0
5     210
6       0
7       0
8     370
9       0
10      0
11     92
12      0
13      0
14     77
15      0
```
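`transform()` only returns the new column; to get the expected frame, the result can be assigned back, since it keeps the same index as `df`. A minimal sketch that rebuilds the sample data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["2000-1"] * 4 + ["2000-2"] * 3 + ["3000-1"] * 3
                + ["4000-3"] * 3 + ["4000-5"] * 3,
    "Value": [500, 200, 20, 350, 250, 210, 320, 330, 370, 85,
              91, 92, 88, 87, 77, 76],
})

def first_last_0(x):
    x = x.copy()      # never mutate the group pandas hands us
    x.iloc[0] = 0
    x.iloc[-1] = 0
    return x

# The transformed Series is aligned with df's index, so it drops straight in
df["Value"] = df.groupby("Category")["Value"].transform(first_last_0)
print(df["Value"].tolist())
# [0, 200, 20, 0, 0, 210, 0, 0, 370, 0, 0, 92, 0, 0, 77, 0]
```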
Answer:
If the values in "Value" are unique within each "Category" (as in the example), we could:
(i) group by "Category" and select the "Value" column;
(ii) find the first and last elements of each group;
(iii) create a boolean mask that identifies the first and last elements of each group;
(iv) use the mask to set those entries of df['Value'] to 0.
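```python
gb_obj = df.groupby('Category')['Value']
firsts = gb_obj.transform('first')  # each row gets its group's first value
lasts = gb_obj.transform('last')    # each row gets its group's last value
# True wherever a row's value equals its group's first or last value
msk = (df['Value'] == firsts) | (df['Value'] == lasts)
df.loc[msk, 'Value'] = 0
```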
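To see why the uniqueness condition matters: the mask compares values, not positions, so a middle row that happens to equal the first (or last) value of its group is zeroed as well. A small sketch with a hypothetical single-category frame:

```python
import pandas as pd

# Hypothetical group: the second row repeats the first value (5)
df = pd.DataFrame({"Category": ["A"] * 4, "Value": [5, 5, 7, 9]})

gb_obj = df.groupby("Category")["Value"]
firsts = gb_obj.transform("first")
lasts = gb_obj.transform("last")
msk = (df["Value"] == firsts) | (df["Value"] == lasts)
df.loc[msk, "Value"] = 0
print(df["Value"].tolist())  # [0, 0, 7, 0]  (row 1 is zeroed by mistake)
```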
If not, we can apply a function to "Value" that assigns 0 to the first and last elements of each group:
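```python
def first_last_to_zero(g):
    g = g.copy()  # avoid mutating the group pandas passes in
    g.iloc[0] = g.iloc[-1] = 0
    return g

# group_keys=False keeps the result indexed like df so it can be assigned
# back directly; recent pandas versions otherwise prepend the group keys
df['Value'] = df.groupby('Category', group_keys=False)['Value'].apply(first_last_to_zero)
```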
Output:
```
   Category  Value
0    2000-1      0
1    2000-1    200
2    2000-1     20
3    2000-1      0
4    2000-2      0
5    2000-2    210
6    2000-2      0
7    3000-1      0
8    3000-1    370
9    3000-1      0
10   4000-3      0
11   4000-3     92
12   4000-3      0
13   4000-5      0
14   4000-5     77
15   4000-5      0
```
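A vectorized alternative that works by position, so repeated values are not a problem, is to mark the first and last occurrence of each category label with `Series.duplicated`. This is a swapped-in technique rather than either answer's method, and it assumes "first" and "last" mean the first and last rows of each category in the frame's current row order:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["2000-1"] * 4 + ["2000-2"] * 3 + ["3000-1"] * 3
                + ["4000-3"] * 3 + ["4000-5"] * 3,
    "Value": [500, 200, 20, 350, 250, 210, 320, 330, 370, 85,
              91, 92, 88, 87, 77, 76],
})

# First row of each category = first occurrence of its label
is_first = ~df["Category"].duplicated(keep="first")
# Last row of each category = last occurrence of its label
is_last = ~df["Category"].duplicated(keep="last")

df.loc[is_first | is_last, "Value"] = 0
print(df["Value"].tolist())
# [0, 200, 20, 0, 0, 210, 0, 0, 370, 0, 0, 92, 0, 0, 77, 0]
```

Because no per-group Python function is called, this also avoids the copy and index-alignment concerns of the `apply` approach.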
