I have a dataframe with many rows and columns that looks like this:
| index | col1 | col2 |
|---|---|---|
| 1 | 0 | 1 |
| 2 | 5 | 1 |
| 3 | 5 | 4 |
| 4 | 5 | 4 |
| 5 | 3 | 4 |
| 6 | 2 | 4 |
| 7 | 2 | 1 |
| 8 | 2 | 2 |
I would like to keep only the values that differ from the value in the previous row of the same column, and replace the others with 0. For the example dataframe, the result would be:
| index | col1 | col2 |
|---|---|---|
| 1 | 0 | 1 |
| 2 | 5 | 0 |
| 3 | 0 | 4 |
| 4 | 0 | 0 |
| 5 | 3 | 0 |
| 6 | 2 | 0 |
| 7 | 0 | 1 |
| 8 | 0 | 2 |
What is a solution that works for any number of rows and columns?
CodePudding user response:
So you'd like to keep the values where the difference from the previous row is not equal to 0 (i.e., they're not the same), and put 0 everywhere else:
>>> df.where(df.diff().ne(0), other=0)
       col1  col2
index
1         0     1
2         5     0
3         0     4
4         0     0
5         3     0
6         2     0
7         0     1
8         0     2
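For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the frame is built exactly as in the question's table, with the `index` column used as the index. The `shift()` variant is not part of the answer above; it is an equivalent comparison that should also work on non-numeric columns, where `diff()` is not defined.

```python
import pandas as pd

# Example frame from the question; the "index" column is assumed to be the index.
df = pd.DataFrame(
    {"col1": [0, 5, 5, 5, 3, 2, 2, 2], "col2": [1, 1, 4, 4, 4, 4, 1, 2]},
    index=pd.Index(range(1, 9), name="index"),
)

# diff() is NaN in the first row, and NaN != 0 evaluates to True,
# so the first row is always kept as-is.
result = df.where(df.diff().ne(0), other=0)

# Alternative: compare each value with the previous row directly.
# shift() also gives NaN in the first row, so that row is kept here too.
result_shift = df.where(df.ne(df.shift()), other=0)

print(result)
```

Both versions operate on the whole frame at once, so they do not depend on the number of rows or columns.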
