pandas: replace values in column with the last character in the column name

I have a dataframe as follows:

import pandas as pd
df = pd.DataFrame({'sent.1':[0,1,0,1],
                'sent.2':[0,1,1,0],
                'sent.3':[0,0,0,1],
                'sent.4':[1,1,0,1]
               })

I am trying to replace the non-zero values with the character at index 5 of each column name (the numeric part of the name), so the output should be:

   sent.1  sent.2  sent.3  sent.4
0       0       0       0       4
1       1       2       0       4
2       0       2       0       0
3       1       0       3       4

I tried the following, but it does not work:

print(df.replace(1, pd.Series([i[5] for i in df.columns], [i[5] for i in df.columns])))

However, when I use the full column names as both the values and the index, the same code works, so I am not sure which part is wrong:

print(df.replace(1, pd.Series(df.columns,  df.columns)))
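
One likely explanation, offered as a sketch rather than a definitive diagnosis: when the replacement is a Series, replace appears to look up one value per column by matching the Series index against the column names. The second call has df.columns as its index, so every column finds a match; the first call uses the single characters as the index, so nothing lines up. Assuming that is the cause, keeping the column names as the index and putting only the numbers in the values should give the expected result (per_column is a name made up for this example):

```python
import pandas as pd

df = pd.DataFrame({'sent.1': [0, 1, 0, 1],
                   'sent.2': [0, 1, 1, 0],
                   'sent.3': [0, 0, 0, 1],
                   'sent.4': [1, 1, 0, 1]})

# Values are the numbers from the names; the index stays the full
# column names so each column can find its own replacement value.
per_column = pd.Series([int(name[5]) for name in df.columns],
                       index=df.columns)

result = df.replace(1, per_column)
print(result)
```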

CodePudding user response：

Since you're dealing with 1's and 0's, you can actually just multiply the dataframe by a range:

df = df * range(1, df.shape[1] + 1)

Output:

   sent.1  sent.2  sent.3  sent.4
0       0       0       0       4
1       1       2       0       4
2       0       2       0       0
3       1       0       3       4

Or, if you want to take the numbers from the column names:

df = df * df.columns.str.split('.').str[-1].astype(int)
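
A small illustration of why splitting on the dot can be the safer way to read the number: indexing a fixed position only picks up one character, so it would truncate a multi-digit suffix. The column name 'sent.10' below is hypothetical and not part of the question's data.

```python
import pandas as pd

# Hypothetical column names, including a two-digit suffix
cols = pd.Index(['sent.1', 'sent.2', 'sent.10'])

# Fixed position: only the single character at index 5
by_position = cols.str[5]

# Split on the dot: everything after the last dot, converted to int
by_split = cols.str.split('.').str[-1].astype(int)

print(list(by_position))
print(list(by_split))
```

With the question's single-digit names both approaches give the same numbers.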

CodePudding user response：

You could use string multiplication on a boolean mask to place the strings where the condition holds, and then where to restore the zeros:

mask = df.ne(0)
(mask*df.columns.str[5]).where(mask, 0)

To have integers:

mask = df.ne(0)
(mask*df.columns.str[5].astype(int))

Output:

  sent.1 sent.2 sent.3 sent.4
0      0      0      0      4
1      1      2      0      4
2      0      2      0      0
3      1      0      3      4
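
To see what the mask adds over plain multiplication, here is a sketch with made-up data in which the non-zero entries are not all 1 (df2, plain and masked are names invented for this example). Multiplying the values directly scales them, while multiplying the boolean mask writes the column number wherever the value is non-zero:

```python
import pandas as pd

# Hypothetical data: non-zero entries other than 1
df2 = pd.DataFrame({'sent.1': [0, 5, 0],
                    'sent.2': [3, 0, 7]})

# Numbers taken from the column names
nums = df2.columns.str[5].astype(int)

plain = df2 * nums          # original values scaled by the column number
masked = df2.ne(0) * nums   # column number wherever the value is non-zero

print(plain)
print(masked)
```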

CodePudding user response：

And another one, working with an arbitrary condition (here s.ne(0)):

df.apply(lambda s: s.mask(s.ne(0), s.name.rpartition('.')[-1]))
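
Note that rpartition returns a string, so the line above fills in '1', '2', ... as text. If integers are wanted, one possible variant (a sketch, with label_nonzero as a made-up helper name) converts the suffix before masking; the condition inside the helper can be swapped for any other boolean test on the column:

```python
import pandas as pd

df = pd.DataFrame({'sent.1': [0, 1, 0, 1],
                   'sent.2': [0, 1, 1, 0],
                   'sent.3': [0, 0, 0, 1],
                   'sent.4': [1, 1, 0, 1]})

def label_nonzero(col):
    # Number after the last dot in the column name, as an integer
    number = int(col.name.rpartition('.')[-1])
    # Replace entries where the condition holds; keep the rest
    return col.mask(col.ne(0), number)

result = df.apply(label_nonzero)
print(result)
```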