Home > database >  Creating new variable based on substring in another variable in Python
Creating new variable based on substring in another variable in Python

Time:02-03

I am trying to create a binary (yes/no) variable based on what is in a particular text string (in Python).

The data looks something like:

Person ID Test Result
87 No exercise induced ischaemia
88 Treadmill test induced increased BP
89 NORMAL test on treadmill

and so on.

I need to pick out all the people who have "No exercise induced ischaemia". Can anybody shed some light on how to do this, given I have about 20 columns in the real data set and about 14000 rows that need to be searched.

Here's an example dataframe for convenience

d = {'ID': [87, 88, 89, 90, 91, 92], 'TestResult': ["No exercise induced ischaemia", "NORMAL test on treadmill",  "No exercise induced ischaemia", "treadmill induced ischaemia", "NORMAL test on treadmill", "No exercise induced ischaemia"]}
df = pd.DataFrame(data=d)

I've tried things like

df['NegTest'] = df[df.TestResult.str.contains('No exercise induced ischaemia',case=True)]

with no luck.

Thanks for any help!

CodePudding user response:

You're very close. Just use np.where to actually generate the yes/no:

df['NegTest'] = np.where(df.TestResult.str.contains('No exercise induced ischaemia', case=True), 'yes', 'no')

Output:

>>> df
   ID                     TestResult NegTest
0  87  No exercise induced ischaemia     yes
1  88       NORMAL test on treadmill      no
2  89  No exercise induced ischaemia     yes
3  90    treadmill induced ischaemia      no
4  91       NORMAL test on treadmill      no
5  92  No exercise induced ischaemia     yes

If you want it to just be True/False, you can even skip np.where:

df['NegTest'] = df.TestResult.str.contains('No exercise induced ischaemia', case=True)

Output:

>>> df
   ID                     TestResult  NegTest
0  87  No exercise induced ischaemia     True
1  88       NORMAL test on treadmill    False
2  89  No exercise induced ischaemia     True
3  90    treadmill induced ischaemia    False
4  91       NORMAL test on treadmill    False
5  92  No exercise induced ischaemia     True
  •  Tags:  
  • Related