Create new binary column conditioned on matching text

Time:01-28

I have a data frame dftxt_house with one column, url_text, and 388 rows. I want to create a new column, blocked, conditional on the text in url_text. A row in blocked should be 1 if the corresponding text in url_text contains either blocked you or You're blocked, and 0 otherwise. The code below works for either blocked you or You're blocked on its own, but when I combine both with an or statement, every row in blocked gets the value 1.

Am I misunderstanding the or statement?

A look at data frame dftxt_house

dftxt_house.head()
                                                                                                                                                                                                  url_text
0  Text SIGN LIKUOL to 50409 to send this to your officials\nen)\n1 signer. Let's get to 10!\n\nAN OPEN LETTER to THE U.S. CONGRESS STARTED by JOANNE\n\nMail in Ballot being blocked by Trump\n\nHello...
1           \n\nf;\n\nMark DeSaulnier @\n\n@RepDeSaulnier\n\n@RepDeSaulnier blocked\nyou\nb ColUir- |= 0) (eel, ¢-10 macelaamiell(e\uiiate|\n\n@RepDeSaulnier and viewing\n@RepDeSaulnier’s Tweets.\n\n \n
2  JACKIE SPEIER COMMITTEE OW ARMED SERVICES\ntate Distarcr, CAuiroania SUBCOMMITTEES\n\n2465 Ravaurn House Orrice BuLOmG CHAIRWOMAN, MILiTafy PEAS INHEL\n\nWasumiaton, DC 20515-0514 , StRateaic Fonc...
3  PANGRESSMAN ERIC SWALWELL\n\nPRC <€—  VING CALIFORNIA'S 15TH CONGRESSIONAL DISTRICT\n\n \n\nFollow\n\nRep. Eric Swalwell |\n@RepSwalwell\n\n@RepSwalwell\nblocked you\nYou are blocked from followin...                

Code that should create the new binary column

# Match all rows containing the string "blocked you" or "You're blocked",
# i.e. a user who has been blocked by another user (a politician).
# New column: 0 = no block, 1 = block.
dftxt_house['blocked'] = dftxt_house.apply(
    lambda row: 1 if 'blocked you' or "You're blocked" in row['url_text'] else 0,
    axis=1
)

Current "wrong" result

dftxt_house['blocked'].value_counts()
1    388

CodePudding user response:

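You are indeed misunderstanding the or statement. Python reads 'blocked you' or "You're blocked" in row['url_text'] as 'blocked you' or ("You're blocked" in row['url_text']), and a non-empty string such as 'blocked you' is always truthy, so the condition is true for every row. Each substring needs its own in test, so it should look like this: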

dftxt_house['blocked'] = dftxt_house.apply(
    lambda row: 1 if 'blocked you' in row['url_text'] or "You're blocked" in row['url_text'] else 0,
    axis=1
)
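
As a side note, a vectorized version with str.contains may be simpler than apply. The following is only a sketch: the small frame named sample is made up here to stand in for dftxt_house, and the \s+ in the pattern is an assumption based on row 1 of the preview, where "blocked" and "you" are separated by a newline, so a plain 'blocked you' test might miss it. The first lines also show why the original condition was always true.

```python
import pandas as pd

# Hypothetical sample standing in for dftxt_house (not the real data).
sample = pd.DataFrame({"url_text": [
    "Mail in Ballot being blocked by Trump",
    "@RepDeSaulnier blocked\nyou",
    "COMMITTEE ON ARMED SERVICES",
    "@RepSwalwell\nblocked you\nYou are blocked from following",
    "You're blocked",
]})

# Why the original condition never fails: `or` binds looser than `in`,
# so this is 'blocked you' or (...), and a non-empty string is truthy.
always_true = bool('blocked you' or "You're blocked" in "no match here")

# One regex with two alternatives; \s+ (an assumption) also matches
# a newline between the words. na=False treats missing text as no match.
pattern = r"blocked\s+you|You're\s+blocked"
sample["blocked"] = (
    sample["url_text"].str.contains(pattern, regex=True, na=False).astype(int)
)

print(sample["blocked"].value_counts())
```

If the scanned text uses a curly apostrophe (as in the preview's @RepDeSaulnier’s), the pattern would also need to allow for that character.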