I have a list of strings, say:
my_list = ['meeting held in room 213', 'room 425 is occupied']
I'd like to merge each bigram✶ in which 'room' is followed by a number into a single word. Expected output:
expected_list = ['meeting held in room213', 'room425 is occupied']
I know this is simple, but I just can't figure out how to do it.
✶ A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words.
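Taking that definition literally, one way to do this (a minimal sketch, not taken from the answers below) is to split each string on whitespace and scan adjacent token pairs, merging any pair whose first token is exactly 'room' and whose second token is all digits. The helper name `merge_room_bigrams` is hypothetical. Because the tokens are rejoined with single spaces, this also normalizes any runs of whitespace in the string.

```python
def merge_room_bigrams(text):
    """Merge each ('room', <digits>) token pair into one token (hypothetical helper)."""
    tokens = text.split()
    merged = []
    i = 0
    while i < len(tokens):
        # Inspect the bigram (tokens[i], tokens[i + 1]) when a next token exists
        if (tokens[i] == 'room' and i + 1 < len(tokens)
                and tokens[i + 1].isdecimal()):
            merged.append(tokens[i] + tokens[i + 1])
            i += 2  # skip both halves of the merged bigram
        else:
            merged.append(tokens[i])
            i += 1
    # Rejoining with single spaces collapses any extra whitespace
    return ' '.join(merged)


my_list = ['meeting held in room 213', 'room 425 is occupied']
print([merge_room_bigrams(s) for s in my_list])
# ['meeting held in room213', 'room425 is occupied']
```

Unlike a plain substring search, the exact token comparison means words such as 'bathroom' are never touched.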
Answer:
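This should also work when there is extra whitespace between 'room' and the number, and it leaves the string unchanged when 'room' is not followed by a number.
result = []
for elem in my_list:
    # Split around the last occurrence of 'room'
    rest, room, number = elem.rpartition('room')
    # Merge only if 'room' was found and the next token is all digits
    if room and number.strip() and number.split()[0].isdecimal():
        result.append(rest + room + number.lstrip())
    else:
        result.append(elem)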
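To check the edge cases claimed above, the loop body can be wrapped in a function and run on a few inputs. This is a sketch: the function name `merge_room_number` is hypothetical, and like the loop it only looks at the last occurrence of 'room' in each string. It also checks that 'room' was actually found, so a string that merely starts with a number is returned unchanged.

```python
def merge_room_number(elem):
    # Split around the last occurrence of 'room'
    rest, room, number = elem.rpartition('room')
    # Merge only if 'room' was found and the next token is all digits
    if room and number.strip() and number.split()[0].isdecimal():
        # lstrip() drops every space between 'room' and the number
        return rest + room + number.lstrip()
    return elem


for s in ['meeting held in room 213', 'room   425 is occupied', 'room is free']:
    print(repr(merge_room_number(s)))
# 'meeting held in room213'
# 'room425 is occupied'
# 'room is free'
```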
Answer:
You could use str.replace (note that this removes the space after every 'room', whether or not a number follows):
out = [s.replace('room ', 'room') for s in my_list]
Or, using re.sub:
import re
out = [re.sub(r'(room) (\d+)', r'\1\2', s) for s in my_list]
Output:
['meeting held in room213', 'room425 is occupied']
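If the input can contain extra spaces, or words that merely end in "room", a slightly stricter regex may help. A minimal sketch, assuming lowercase 'room' (the pattern is case-sensitive): `\b` stops matches inside words like 'bathroom', `\s+` accepts any amount of whitespace, and the lookahead `(?=\d)` requires a digit next without consuming it, so only the whitespace is removed.

```python
import re

# \b: 'room' must start a word; \s+: one or more whitespace characters;
# (?=\d): a digit must follow, but it is not part of the match
pattern = re.compile(r'\broom\s+(?=\d)')

my_list = ['meeting held in room 213', 'room 425 is occupied',
           'bathroom 7 and room   12', 'room is free']
out = [pattern.sub('room', s) for s in my_list]
print(out)
# ['meeting held in room213', 'room425 is occupied', 'bathroom 7 and room12', 'room is free']
```

Because re.sub replaces every match, this also handles strings containing more than one room number.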
