How to merge bigrams that fit certain pattern as a single word in text?

Time:02-07

I have a list of strings, say:

my_list=['meeting held in room 213','room 425 is occupied']

I'd like to merge each bigram in which 'room' is followed by a number into a single word. Expected output:

expected_list=['meeting held in room213','room425 is occupied']

I know this is simple but I just can't figure out how to achieve that.

A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words.
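To make the definition concrete, the word bigrams of one of the example strings can be listed by pairing each token with its successor. This is a minimal sketch that assumes plain whitespace tokenization via str.split(); word_bigrams is a hypothetical helper name, not part of the question.

```python
def word_bigrams(text):
    """Return adjacent word pairs, using whitespace tokenization."""
    tokens = text.split()
    # zip stops at the shorter input, so the last token gets no partner
    return list(zip(tokens, tokens[1:]))

print(word_bigrams('room 425 is occupied'))
# [('room', '425'), ('425', 'is'), ('is', 'occupied')]
```

The bigram to merge here is ('room', '425'): the keyword followed by an all-digit token.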

Answer:

This should also work when there is extra whitespace between 'room' and the number, and when 'room' is not followed by a number.

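result = []
for elem in my_list:
    # Split at the last occurrence of 'room'; room is '' if it is absent
    rest, room, number = elem.rpartition('room')
    # Merge only when 'room' is present and the next word is a number
    if room and number.strip() and number.split()[0].isdecimal():
        result.append(rest + room + number.lstrip())
    else:
        result.append(elem)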
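Because rpartition only looks at the last occurrence of 'room' (so 'room 1 and room 2' becomes 'room 1 and room2') and also matches it inside longer words (so 'bathroom 5' becomes 'bathroom5'), a token-based variant may be preferable when a string can contain several room numbers. The sketch below is a different technique from the answer above: the hypothetical helper merge_room_numbers walks the word bigrams and joins every ('room', number) pair. It assumes whitespace tokenization, so runs of spaces are collapsed to a single space in the output.

```python
def merge_room_numbers(text, keyword='room'):
    """Join every `keyword` token with an immediately following all-digit token."""
    tokens = text.split()
    merged = []
    i = 0
    while i < len(tokens):
        # A qualifying bigram: the keyword followed by a decimal number
        if tokens[i] == keyword and i + 1 < len(tokens) and tokens[i + 1].isdecimal():
            merged.append(tokens[i] + tokens[i + 1])
            i += 2  # consume both tokens of the merged bigram
        else:
            merged.append(tokens[i])
            i += 1
    return ' '.join(merged)

my_list = ['meeting held in room 213', 'room 425 is occupied']
print([merge_room_numbers(s) for s in my_list])
# ['meeting held in room213', 'room425 is occupied']
```

Since tokens are compared exactly, a number with attached punctuation such as '425,' is not merged; that would need extra stripping before the isdecimal() check.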

Answer:

You could use str.replace, although it removes the space after every occurrence of 'room ', whether or not a number follows:

out = [s.replace('room ', 'room') for s in my_list]

or using re.sub:

import re
# \d+ matches one or more digits; the replacement drops the space between the two groups
out = [re.sub(r'(room) (\d+)', r'\1\2', s) for s in my_list]

Output:

['meeting held in room213', 'room425 is occupied']
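If the regex should also tolerate extra whitespace (as the first answer does) and leave alone words that merely end in 'room', word boundaries and \s+ can be added to the pattern. This is a sketch of one possible variant, not part of the original answer; ROOM_NUMBER and samples are illustrative names.

```python
import re

# \b before 'room' leaves 'bathroom 5' unchanged; \s+ absorbs one or more
# whitespace characters; \b after the digits rejects cases like 'room 213a'
ROOM_NUMBER = re.compile(r'\b(room)\s+(\d+)\b')

samples = ['meeting held in room 213', 'room   425 is occupied', 'bathroom 5', 'room is free']
print([ROOM_NUMBER.sub(r'\1\2', s) for s in samples])
# ['meeting held in room213', 'room425 is occupied', 'bathroom 5', 'room is free']
```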