I have this dataset:
| Column A |
|---|
| pt abcdefg |
| cv fghikl |
| abcdg pt |
| opqrs cv |
| bp ststst |
| qwert bp |
I want to move the words 'pt', 'cv', and 'bp' to the end of each string, so this is the output that I want:
| Column A |
|---|
| abcdefg pt |
| fghikl cv |
| abcdg pt |
| opqrs cv |
| ststst bp |
| qwert bp |
I haven't written any code myself, but I found the function below. I'm stuck modifying it because I want to apply it to the whole DataFrame.
def order_word(s, word, delta):
    words = s.split()
    oldpos = words.index(word)
    # Move the word `delta` positions away from where it currently sits
    words.insert(oldpos + delta, words.pop(oldpos))
    return ' '.join(words)
Can anyone help me to build the code? Thanks in advance.
CodePudding user response:
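Here is one approach using pandas.Series.str.split with sorted. It sorts the words in each row by length, longest first, so it relies on the keyword being shorter than the other words in the row:
df["Column A"] = (
    df["Column A"]
    .str.split()
    .apply(lambda x: " ".join(sorted(x, key=len, reverse=True)))
)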
# Output:
print(df)
     Column A
0  abcdefg pt
1   fghikl cv
2    abcdg pt
3    opqrs cv
4   ststst bp
5    qwert bp
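If the keywords are not guaranteed to be the shortest words, the length-based sort above could reorder rows incorrectly. A minimal sketch of an alternative that checks membership in an explicit keyword set instead; the helper name `move_keywords_last` and the `KEYWORDS` constant are hypothetical, introduced only for this example:

```python
import pandas as pd

# Assumption: these are the only words that should move to the end.
KEYWORDS = {"pt", "cv", "bp"}

def move_keywords_last(s, keywords=KEYWORDS):
    """Return `s` with any keyword tokens moved after the other words."""
    words = s.split()
    # Stable partition: non-keywords keep their order, keywords go last.
    rest = [w for w in words if w not in keywords]
    tail = [w for w in words if w in keywords]
    return " ".join(rest + tail)

df = pd.DataFrame({"Column A": ["pt abcdefg", "cv fghikl", "abcdg pt",
                                "opqrs cv", "bp ststst", "qwert bp"]})

# Apply the helper to every value in the column.
df["Column A"] = df["Column A"].apply(move_keywords_last)
print(df)
```

Because the function splits on whitespace, it also handles rows with more than two words, at the cost of normalizing repeated spaces to single ones.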
CodePudding user response:
You can use a regex with str.replace:
df['Column A'] = df['Column A'].str.replace(r'\s*\b(cv|pt|bp)\b\s*(.*$)',
                                            r'\2 \1', regex=True)
Output (as new column for clarity):
     Column A    Column B
0  pt abcdefg  abcdefg pt
1   cv fghikl   fghikl cv
2    abcdg pt    abcdg pt
3    opqrs cv    opqrs cv
4   bp ststst   ststst bp
5    qwert bp    qwert bp
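To try the regex approach end to end, here is a self-contained sketch that rebuilds the sample data and writes the result to a new column. Building the alternation from a list with `re.escape` is an addition for this example, not part of the answer above, and the `keywords` and `pattern` names are hypothetical:

```python
import re
import pandas as pd

df = pd.DataFrame({"Column A": ["pt abcdefg", "cv fghikl", "abcdg pt",
                                "opqrs cv", "bp ststst", "qwert bp"]})

# Assumption: the keywords to move are known up front.
keywords = ["cv", "pt", "bp"]
alternation = "|".join(re.escape(k) for k in keywords)

# Group 1 captures the keyword, group 2 captures everything after it.
pattern = rf"\s*\b({alternation})\b\s*(.*$)"

# Swap the two groups so the keyword ends up last.
df["Column B"] = df["Column A"].str.replace(pattern, r"\2 \1", regex=True)
print(df)
```

Rows where the keyword is already last are left unchanged, since group 2 is empty and the leading whitespace consumed by the match is restored by the replacement.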
