Slicing numeric value from a categorical column's text

I'm working with a dataframe where one of the columns is like this:

Rating
4.8 out of 5 stars
4.0 out of 5 stars
4.5 out of 5 stars

I want to slice this data, keeping only the first number, e.g.

Rating
4.8
4.0
4.5

How can I do this?

CodePudding user response:

To extract a field from a string (or categorical) column's text, use pandas Series.str.extract with a regex:

df['Rating'].str.extract(r'([1-5]\.[0-9])')

     0
0  4.8
1  4.0
2  4.5

Here df is the example data, built with:

import pandas as pd
df = pd.DataFrame({'Rating': ['4.8 out of 5 stars', '4.0 out of 5 stars', '4.5 out of 5 stars']}, dtype='category')

You can tweak the regex if you need to; see the pandas documentation for Series.str.extract. As written, it assumes every rating is a decimal (not an integer) with exactly one decimal place and a leading digit from 1 to 5.
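
If you also need the result as numbers rather than strings, something like the following sketch may help. It is only one possible approach: the looser pattern (any leading integer or decimal) and the overwrite of the Rating column are my own assumptions, not part of the question.

```python
import pandas as pd

# Sample data from the question, stored as a categorical column
df = pd.DataFrame(
    {'Rating': ['4.8 out of 5 stars', '4.0 out of 5 stars', '4.5 out of 5 stars']},
    dtype='category',
)

# Assumed pattern: a number at the start of the text, with an optional
# fractional part, so values like "5 out of 5 stars" would match too.
extracted = df['Rating'].str.extract(r'^(\d+(?:\.\d+)?)')

# extract returns a DataFrame of strings (one column per capture group);
# take the first group and convert it to float.
df['Rating'] = extracted[0].astype(float)
```

Rows whose text does not match the pattern come back as NaN, so it is worth checking df['Rating'].isna().sum() afterwards on real data.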