We have the following column names:
our_df.columns
Index(['Rank', 'Album', 'Artist', 'Label', 'Label Description',
'Peak Position', 'Last Week Rank', 'Last 2 Week Rank', 'Weeks On Chart',
'TW Total Activity', '% CHG', 'LW Total Activity', 'TW Album Sales',
'TW Song Sales', 'TW TEA', 'TW Audio Streaming Activity',
'TW Video Streaming Activity', 'TW Total SEA (Audio Video)', 'ATD'],
dtype='object')
And we'd like to do something like this:
our_df.columns.str.replace(' ','_').replace('.', '').replace('+/-','plus_minus').lower()
...where we first make all of the replacements and then convert everything to lowercase. However, this fails with the error AttributeError: 'Index' object has no attribute 'replace'. We've updated this to:
our_df.columns.str.replace(' ','_').str.replace('.', '')
# and get the warning
FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
Lastly, when we try
our_df.columns.str.replace(' ','_').str.replace('.', '').str.replace('+/-','plus_minus').str.lower()
...we get the error "nothing to repeat at position 0".
- Is it necessary to repeat str. between each .replace() call?
- How can we resolve the FutureWarning?
- The last error is caused by .replace('+/-','plus_minus') because there is no +/- in the columns. How can we handle this so that, rather than throwing an error, it simply makes no replacements?
- Am I doing this right?
Answer:
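If you want to use str.replace, you do need to repeat .str before each call: every .str.replace returns a new Index, and the string methods are only reachable through its .str accessor (the Index itself has no replace, as the AttributeError shows). The "nothing to repeat at position 0" error is not caused by +/- being absent from the columns; a valid pattern that matches nothing simply makes no replacements. It is raised because, when the pattern is treated as a regex, the leading + is a quantifier with nothing to repeat. Both + and . are regex special characters, so escape them with \, and pass regex=True explicitly to remove the FutureWarning: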
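# escape the regex metacharacters '.' and '+'; raw strings keep the backslashes intact
our_df.columns = (our_df.columns.str.replace(' ', '_', regex=True)
                  .str.replace(r'\.', '', regex=True)
                  .str.replace(r'\+/-', 'plus_minus', regex=True)
                  .str.lower())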
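Because all three patterns here are plain literal text, another option (not part of the original answer) is to pass regex=False, so that pandas matches the pattern literally: no escaping is needed, and stating the regex argument explicitly also silences the FutureWarning (pandas 2.0 made regex=False the default for str.replace). A minimal sketch on a few hypothetical sample names; "No." is included only to exercise the dot removal:

```python
import pandas as pd

# Hypothetical sample column names; "No." exists only to show the "." removal.
columns = pd.Index(["Label Description", "% CHG", "TW Total SEA (Audio +/- Video)", "No."])

cleaned = (
    columns.str.replace(" ", "_", regex=False)      # literal space -> underscore
    .str.replace(".", "", regex=False)              # literal dot, no escaping needed
    .str.replace("+/-", "plus_minus", regex=False)  # literal "+/-", no escaping needed
    .str.lower()
)
print(list(cleaned))
```

With literal matching, a substring that does not occur in a name is simply left alone, which covers the "make no replacements instead of erroring" case from the question.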
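Or, to pass all the replacements at once as a dictionary, first convert the Index to a Series with to_series(), because calling replace directly on the Index raises: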
AttributeError: 'Index' object has no attribute 'replace'
import pandas as pd

c = ['Rank', 'Album', 'Artist', 'Label', 'Label Description',
     'Peak Position', 'Last Week Rank', 'Last 2 Week Rank', 'Weeks On Chart',
     'TW Total Activity', '% CHG', 'LW Total Activity', 'TW Album Sales',
     'TW Song Sales', 'TW TEA', 'TW Audio Streaming Activity',
     'TW Video Streaming Activity', 'TW Total SEA (Audio +/- Video)', 'ATD']
our_df = pd.DataFrame(columns=c)
# dict keys are regex patterns: whitespace runs -> '_', literal '.' -> '', literal '+/-' -> 'plus_minus'
our_df.columns = (our_df.columns.to_series()
                  .replace({r'\s+': '_', r'\.': '', r'\+/-': 'plus_minus'}, regex=True)
                  .str.lower())
print(our_df)
Empty DataFrame
Columns: [rank, album, artist, label, label_description, peak_position, last_week_rank, last_2_week_rank, weeks_on_chart, tw_total_activity, %_chg, lw_total_activity, tw_album_sales, tw_song_sales, tw_tea, tw_audio_streaming_activity, tw_video_streaming_activity, tw_total_sea_(audio_plus_minus_video), atd]
Index: []
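The dictionary approach requires each key to be escaped by hand. One possible variation (an assumption, not the answer's own code) is to write the mapping as literal substrings and build the regex keys with re.escape, so characters like + and . never need manual escaping. literal_map and pattern_map are hypothetical names used only for illustration:

```python
import re

import pandas as pd

# Hypothetical mapping of literal substrings to their replacements.
literal_map = {" ": "_", ".": "", "+/-": "plus_minus"}
# re.escape turns each literal into a safe regex pattern.
pattern_map = {re.escape(old): new for old, new in literal_map.items()}

our_df = pd.DataFrame(columns=["Label Description", "TW Total SEA (Audio +/- Video)"])
# Same Series.replace(dict, regex=True) call as above, but with escaped keys.
our_df.columns = our_df.columns.to_series().replace(pattern_map, regex=True).str.lower()
print(list(our_df.columns))
```

Note that " " here replaces each single space, whereas the answer's r'\s+' collapses runs of whitespace into one underscore; pick whichever matches your column names.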
