Split dataframe column into 2 by the last special character

Time:02-05

I have the following dataframe and would like to split the name column at the last underscore "_", assigning the last 4 characters to a "date" column, but I get an indexing error. How do I accomplish this?

name           val
NETUSE_2014     1
NETUSE_2015     1
NETUSE_2016     1
NETUSE_2017     1
NET_ALL_2013    1
NET_ALL_2014    1
NET_ALL_2015    1
NET_ALL_2016    1

df['Year'] = df['name'].str[-4:]

I get this error:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  wet2['date'] = wet2['name'].str[-4:]

I would like the following dataframe:

name    val date
NETUSE  1   2014
NETUSE  1   2015
NETUSE  1   2016
NETUSE  1   2017
NET_ALL 1   2013
NET_ALL 1   2014
NET_ALL 1   2015
NET_ALL 1   2016

CodePudding user response:

Your code:

df['Year'] = df['name'].str[-4:]

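should work as written, although we can't be sure without knowing how you created df. The message is pandas' SettingWithCopyWarning (a warning, not an error), and it suggests that you're trying to modify a copy of a slice of a DataFrame. So my guess is that df was sliced from another, bigger DataFrame without being copied; calling .copy() when you take the slice should make the warning go away.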
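As a minimal sketch of that guess (the frame `big` and its filter are hypothetical, since the question doesn't show how the DataFrame was built; `wet2` is the name that appears in the warning text), taking an explicit copy of the filtered rows before adding the column avoids the ambiguity the warning complains about:

```python
import pandas as pd

# Hypothetical "bigger" DataFrame; the real source of wet2 is not shown in the question.
big = pd.DataFrame({
    "name": ["NETUSE_2014", "NET_ALL_2013", "OTHER_2010"],
    "val": [1, 1, 2],
})

# Filtering alone yields an object pandas may treat as a slice of `big`.
# .copy() makes wet2 an independent DataFrame, so adding a column is unambiguous.
wet2 = big[big["val"] == 1].copy()
wet2["date"] = wet2["name"].str[-4:]
```

The original `big` is left untouched; only `wet2` gains the new column.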

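You could also try str.rsplit with n=1. That way, you split only once, starting from the right. Pass n as a keyword argument, since recent pandas versions no longer accept it positionally:

df[['name','date']] = df['name'].str.rsplit('_', n=1, expand=True)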

Output:

      name  val  date
0   NETUSE    1  2014
1   NETUSE    1  2015
2   NETUSE    1  2016
3   NETUSE    1  2017
4  NET_ALL    1  2013
5  NET_ALL    1  2014
6  NET_ALL    1  2015
7  NET_ALL    1  2016
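For reference, here is a self-contained sketch of the rsplit approach using a few rows from the question. The intermediate variable `parts` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["NETUSE_2014", "NETUSE_2015", "NET_ALL_2013", "NET_ALL_2014"],
    "val": [1, 1, 1, 1],
})

# Split once, starting from the right, so only the last underscore is used.
# expand=True returns a DataFrame with one column per piece (0 and 1 here).
parts = df["name"].str.rsplit("_", n=1, expand=True)

df["name"] = parts[0]  # everything before the last underscore
df["date"] = parts[1]  # everything after it (still a string)
```

Because the split happens at the last underscore, names that contain underscores themselves, such as NET_ALL, stay intact.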

CodePudding user response:

Simply do this; it works:

df['date'] = df['name'].str[-4:]
df['name'] = df['name'].str[:-5]

output:

      name  val  date
0   NETUSE    1  2014
1   NETUSE    1  2015
2   NETUSE    1  2016
3   NETUSE    1  2017
4  NET_ALL    1  2013
5  NET_ALL    1  2014
6  NET_ALL    1  2015
7  NET_ALL    1  2016
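Note that slicing by position relies on every name ending in an underscore followed by exactly four characters. A small sketch, with the sanity check added here as an illustration rather than taken from the answer:

```python
import pandas as pd

df = pd.DataFrame({"name": ["NETUSE_2014", "NET_ALL_2013"], "val": [1, 1]})

# Fixed-width slicing is only correct if every name ends in "_" plus 4 digits.
assert df["name"].str.contains(r"_\d{4}$").all()

df["date"] = df["name"].str[-4:]  # last 4 characters: the year
df["name"] = df["name"].str[:-5]  # drop the year and the underscore before it
```

If the suffix length can vary, splitting at the last underscore is the safer choice.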

A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

This warning occurs when we assign a value to a filtered or sliced DataFrame. Applied to a DataFrame that is not a slice of another one, the code above does not trigger it.
