Changing row names in dataframe

Time:02-02

I have a dataframe, and one of its columns looks roughly like the one shown below. Is there any way to rename the rows? They should be renamed to psPARP8, psEXOC8, psTMEM128, psCFHR3, where ps stands for pseudogene and the term in brackets is the code for that pseudogene. I would highly appreciate it if anyone could write a Python function, or suggest an alternative, to perform this task.

import pandas as pd

d = {'gene_final': ["poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
                    "exocyst complex component 8 (EXOC8) pseudogene",
                    "transmembrane protein 128 (TMEM128) pseudogene",
                    "complement factor H related 3 (CFHR3) pseudogene"]}

df = pd.DataFrame(data=d)

The desired output should look like this:

gene_final
-----------
psPARP8
psEXOC8
psTMEM128
psCFHR3

CodePudding user response:

import re

import pandas as pd

# build dataframe
df = pd.DataFrame({'gene_final': ["poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
                                  "exocyst complex component 8 (EXOC8) pseudogene",
                                  "transmembrane protein 128 (TMEM128) pseudogene",
                                  "complement factor H related 3 (CFHR3) pseudogene"]})


def extract_name(s):
    """Return 'ps' plus the gene code found in parentheses, e.g. 'psPARP8'."""
    # \w+ matches only letters, digits and underscores, so "(ADP-ribose)" is skipped
    code = re.findall(r"\((\w+)\)", s)[0]  # raises IndexError if no "(CODE)" is present
    return f"ps{code}"  # add the ps prefix

# apply function extract_name() to each row
df['gene_final'] = df['gene_final'].apply(extract_name)
print(df)
#   gene_final
# 0    psPARP8
# 1    psEXOC8
# 2  psTMEM128
# 3    psCFHR3
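
The same renaming can probably be done without a helper function, using the vectorized string methods pandas provides. The following is only a sketch, assuming every value ends with "(CODE) pseudogene" as in the sample data; the pattern is my own assumption and is not taken from the question. Rows that do not match the pattern are left unchanged.

```python
import pandas as pd

df = pd.DataFrame({'gene_final': [
    "poly(ADP-ribose) polymerase family member 8 (PARP8) pseudogene",
    "exocyst complex component 8 (EXOC8) pseudogene",
    "transmembrane protein 128 (TMEM128) pseudogene",
    "complement factor H related 3 (CFHR3) pseudogene",
]})

# Assumed pattern: the whole string, with the code captured from the
# parentheses that directly precede the trailing word "pseudogene".
pattern = r"^.*\((\w+)\)\s*pseudogene$"

# Replace each matching string with "ps" followed by the captured code.
df['gene_final'] = df['gene_final'].str.replace(pattern, r"ps\1", regex=True)
print(df)
```

Anchoring the pattern on the word "pseudogene" avoids picking up an earlier parenthesized term by accident, which may matter if other gene descriptions contain plain words in parentheses.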

CodePudding user response:

If you mean the index labels (row names), this is how you set them when building a DataFrame:

import pandas as pd

df = pd.DataFrame({'A': [11, 21, 31],
                   'B': [12, 22, 32],
                   'C': [13, 23, 33]},
                  index=['ONE', 'TWO', 'THREE'])

print(df)

You can also change the row names after building the DataFrame. Note that rename returns a new DataFrame and leaves the original unchanged:

df_new = df.rename(columns={'A': 'Col_1'}, index={'ONE': 'Row_1'})
print(df_new)
#        Col_1   B   C
# Row_1     11  12  13
# TWO       21  22  23
# THREE     31  32  33

print(df)
#         A   B   C
# ONE    11  12  13
# TWO    21  22  23
# THREE  31  32  33
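
Besides a dictionary, rename also accepts a function that is applied to every label, and set_axis replaces all labels at once. A minimal sketch, using a cut-down version of the frame above; the new labels r1, r2, r3 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [11, 21, 31]}, index=['ONE', 'TWO', 'THREE'])

# Pass a function: it is called on each row label.
df_lower = df.rename(index=str.lower)

# Replace every row label in one go (the list length must match the row count).
df_all = df.set_axis(['r1', 'r2', 'r3'], axis=0)

print(df_lower)
print(df_all)
```

Both calls return new DataFrames, so df itself keeps its original labels.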