Add row in dataframe with same value as in specific column

I have this dataframe:

    0       1       2         3
0   Frank   48.2    test_1    file_1
1   John    46.7    test_1    file_1
2   Alice   39.3    test_2    file_2
3   Kim     35.6    test_2    file_2
4   Sasha   25.5    test_3    file_3
.... 
2306 rows × 4 columns

For every distinct value in column 2 (there are 140 of them), I want to insert a row before the first row with that value. The new row should hold that value in column 0 and keep the file_number value in column 3 (I need that column to save the dataframe split into different files depending on its value), like this:

    0        1       2       3
0   test_1                   file_1
1   Frank    48.2    test_1  file_1
2   John     46.7    test_1  file_1
3   test_2                   file_2
4   Alice    39.3    test_2  file_2
5   Kim      35.6    test_2  file_2
6   test_3                   file_3
7   Sasha    25.5    test_3  file_3
....

What is the simplest way to achieve this? Thank you for your time!

CodePudding user response：

You can pick the first row of each group with drop_duplicates, turn it into the row to add, then concat it back with the original dataframe:

s = df.drop_duplicates(['2','3']).drop(['0','1'],axis=1).rename({'2':'0'},axis=1)
# stable sort, so each added row stays ahead of the data row that shares its index
out = pd.concat([s,df]).sort_index(kind='mergesort').reindex(columns=df.columns)
out
Out[15]: 
        0     1       2       3
0  test_1   NaN     NaN  file_1
0   Frank  48.2  test_1  file_1
1    John  46.7  test_1  file_1
2  test_2   NaN     NaN  file_2
2   Alice  39.3  test_2  file_2
3     Kim  35.6  test_2  file_2
4  test_3   NaN     NaN  file_3
4   Sasha  25.5  test_3  file_3
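
For reference, here is a self-contained sketch of the same approach on the five sample rows from the question. It assumes the column labels are the strings '0' to '3' (if yours are integers, drop the quotes), and the name headers is only illustrative. The reset_index call at the end is an optional extra that renumbers the rows 0..n-1 as in the desired output.

```python
import pandas as pd

# Sample rows from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

# One row per (test, file) pair; each keeps the index of the first matching row.
headers = (
    df.drop_duplicates(["2", "3"])
    .drop(columns=["0", "1"])
    .rename(columns={"2": "0"})
)

# The stable sort keeps each added row (concatenated first) ahead of the
# data row that shares its index label.
out = (
    pd.concat([headers, df])
    .sort_index(kind="mergesort")
    .reindex(columns=df.columns)
    .reset_index(drop=True)
)
print(out)
```

Columns 1 and 2 are NaN in the added rows; something like out.fillna("") would blank them if empty cells are preferred.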

CodePudding user response：

You can filter the rows for each value of column 2, prepend the row you want to that filtered DataFrame, and concatenate all the resulting DataFrames into one. For example:

import pandas as pd

df = <READ_YOUR_DF>
all_df = []
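for i in df["2"].unique():
    filter_df = df[df["2"] == i]
    # row to add: the value goes in column 0, the matching file name stays in column 3
    new_df = pd.DataFrame(data={"0": [i], "1": [""], "2": [""], "3": [filter_df["3"].iloc[0]]})
    to_add = pd.concat([new_df, filter_df], ignore_index=True)
    all_df.append(to_add)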

result_df=pd.concat(all_df, ignore_index=True)

If you want to avoid listing every column name when creating new_df, you can build the dictionary with a comprehension over df.columns.
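
A minimal sketch of that dictionary-comprehension variant, run on the sample rows from the question. It assumes string column labels, and the names pieces, header and parts are illustrative only. Here the added row also copies the file name from column 3, as the question asks, and the last line shows one possible way to split the result by that column.

```python
import pandas as pd

# Sample rows from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

pieces = []
for value in df["2"].unique():
    group = df[df["2"] == value]
    # Start with every column blank, without naming the columns by hand.
    header = {col: [""] for col in df.columns}
    header["0"] = [value]                # the group label goes in column 0
    header["3"] = [group["3"].iloc[0]]   # keep the file name for this group
    pieces.append(pd.concat([pd.DataFrame(header), group], ignore_index=True))

result_df = pd.concat(pieces, ignore_index=True)

# One sub-frame per file name, ready to be written out separately.
parts = {name: part for name, part in result_df.groupby("3")}
```

Each entry of parts could then be saved with to_csv or a similar writer; the output file naming is left to you.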