I have this dataframe:
       0     1       2       3
0  Frank  48.2  test_1  file_1
1   John  46.7  test_1  file_1
2  Alice  39.3  test_2  file_2
3    Kim  35.6  test_2  file_2
4  Sasha  25.5  test_3  file_3
....
2306 rows × 4 columns
For every distinct value in column 2 (there are 140 of them), I want to insert a new row before the first row with that value, keeping the file_number value in column 3 (I need that column to save the dataframe split into separate files depending on its value), like this:
        0     1       2       3
0  test_1                file_1
1   Frank  48.2  test_1  file_1
2    John  46.7  test_1  file_1
3  test_2                file_2
4   Alice  39.3  test_2  file_2
5     Kim  35.6  test_2  file_2
6  test_3                file_3
7   Sasha  25.5  test_3  file_3
....
What is the simplest way to achieve this? Thank you for your time!
CodePudding user response:
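You can build the header rows with drop_duplicates, then concat them back onto the original dataframe and sort by index:
s = df.drop_duplicates(['2','3']).drop(['0','1'],axis=1).rename({'2':'0'},axis=1)
# a stable sort keeps each header row (from s) ahead of the data row that shares its index
out = pd.concat([s,df]).sort_index(kind='mergesort').reindex(columns=df.columns)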
out
Out[15]:
        0     1       2       3
0  test_1   NaN     NaN  file_1
0   Frank  48.2  test_1  file_1
1    John  46.7  test_1  file_1
2  test_2   NaN     NaN  file_2
2   Alice  39.3  test_2  file_2
3     Kim  35.6  test_2  file_2
4  test_3   NaN     NaN  file_3
4   Sasha  25.5  test_3  file_3
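For reference, here is a self-contained sketch of the same approach on the five sample rows from the question. It assumes the column labels are the strings "0" to "3" (adjust if yours are integers). The reset_index(drop=True) call is an addition, not part of the answer above: it renumbers the rows 0..n-1 as in the desired output.

```python
import pandas as pd

# Sample rows copied from the question; string column labels are an assumption
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

# One header row per distinct (test, file) pair, with the test name moved to column "0"
s = df.drop_duplicates(["2", "3"]).drop(["0", "1"], axis=1).rename({"2": "0"}, axis=1)

out = (
    pd.concat([s, df])
    .sort_index(kind="mergesort")  # stable: header rows stay ahead of equal-index data rows
    .reindex(columns=df.columns)   # restore the original column order
    .reset_index(drop=True)        # renumber rows 0..n-1
)
print(out)
```

The header rows hold NaN in columns "1" and "2"; out.fillna("") would turn those into empty strings if you prefer blanks.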
CodePudding user response:
You can filter the rows that share each value of column 2, prepend the header row you want to that subset, and then concatenate all the resulting DataFrames into one. For example:
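import pandas as pd

df = <READ_YOUR_DF>
all_df = []
for i in df["2"].unique():
    # rows that belong to the current value of column "2"
    filter_df = df[df["2"] == i]
    # header row: the value goes in column "0", the file name is kept in column "3"
    new_df = pd.DataFrame(data={"0": [i], "1": [""], "2": [""], "3": [filter_df["3"].iloc[0]]})
    to_add = pd.concat([new_df, filter_df], ignore_index=True)
    all_df.append(to_add)
result_df = pd.concat(all_df, ignore_index=True)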
To avoid listing every column name when creating new_df, you can build the dictionary with a comprehension that iterates over df.columns.
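A minimal sketch of that dictionary-comprehension variant, run on the sample rows from the question. The string column labels and the names value and header are assumptions made for illustration; the header row also copies the file name into column "3", as the question asks.

```python
import pandas as pd

# Sample rows copied from the question; string column labels are an assumption
df = pd.DataFrame(
    [
        ["Frank", 48.2, "test_1", "file_1"],
        ["John", 46.7, "test_1", "file_1"],
        ["Alice", 39.3, "test_2", "file_2"],
        ["Kim", 35.6, "test_2", "file_2"],
        ["Sasha", 25.5, "test_3", "file_3"],
    ],
    columns=["0", "1", "2", "3"],
)

all_df = []
for value in df["2"].unique():
    filter_df = df[df["2"] == value]
    # Start with an empty string for every column, whatever the columns are called
    header = {col: [""] for col in df.columns}
    header["0"] = [value]                    # group name in the first column
    header["3"] = [filter_df["3"].iloc[0]]   # keep the file name for later splitting
    all_df.append(pd.concat([pd.DataFrame(header), filter_df], ignore_index=True))

result_df = pd.concat(all_df, ignore_index=True)
print(result_df)
```

Because the header rows put empty strings into column "1", that column ends up with object dtype rather than float; use NaN instead of "" if you need to keep it numeric.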
