I have a dataframe:
import pandas as pd
data = [('s1', 's2'),
('s1', 's3'),
('s2', 's4'),
('s3', 's5'),
('s5', 's6')]
df = pd.DataFrame(data, columns=['start', 'end'])
----- ---
|start|end|
----- ---
| s1| s2|
| s1| s3|
| s2| s4|
| s3| s5|
| s5| s6|
----- ---
I want to see if the end column has values from the start column and write their values in a new end2 column
new_df = df
df = df.join(new_df , (df.start== new_df.end))
The result is something like this:
----- --- ----
|start|end|end2|
----- --- ----
| s1| s2| s4|
| s1| s3| s5|
| s2| s4|null|
| s3| s5| s6|
| s5| s6|null|
----- --- ----
Then I want to join again to see if end2 has values in start and write their values from the end column to the new end3 column. And so do the join until the last column is filled with all the None values
That is, an iterative join along the columns comes out (there are actually a lot more rows in my dataframe, so writing each join is not very good). But I don't understand how to do it. The result should be something like this
while df[-1].notnull():
df = df.join... #join ends columns
new_df[] = #add new column
----- --- ---- ---- ----
|start|end|end2|end3|end4|
----- --- ---- ---- ----
| s1| s2| s4|None|None|
| s1| s3| s5| s6|None|
| s2| s4|None|None|None|
| s3| s5| s6|None|None|
| s5| s6|None|None|None|
----- --- ---- ---- ----
CodePudding user response:
Try mapping the original dictionary to the end column, using start as the key, and end as the values:
df.assign(end2 = df['end'].map(dict(df.to_records(index=False))))
Output:
start end end2
0 s1 s2 s4
1 s1 s3 s5
2 s2 s4 NaN
3 s3 s5 s6
4 s5 s6 NaN
To create all possible columns, we can use a while loop:
i = 2
m = dict(df.to_records(index=False))
while df.iloc[:,-1].count() != 0:
df['end{}'.format(i)] = df.iloc[:,-1].map(m)
i = 1
Output:
start end end2 end3 end4
0 s1 s2 s4 NaN NaN
1 s1 s3 s5 s6 NaN
2 s2 s4 NaN NaN NaN
3 s3 s5 s6 NaN NaN
4 s5 s6 NaN NaN NaN
