I'm trying to change the structure of a DataFrame. It currently looks like this (an approximation of my data):
Date Var1 Var2 Var3 Var4 Client Code
Jan You win! NaN 1 NaN Yep 100
Jan NaN You lose! NaN 0 Yep 100
Feb Go for it! NaN 1 NaN Bar 200
Feb NaN Dang NaN 0 Bar 200
Mar Go for it! NaN 0 NaN Foo 300
Mar NaN Darn NaN 1 Foo 300
Unfortunately, this pattern is not consistent across the entire DataFrame. Assume all the values are strings. I'm trying to condense the rows wherever Date, Client, and Code are the same.
Expected Output:
Date Var1 Var2 Var3 Var4 Client Code
Jan You win! You lose! 1 0 Yep 100
Feb Go for it! Dang 1 0 Bar 200
Mar Go for it! Darn 0 1 Foo 300
I'm really not sure how to do this. I guess I want to group by Date, Client, and Code, but I don't want to aggregate anything; I just want to fill in the NaNs and then delete the duplicate rows.
df constructor:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
                   'Var1': ['You win!', np.nan, 'Go for it!', np.nan, 'Go for it!', np.nan],
                   'Var2': [np.nan, 'You lose!', np.nan, 'Dang', np.nan, 'Darn'],
                   'Var3': [1.0, np.nan, 1.0, np.nan, 0.0, np.nan],
                   'Var4': [np.nan, 0.0, np.nan, 0.0, np.nan, 1.0],
                   'Client': ['Yep', 'Yep', 'Bar', 'Bar', 'Foo', 'Foo'],
                   'Code': [100, 100, 200, 200, 300, 300]})
CodePudding user response:
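Assuming that the NaN pattern is consistent (every pair of rows alternates exactly as in the sample), you can do it like this:
# Take every other row; .copy() avoids SettingWithCopyWarning on the assignments below
df2 = df.iloc[::2, :].copy()
# Fill Var2 and Var4 with the non-NaN values taken from the skipped rows
df2["Var2"] = df["Var2"].dropna().values
df2["Var4"] = df["Var4"].dropna().values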
print(df2)
Date Var1 Var2 Var3 Var4 Client Code
0 Jan You win! You lose! 1.0 0.0 Yep 100
2 Feb Go for it! Dang 1.0 0.0 Bar 200
4 Mar Go for it! Darn 0.0 1.0 Foo 300
CodePudding user response:
You can use groupby with first(). It returns the first non-NaN value in each column per group, so NaNs are skipped by default:
out = df.groupby('Date', sort=False).first().reset_index()
Output:
Date Var1 Var2 Var3 Var4 Client Code
0 Jan You win! You lose! 1.0 0.0 Yep 100
1 Feb Go for it! Dang 1.0 0.0 Bar 200
2 Mar Go for it! Darn 0.0 1.0 Foo 300
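Since the question asks for rows to be merged only when Date, Client, and Code all match, here is a minimal sketch of the same idea grouping on all three columns. It reuses the df constructor from the question; the names keys and out are just illustrative, and the reindex step is only there to put the columns back in their original order.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Date': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Var1': ['You win!', np.nan, 'Go for it!', np.nan, 'Go for it!', np.nan],
    'Var2': [np.nan, 'You lose!', np.nan, 'Dang', np.nan, 'Darn'],
    'Var3': [1.0, np.nan, 1.0, np.nan, 0.0, np.nan],
    'Var4': [np.nan, 0.0, np.nan, 0.0, np.nan, 1.0],
    'Client': ['Yep', 'Yep', 'Bar', 'Bar', 'Foo', 'Foo'],
    'Code': [100, 100, 200, 200, 300, 300],
})

# Rows are merged only when all three key columns match
keys = ['Date', 'Client', 'Code']

out = (
    df.groupby(keys, sort=False, as_index=False)  # sort=False keeps first-seen group order
      .first()                                    # first non-null value per column in each group
      .reindex(columns=df.columns)                # restore the original column order
)
print(out)
```

One thing to keep in mind: if a group ever contains two different non-null values in the same column, first() silently keeps the earlier one, so it may be worth checking for that if the pattern in the real data is not consistent.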
