Condense rows in a DataFrame

I'm trying to change the structure of a DataFrame. It currently looks like this (an approximation of my data):

Date    Var1        Var2      Var3  Var4    Client  Code
Jan     You win!    NaN       1     NaN     Yep     100
Jan     NaN         You lose! NaN   0       Yep     100
Feb     Go for it!  NaN       1     NaN     Bar     200
Feb     NaN         Dang      NaN   0       Bar     200
Mar     Go for it!  NaN       0     NaN     Foo     300
Mar     NaN         Darn      NaN   1       Foo     300

Unfortunately, this pattern is not consistent across the entire DataFrame. Assume all the values are strings. I'm trying to condense the rows whose Date, Client, and Code are the same.

Expected Output:

Date    Var1        Var2       Var3 Var4    Client  Code
Jan     You win!    You lose!  1    0       Yep     100
Feb     Go for it!  Dang       1    0       Bar     200
Mar     Go for it!  Darn       0    1       Foo     300

I'm really not sure how to do this. I think I want to group by Date, Client, and Code, but without aggregating anything: I just want to fill in the NaNs and then delete the duplicate rows.

df constructor:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
                   'Var1': ['You win!', np.nan, 'Go for it!', np.nan, 'Go for it!', np.nan],
                   'Var2': [np.nan, 'You lose!', np.nan, 'Dang', np.nan, 'Darn'],
                   'Var3': [1.0, np.nan, 1.0, np.nan, 0.0, np.nan],
                   'Var4': [np.nan, 0.0, np.nan, 0.0, np.nan, 1.0],
                   'Client': ['Yep', 'Yep', 'Bar', 'Bar', 'Foo', 'Foo'],
                   'Code': [100, 100, 200, 200, 300, 300]})

CodePudding user response:

Assuming the NaN pattern is consistent (each record split across two adjacent rows, as in the sample), you can do it like this:

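# Keep every other row; .copy() avoids a SettingWithCopyWarning on the assignments below
df2 = df.iloc[::2, :].copy()
# Fill Var2 and Var4 with the non-NaN values found in the remaining rows
df2["Var2"] = df["Var2"].dropna().values
df2["Var4"] = df["Var4"].dropna().values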

print(df2)
  Date        Var1       Var2  Var3  Var4 Client  Code
0  Jan    You win!  You lose!     1     0    Yep   100
2  Feb  Go for it!       Dang     1     0    Bar   200
4  Mar  Go for it!       Darn     0     1    Foo   300
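
The slicing approach above only works when every record really is split across two adjacent rows. As a rough sketch of how that assumption could be checked first, the snippet below compares the even and odd rows. The `keys` list and the `follows_pattern` helper are illustrative names, not part of the original answer, and the check is a necessary condition rather than a full guarantee.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
                   'Var1': ['You win!', np.nan, 'Go for it!', np.nan, 'Go for it!', np.nan],
                   'Var2': [np.nan, 'You lose!', np.nan, 'Dang', np.nan, 'Darn'],
                   'Var3': [1.0, np.nan, 1.0, np.nan, 0.0, np.nan],
                   'Var4': [np.nan, 0.0, np.nan, 0.0, np.nan, 1.0],
                   'Client': ['Yep', 'Yep', 'Bar', 'Bar', 'Foo', 'Foo'],
                   'Code': [100, 100, 200, 200, 300, 300]})

# Assumed identifying columns, taken from the question's description
keys = ['Date', 'Client', 'Code']


def follows_pattern(frame, keys):
    """Return True if rows come in adjacent pairs that share the same key
    values and no value column is populated in both rows of a pair."""
    if len(frame) % 2:
        return False  # an odd row count cannot be made of pairs
    first = frame.iloc[::2].reset_index(drop=True)
    second = frame.iloc[1::2].reset_index(drop=True)
    same_keys = (first[keys] == second[keys]).all().all()
    value_cols = frame.columns.difference(keys)
    # A clash is a column that holds a value in both rows of the same pair
    clash = (first[value_cols].notna() & second[value_cols].notna()).any().any()
    return bool(same_keys and not clash)


print(follows_pattern(df, keys))
```

If the check fails, as the question suggests it may for the full data, a key-based approach is the safer choice.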

CodePudding user response:

You can use GroupBy.first, which returns the first non-null value of each column within every group:

out = df.groupby('Date', sort=False).first().reset_index()

Output:

  Date        Var1       Var2  Var3  Var4 Client  Code
0  Jan    You win!  You lose!   1.0   0.0    Yep   100
1  Feb  Go for it!       Dang   1.0   0.0    Bar   200
2  Mar  Go for it!       Darn   0.0   1.0    Foo   300
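
The question asks to condense on Date, Client, and Code together, while the snippet above groups on Date alone. A minimal sketch of the same idea with all three keys might look like the following; the names `keys`, `counts`, and `conflicts` are illustrative. Because `first()` keeps only one non-null value per column and group, the sketch also counts non-null values so that groups where something would be silently discarded can be inspected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
                   'Var1': ['You win!', np.nan, 'Go for it!', np.nan, 'Go for it!', np.nan],
                   'Var2': [np.nan, 'You lose!', np.nan, 'Dang', np.nan, 'Darn'],
                   'Var3': [1.0, np.nan, 1.0, np.nan, 0.0, np.nan],
                   'Var4': [np.nan, 0.0, np.nan, 0.0, np.nan, 1.0],
                   'Client': ['Yep', 'Yep', 'Bar', 'Bar', 'Foo', 'Foo'],
                   'Code': [100, 100, 200, 200, 300, 300]})

# Assumed identifying columns, taken from the question's description
keys = ['Date', 'Client', 'Code']

# Group on all three keys, take the first non-null value per column,
# then restore the original column order
out = df.groupby(keys, sort=False, as_index=False).first()[df.columns]
print(out)

# Non-null values per group and column; anything above 1 means first()
# would drop a value for that group
counts = df.groupby(keys, sort=False).count()
conflicts = counts[(counts > 1).any(axis=1)]
print(conflicts.empty)
```

Rows whose key columns contain NaN are dropped by `groupby` by default, so this assumes Date, Client, and Code are always populated.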