Home > Back-end >  Pandas Conditional value fill, dependant on precious row's value
Pandas Conditional value fill, dependant on precious row's value

Time:01-18

Current Input:

import pandas as pd
import numpy as np
 
# initialize list of lists
data = [
        ['2017-08-17 04:00:00',      1        ],
        ['2017-08-17 04:01:00',      2        ],     
        ['2017-08-17 04:02:00',      None     ],    
        ['2017-08-17 04:03:00',      None     ],   
        ['2017-08-17 04:04:00',      None     ],
        ['2017-08-17 04:05:00',      3        ],     
        ['2017-08-17 04:06:00',      4        ],    
        ['2017-08-17 04:07:00',      10       ],
        ['2017-08-17 04:08:00',      11       ],     
        ['2017-08-17 04:09:00',      None     ],    
        ['2017-08-17 04:10:00',      11       ],   
        ['2017-08-17 04:10:00',      11       ],
        ['2017-08-17 04:11:00',      None     ],     
        ['2017-08-17 04:12:00',      12       ],   
        ['2017-08-17 04:13:00',      11       ]]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['date', 'price']) 
Desired Output:
    
data = [ date                        price    entry
        ['2017-08-17 04:00:00',      1              ],
        ['2017-08-17 04:01:00',      2              ],     
        ['2017-08-17 04:02:00',      None           ],    
        ['2017-08-17 04:03:00',      None           ],   
        ['2017-08-17 04:04:00',      None           ],
        ['2017-08-17 04:05:00',      3        3     ],     
        ['2017-08-17 04:06:00',      4              ],    
        ['2017-08-17 04:07:00',      10             ],
        ['2017-08-17 04:08:00',      11             ],     
        ['2017-08-17 04:09:00',      None           ],    
        ['2017-08-17 04:10:00',      11       11    ],   
        ['2017-08-17 04:10:00',      11             ],
        ['2017-08-17 04:11:00',      None           ],     
        ['2017-08-17 04:12:00',      12       12    ],   
        ['2017-08-17 04:13:00',      11             ]]

I am trying to make it so that a 3rd column named "entry" takes the value of column "price" if the price on the previous row is None and the current row's price is not none. I have tried the code below but it does not work. All it does is make the whole "entry" column None.

condition1 = (df['price'].shift(1) is None) & (df['price'] is not None)
df['entry'] = np.where(condition1, df['price'] , None)

CodePudding user response:

Your column seems to contain strings, maybe this should work:

df['entry'] = df.loc[df['price'].ne('') & df['price'].shift(fill_value='').eq(''), 
                     'price'].reindex(df.index).fillna('')
print(df)

# Output
                   date price entry
0   2017-08-17 04:00:00     1      
1   2017-08-17 04:01:00     2      
2   2017-08-17 04:02:00            
3   2017-08-17 04:03:00            
4   2017-08-17 04:04:00            
5   2017-08-17 04:05:00     3     3
6   2017-08-17 04:06:00     4      
7   2017-08-17 04:07:00    10      
8   2017-08-17 04:08:00    11      
9   2017-08-17 04:09:00            
10  2017-08-17 04:10:00    11    11
11  2017-08-17 04:10:00    11      
12  2017-08-17 04:11:00            
13  2017-08-17 04:12:00    12    12
14  2017-08-17 04:13:00    11      

CodePudding user response:

First convert values to numeric and non numeric to NaNs, because pandas processing None like Nonetype and NaN like np.nan same way.

I suggest dont use empty strings, because then get mixed strings and numbers and next processing with numbers should failed.

df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['testNA'] = df['price'].isna()
print (df)
                   date  price  testNA
0   2017-08-17 04:00:00    1.0   False
1   2017-08-17 04:01:00    2.0   False
2   2017-08-17 04:02:00    NaN    True
3   2017-08-17 04:03:00    NaN    True
4   2017-08-17 04:04:00    NaN    True
5   2017-08-17 04:05:00    3.0   False
6   2017-08-17 04:06:00    4.0   False
7   2017-08-17 04:07:00   10.0   False
8   2017-08-17 04:08:00   11.0   False
9   2017-08-17 04:09:00    NaN    True
10  2017-08-17 04:10:00   11.0   False
11  2017-08-17 04:10:00   11.0   False
12  2017-08-17 04:11:00    NaN    True
13  2017-08-17 04:12:00   12.0   False
14  2017-08-17 04:13:00   11.0   False

Add fill_value=0 to Series.shift and compare by Series.isna and Series.notna:

condition1 = df['price'].shift(1, fill_value=0).isna() & df['price'].notna()
df['entry'] = np.where(condition1, df['price'] , None)

print (df)
                   date  price  testNA entry
0   2017-08-17 04:00:00    1.0   False  None
1   2017-08-17 04:01:00    2.0   False  None
2   2017-08-17 04:02:00    NaN    True  None
3   2017-08-17 04:03:00    NaN    True  None
4   2017-08-17 04:04:00    NaN    True  None
5   2017-08-17 04:05:00    3.0   False   3.0
6   2017-08-17 04:06:00    4.0   False  None
7   2017-08-17 04:07:00   10.0   False  None
8   2017-08-17 04:08:00   11.0   False  None
9   2017-08-17 04:09:00    NaN    True  None
10  2017-08-17 04:10:00   11.0   False  11.0
11  2017-08-17 04:10:00   11.0   False  None
12  2017-08-17 04:11:00    NaN    True  None
13  2017-08-17 04:12:00   12.0   False  12.0
14  2017-08-17 04:13:00   11.0   False  None
    
  •  Tags:  
  • Related