Current Input:
import pandas as pd
import numpy as np
# initialize list of lists
data = [
['2017-08-17 04:00:00', 1 ],
['2017-08-17 04:01:00', 2 ],
['2017-08-17 04:02:00', None ],
['2017-08-17 04:03:00', None ],
['2017-08-17 04:04:00', None ],
['2017-08-17 04:05:00', 3 ],
['2017-08-17 04:06:00', 4 ],
['2017-08-17 04:07:00', 10 ],
['2017-08-17 04:08:00', 11 ],
['2017-08-17 04:09:00', None ],
['2017-08-17 04:10:00', 11 ],
['2017-08-17 04:10:00', 11 ],
['2017-08-17 04:11:00', None ],
['2017-08-17 04:12:00', 12 ],
['2017-08-17 04:13:00', 11 ]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['date', 'price'])
Desired Output:
data = [ date price entry
['2017-08-17 04:00:00', 1 ],
['2017-08-17 04:01:00', 2 ],
['2017-08-17 04:02:00', None ],
['2017-08-17 04:03:00', None ],
['2017-08-17 04:04:00', None ],
['2017-08-17 04:05:00', 3 3 ],
['2017-08-17 04:06:00', 4 ],
['2017-08-17 04:07:00', 10 ],
['2017-08-17 04:08:00', 11 ],
['2017-08-17 04:09:00', None ],
['2017-08-17 04:10:00', 11 11 ],
['2017-08-17 04:10:00', 11 ],
['2017-08-17 04:11:00', None ],
['2017-08-17 04:12:00', 12 12 ],
['2017-08-17 04:13:00', 11 ]]
I am trying to make it so that a 3rd column named "entry" takes the value of column "price" if the price on the previous row is None and the current row's price is not none. I have tried the code below but it does not work. All it does is make the whole "entry" column None.
condition1 = (df['price'].shift(1) is None) & (df['price'] is not None)
df['entry'] = np.where(condition1, df['price'] , None)
CodePudding user response:
Your column seems to contain strings, maybe this should work:
df['entry'] = df.loc[df['price'].ne('') & df['price'].shift(fill_value='').eq(''),
'price'].reindex(df.index).fillna('')
print(df)
# Output
date price entry
0 2017-08-17 04:00:00 1
1 2017-08-17 04:01:00 2
2 2017-08-17 04:02:00
3 2017-08-17 04:03:00
4 2017-08-17 04:04:00
5 2017-08-17 04:05:00 3 3
6 2017-08-17 04:06:00 4
7 2017-08-17 04:07:00 10
8 2017-08-17 04:08:00 11
9 2017-08-17 04:09:00
10 2017-08-17 04:10:00 11 11
11 2017-08-17 04:10:00 11
12 2017-08-17 04:11:00
13 2017-08-17 04:12:00 12 12
14 2017-08-17 04:13:00 11
CodePudding user response:
First convert values to numeric and non numeric to NaNs, because pandas processing None like Nonetype and NaN like np.nan same way.
I suggest dont use empty strings, because then get mixed strings and numbers and next processing with numbers should failed.
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['testNA'] = df['price'].isna()
print (df)
date price testNA
0 2017-08-17 04:00:00 1.0 False
1 2017-08-17 04:01:00 2.0 False
2 2017-08-17 04:02:00 NaN True
3 2017-08-17 04:03:00 NaN True
4 2017-08-17 04:04:00 NaN True
5 2017-08-17 04:05:00 3.0 False
6 2017-08-17 04:06:00 4.0 False
7 2017-08-17 04:07:00 10.0 False
8 2017-08-17 04:08:00 11.0 False
9 2017-08-17 04:09:00 NaN True
10 2017-08-17 04:10:00 11.0 False
11 2017-08-17 04:10:00 11.0 False
12 2017-08-17 04:11:00 NaN True
13 2017-08-17 04:12:00 12.0 False
14 2017-08-17 04:13:00 11.0 False
Add fill_value=0 to Series.shift and compare by Series.isna and
Series.notna:
condition1 = df['price'].shift(1, fill_value=0).isna() & df['price'].notna()
df['entry'] = np.where(condition1, df['price'] , None)
print (df)
date price testNA entry
0 2017-08-17 04:00:00 1.0 False None
1 2017-08-17 04:01:00 2.0 False None
2 2017-08-17 04:02:00 NaN True None
3 2017-08-17 04:03:00 NaN True None
4 2017-08-17 04:04:00 NaN True None
5 2017-08-17 04:05:00 3.0 False 3.0
6 2017-08-17 04:06:00 4.0 False None
7 2017-08-17 04:07:00 10.0 False None
8 2017-08-17 04:08:00 11.0 False None
9 2017-08-17 04:09:00 NaN True None
10 2017-08-17 04:10:00 11.0 False 11.0
11 2017-08-17 04:10:00 11.0 False None
12 2017-08-17 04:11:00 NaN True None
13 2017-08-17 04:12:00 12.0 False 12.0
14 2017-08-17 04:13:00 11.0 False None
