I have no idea why this isn't working. Why am I not able to get rid of these nan rows?
I have tried the following:
dfa = dfa[dfa['Date Sold_y'].str.len() < 4] #empty
dfa = dfa[dfa['Date Sold_y'] != ''] #no change
dfa = dfa[dfa['Date Sold_y'] != np.nan] #no change
Dtype is string, and sample values below:
['May-30-2018', nan, nan, 'June-11-2014', 'December-3-2021', nan, 'February-2-2022', nan, nan, 'December-30-2011', nan, nan, nan, nan, nan, nan, nan, nan, 'November-30-2021', nan, 'April-1-2020', nan, 'May-10-2007', nan, nan, nan, nan, nan, nan, 'January-28-2022', nan, nan, nan, 'January-18-2022', nan, nan, nan, 'January-12-2022', nan, 'November-15-2021']
CodePudding user response:
- Maybe the nan values are strings with extra whitespace:
>>> dfa[dfa['Date Sold_y'].str.strip() != 'nan']
Date Sold_y
0 May-30-2018
3 June-11-2014
4 December-3-2021
6 February-2-2022
9 December-30-2011
18 November-30-2021
20 April-1-2020
22 May-10-2007
29 January-28-2022
33 January-18-2022
37 January-12-2022
39 November-15-2021
- You can also reverse the logic and keep only the rows that end with a four-digit year (na=False treats real missing values as non-matches):
>>> dfa[dfa['Date Sold_y'].str.contains(r'\d{4}$', na=False)]
- Or if they really are nan values, as suggested by @HenryEcker:
>>> dfa[dfa['Date Sold_y'].notna()]
# OR
>>> dfa[~dfa['Date Sold_y'].isna()]
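To tell which of the cases above applies, or to handle both at once, it may help to test for real missing values and for literal 'nan' strings separately. This is only a minimal sketch: the helper name drop_missing_dates and the small sample frame are made up for illustration, and only the column name and date strings come from the question.

```python
import numpy as np
import pandas as pd


def drop_missing_dates(df, col):
    """Hypothetical helper: drop rows where `col` is a real NaN or a 'nan'-like string."""
    # Real missing values (np.nan, None, pd.NA) are caught by isna().
    is_missing = df[col].isna()
    # Literal 'nan' text, possibly padded with whitespace, needs a string comparison.
    # astype(str) makes the comparison safe on columns that mix strings and floats.
    is_nan_text = df[col].astype(str).str.strip().str.lower() == 'nan'
    return df[~(is_missing | is_nan_text)]


# Made-up sample mixing a real NaN with a padded 'nan' string
sample = pd.DataFrame({'Date Sold_y': ['May-30-2018', np.nan, ' nan ', 'June-11-2014']})
cleaned = drop_missing_dates(sample, 'Date Sold_y')
print(cleaned['Date Sold_y'].tolist())  # ['May-30-2018', 'June-11-2014']
```

This also shows why the `!= np.nan` attempt in the question changed nothing: NaN never compares equal to anything, including itself, so that comparison is True for every row.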
CodePudding user response:
By the way, if the values are actually nan (and not strings), check out the dropna() method of pandas.DataFrame. It lets you drop rows that contain one or more nan values (you can choose whether any or all of the values must be nan), or you can specify a subset of columns to check for nan values.
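A short sketch of those dropna() options, assuming the values are real NaN. The frame is made up, and 'Price' is a hypothetical second column added only to show the difference between the options:

```python
import numpy as np
import pandas as pd

# Made-up frame; 'Price' is a hypothetical second column
df = pd.DataFrame({
    'Date Sold_y': ['May-30-2018', np.nan, 'June-11-2014', np.nan],
    'Price': [100.0, 200.0, np.nan, np.nan],
})

# Only look at 'Date Sold_y' when deciding which rows to drop
by_date = df.dropna(subset=['Date Sold_y'])

# Drop a row if any column is NaN (this is the default)
any_nan = df.dropna(how='any')

# Drop a row only if every column is NaN
all_nan = df.dropna(how='all')

print(list(by_date.index))  # [0, 2]
print(list(any_nan.index))  # [0]
print(list(all_nan.index))  # [0, 1, 2]
```

For the question as asked, the subset form is probably the closest fit, since it leaves rows alone when other columns happen to be missing.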
