I imported and concatenated several CSV files. All of them contain the variable "prac_type", but the values are coded differently: some are strings (yes, no, unsure) while others are numeric (1, 2, 3). Here is a look at the variable:
print(df.prac_type.unique())
[nan 1.0 2.0 3.0 '1' '2' 'Unsure']
I want 1.0 to merge into 1 (since they represent the same thing), 2.0 to become 2, and both 3.0 and 'Unsure' to become 3. I want my variable to look like this:
print(df.prac_type.unique())
[ '1' '2' '3']
I tried doing this:
prac_dic = {'1.0': 1,'2.0': 2 , '3.0':3, 'Unsure':3}
df.prac_type = [prac_dic[item] for item in df.prac_type]
print(df.prac_type.unique())
But I get an error (KeyError: nan) because my variable prac_type contains NaNs, and I don't want to drop them. How can I get my code to ignore the missing values and still reassign the other values?
CodePudding user response:
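Add an explicit check for NaN so the dictionary lookup is skipped for missing values (this assumes import pandas and import numpy as np):
df.prac_type = [prac_dic[item] if pandas.notnull(item) else np.nan for item in df.prac_type]
Note that the keys of prac_dic must match the values actually present in the column (the floats 1.0, 2.0, 3.0 and the strings '1', '2', 'Unsure'); otherwise the lookup still raises a KeyError for those values.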
https://pandas.pydata.org/docs/reference/api/pandas.isnull.html
CodePudding user response:
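Try dict.get, which returns None instead of raising a KeyError when a key (including NaN) is not in the dictionary:
df.prac_type = [prac_dic.get(item) for item in df.prac_type]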
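As a possible alternative to the list comprehension, here is a minimal sketch using Series.map with a dictionary. It assumes the column holds exactly the values printed in the question; the sample DataFrame and the rewritten prac_dic keys are illustrative assumptions, not the asker's real data. With a plain dict, map leaves values that have no matching key (including NaN) as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the unique values shown in the question.
df = pd.DataFrame({"prac_type": [np.nan, 1.0, 2.0, 3.0, "1", "2", "Unsure"]})

# Keys must match the values actually stored in the column:
# floats from the numeric files, strings from the text files.
prac_dic = {
    1.0: "1", 2.0: "2", 3.0: "3",
    "1": "1", "2": "2", "Unsure": "3",
}

# Values without a matching key (here only NaN) stay NaN.
df["prac_type"] = df["prac_type"].map(prac_dic)

print(df["prac_type"].unique())
```

If other spellings occur in the real files (for example 'yes' or 'no'), they would need their own keys, since any unmapped value also becomes NaN.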
