I have a dataframe column that holds the year a particular property was built. The value can be a year, "unknown" or "newly built". I want to replace all the "newly built" values with the current year, keep the year values as they are, and replace all the "unknown" values with the average of the column. Here is my code:
y = df['year'][pd.to_numeric(df['year'], errors='coerce').notnull()].astype(float).mean()
df['year'] = df.year.apply(lambda x: 2022 if fnmatch(x,'*ewly') else x)
df['year'] = df.year.apply(lambda x: y if fnmatch(x,'*nknown*') else x)
I use fnmatch to search for a pattern because the spelling varies. If I run the lambda function once, I get the proper output, but running the whole code gives the following error:
TypeError: expected str, bytes or os.PathLike object, not int
Not too sure what's the deal here. Any ideas?
CodePudding user response:
You should be able to do that with the following.
df["year"] = df["year"].replace("newly built", "2022").str.extract(r'(\d+)', expand=False).fillna(-1).astype(int)
df["year"] = df["year"].replace(-1, df[df["year"]>0]["year"].mean())
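As for the error itself: after the first apply, the column holds the int 2022 next to the remaining strings, so the second apply passes an int to fnmatch, which only accepts strings. If the spelling of "newly built" and "unknown" varies, a case-insensitive match on a string copy of the column may be more robust than an exact replace. Below is a minimal sketch of that idea, not a definitive implementation. The sample data and the names as_text, is_new, years, mean_year and current_year are made up for illustration, and it assumes the average should be taken over the years that were actually given, before the "newly built" rows are filled in, as in the question's code.

```python
import pandas as pd

# Hypothetical sample data; the spellings are invented for illustration.
df = pd.DataFrame(
    {"year": ["1990", 2005, "Newly built", "unknown", "newly Built", "Unknown "]}
)

# Fixed to 2022 as in the question; datetime.date.today().year would also work.
current_year = 2022

# Match on a string copy so the pattern test never sees an int.
as_text = df["year"].astype(str)
is_new = as_text.str.contains("ewly", case=False)

# Anything that is not a number ("unknown", "newly built", ...) becomes NaN.
years = pd.to_numeric(df["year"], errors="coerce")

# Average of the years that were actually given (assumption: computed
# before the "newly built" rows are replaced).
mean_year = years.mean()

# Fill "newly built" rows with the current year, then the rest with the mean.
years = years.mask(is_new, current_year)
df["year"] = years.fillna(mean_year)

print(df)
```

Note that this treats every remaining non-numeric value as "unknown". If the column can contain other text, an explicit as_text.str.contains("nknown", case=False) mask could be used for the second step instead of fillna.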
