I have a file that I download from NERL API. When I try to compare it with older csv I get a difference using .equals command in padas but both files are 100% same. the only difference is one data frame is generated from CSV and another is directly from API URL.
Below is my code, why is there a difference?
import pandas as pd
NERL_url = "https://developer.nrel.gov/api/alt-fuel-stations/v1.csv?api_key=DEMO_KEY&fuel_type=ELEC&country=all"
outputPath = r"D:\<myPCPath>\nerl.csv"
urlDF = pd.read_csv(NERL_url, low_memory=False)
urlDF.to_csv(outputPath, header=True,index=None, encoding='utf-8-sig')
csv_df = pd.read_csv(outputPath, low_memory=False)
if csv_df.equals(urlDF):
print("Same")
else:
print("Different")
My output is coming as Different. How do I fix this and why is this difference comming?
CodePudding user response:
Problem is precision in read_csv, set to float_precision='round_trip' and then compared NaNs values, need replaced them to same values, like same:
NERL_url = "https://developer.nrel.gov/api/alt-fuel-stations/v1.csv?api_key=DEMO_KEY&fuel_type=ELEC&country=all"
outputPath = r"nerl.csv"
urlDF = pd.read_csv(NERL_url, low_memory=False)
urlDF.to_csv(outputPath, header=True,index=None, encoding='utf-8-sig')
csv_df = pd.read_csv(outputPath, low_memory=False, float_precision='round_trip')
if csv_df.fillna('same').equals(urlDF.fillna('same')):
print("Same")
else:
print("Different")
Same
