CSV dataframe doesn't match with dataframe generated from URL

Time:02-03

I download a file from the NREL API. When I compare it with an older CSV using the .equals method in pandas, it reports a difference, even though both files are 100% identical. The only difference is that one DataFrame is generated from the CSV and the other directly from the API URL.

Below is my code. Why is there a difference?

import pandas as pd
NERL_url = "https://developer.nrel.gov/api/alt-fuel-stations/v1.csv?api_key=DEMO_KEY&fuel_type=ELEC&country=all" 
outputPath = r"D:\<myPCPath>\nerl.csv"
# Read the CSV straight from the API URL
urlDF = pd.read_csv(NERL_url, low_memory=False)
# Save it to disk, then read the saved file back
urlDF.to_csv(outputPath, header=True, index=False, encoding='utf-8-sig')

csv_df = pd.read_csv(outputPath, low_memory=False)


if csv_df.equals(urlDF):
    print("Same")
else:
    print("Different")

My output is "Different". How do I fix this, and why does the difference occur?
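Before changing anything, it can help to find out which columns actually differ. The sketch below is a hypothetical helper (report_differences is not a pandas function) that checks shape, labels, dtypes and values column by column, treating NaNs in the same position as equal the way DataFrame.equals does. The sample frames are made up for illustration.

```python
import numpy as np
import pandas as pd


def report_differences(left: pd.DataFrame, right: pd.DataFrame) -> list[str]:
    """Hypothetical helper: list the reasons two frames are not equal."""
    problems = []
    if left.shape != right.shape:
        return [f"shape {left.shape} != {right.shape}"]
    if not left.columns.equals(right.columns):
        return ["column labels differ"]
    if not left.index.equals(right.index):
        return ["index differs"]
    for col in left.columns:
        a, b = left[col], right[col]
        if a.dtype != b.dtype:
            problems.append(f"{col}: dtype {a.dtype} != {b.dtype}")
        elif not a.equals(b):
            # Count positions that differ, ignoring pairs where both are NaN
            mismatch = ~((a == b) | (a.isna() & b.isna()))
            problems.append(f"{col}: {int(mismatch.sum())} value(s) differ")
    return problems


# Made-up example: 0.1 + 0.2 is not exactly 0.3 as a float
left = pd.DataFrame({"lat": [0.1 + 0.2, 1.0], "name": ["a", np.nan]})
right = pd.DataFrame({"lat": [0.3, 1.0], "name": ["a", np.nan]})
result = report_differences(left, right)
print(result)  # → ['lat: 1 value(s) differ']
```

Running it on urlDF and csv_df would point at the float columns if precision is the cause.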

Answer:

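The problem is floating-point precision in read_csv. The text that to_csv writes for each float can be parsed back by the default converter into a slightly different float, so read the saved file with float_precision='round_trip'. The NaN values should also be replaced with the same value in both DataFrames before comparing, like this: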

import pandas as pd

NERL_url = "https://developer.nrel.gov/api/alt-fuel-stations/v1.csv?api_key=DEMO_KEY&fuel_type=ELEC&country=all"
outputPath = r"nerl.csv"

# Read from the API and write the result to disk
urlDF = pd.read_csv(NERL_url, low_memory=False)
urlDF.to_csv(outputPath, header=True, index=False, encoding='utf-8-sig')

# Read the saved file back with the round-trip float converter
csv_df = pd.read_csv(outputPath, low_memory=False, float_precision='round_trip')

# Replace NaN with the same placeholder in both frames before comparing
if csv_df.fillna('same').equals(urlDF.fillna('same')):
    print("Same")
else:
    print("Different")

Output:

Same
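The precision fix can be tried offline without calling the API. The sketch below uses a few hypothetical CSV rows in place of the NREL download, writes them out with to_csv and reads them back with and without float_precision="round_trip". Whether the default converter reports a mismatch depends on the values and your pandas version, so no result is claimed for that line. It also shows that DataFrame.equals is documented to treat NaNs in the same position as equal while element-wise == does not, so the fillna('same') step may matter less than the float converter; verify on your own pandas version.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the NREL download (not real API output)
source_csv = (
    "latitude,longitude,station\n"
    "34.04893,-118.248405,A\n"
    "37.3333333,-121.8888889,\n"
)

# Parse the "downloaded" text, as pd.read_csv(NERL_url) would
original = pd.read_csv(io.StringIO(source_csv))

# Write it back out; to_csv() with no path returns the CSV text as a string
written = original.to_csv(index=False)

# Read it back with the default converter and with the round-trip converter
default_back = pd.read_csv(io.StringIO(written))
round_trip_back = pd.read_csv(io.StringIO(written), float_precision="round_trip")

print("default converter:   ", default_back.equals(original))
print("round_trip converter:", round_trip_back.equals(original))

# NaN handling: equals() treats NaNs in the same position as equal,
# whereas element-wise == does not (NaN != NaN)
with_nan = pd.DataFrame({"v": [1.0, np.nan]})
same_by_equals = with_nan.equals(with_nan.copy())
same_by_eq = bool((with_nan == with_nan.copy()).all().all())
print(same_by_equals, same_by_eq)  # → True False
```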