I am using a code for a query, sometimes the input goes and there is no return (basically it does not find anything so the return is an empty row) so it is empty. However, when I use pd.concat, those empty rows disappear. Is there a way to keep these no return rows in the loop as well so that when I use that I can have empty rows on the final output.csv?
import numpy as np
import pandas as pd
from dl import authClient as ac, queryClient as qc
from dl.helpers.utils import convert
import openpyxl as xl
wb = xl.load_workbook('/Users/somethingfile.xlsx')
sheet = wb['Sheet 1']
df = pd.DataFrame([],columns = ['col1','col2',...,'coln'])
for row in range(3, sheet.max_row 1):
a0, b0, r = sheet.cell(row,1).value, sheet.cell(row,2).value, 0.001
query = """
SELECT a,b,c,d,e FROM smthng
WHERE q3c_radial_query(a,b,{:f},{:f},{:f}) LIMIT 1
""".format(a0,b0,r)
response = qc.query(sql=query,format='csv')
temp_df = convert(response,'pandas')
df = pd.concat([df,temp_df])
df.to_csv('output.csv')
CodePudding user response:
So, as far as I understand, the problem is that when "temp_df" is empty you want to add a blank row. You should be able to do that using the .append() method, appending an empty Series().
if len(temp_df) == 0:
temp_df=temp_df.append(pd.Series(), ignore_index=True)
#Then concat...
CodePudding user response:
For your specific question, it works if you check if temp_df is empty or not in each step and make it a DataFrame of NaNs if it is empty.
Another note on the implementation is that, concatenating in each iteration will be a very expensive operation. It is much faster if you store the temp_dfs in each iteration in a list and concatenate once after the loop is over.
lst = [] # <-- empty list to fill later
for row in range(3, sheet.max_row 1):
a0, b0, r = sheet.cell(row,1).value, sheet.cell(row,2).value, 0.001
query = """
SELECT a,b,c,d,e FROM smthng
WHERE q3c_radial_query(a,b,{:f},{:f},{:f}) LIMIT 1
""".format(a0,b0,r)
response = qc.query(sql=query,format='csv')
temp_df = convert(response,'pandas')
if temp_df.empty:
temp_df = pd.DataFrame([np.nan])
lst.append(temp_df)
df = pd.concat(lst) # <-- concat once
df.to_csv('output.csv', index=False)
