How to keep no return (zero rows) in a concatenation loop?-CodePudding

I am using a code for a query, sometimes the input goes and there is no return (basically it does not find anything so the return is an empty row) so it is empty. However, when I use pd.concat, those empty rows disappear. Is there a way to keep these no return rows in the loop as well so that when I use that I can have empty rows on the final output.csv?

import numpy as np
import pandas as pd

from dl import authClient as ac, queryClient as qc 
from dl.helpers.utils import convert

import openpyxl as xl

wb = xl.load_workbook('/Users/somethingfile.xlsx')
sheet = wb['Sheet 1']


df = pd.DataFrame([],columns = ['col1','col2',...,'coln'])

for row in range(3, sheet.max_row   1): 
    a0, b0, r = sheet.cell(row,1).value, sheet.cell(row,2).value, 0.001
    query = """
    SELECT a,b,c,d,e FROM smthng
      WHERE q3c_radial_query(a,b,{:f},{:f},{:f}) LIMIT 1
    """.format(a0,b0,r)
    response = qc.query(sql=query,format='csv')
    temp_df = convert(response,'pandas')
    df = pd.concat([df,temp_df])

df.to_csv('output.csv')

CodePudding user response：

So, as far as I understand, the problem is that when "temp_df" is empty you want to add a blank row. You should be able to do that using the .append() method, appending an empty Series().

if len(temp_df) == 0:
    temp_df=temp_df.append(pd.Series(), ignore_index=True)
#Then concat...

CodePudding user response：

For your specific question, it works if you check if temp_df is empty or not in each step and make it a DataFrame of NaNs if it is empty.

Another note on the implementation is that, concatenating in each iteration will be a very expensive operation. It is much faster if you store the temp_dfs in each iteration in a list and concatenate once after the loop is over.

lst = []  # <-- empty list to fill later
for row in range(3, sheet.max_row   1): 
    a0, b0, r = sheet.cell(row,1).value, sheet.cell(row,2).value, 0.001
    query = """
    SELECT a,b,c,d,e FROM smthng
      WHERE q3c_radial_query(a,b,{:f},{:f},{:f}) LIMIT 1
    """.format(a0,b0,r)
    response = qc.query(sql=query,format='csv')
    temp_df = convert(response,'pandas')

    if temp_df.empty:
        temp_df = pd.DataFrame([np.nan])
    lst.append(temp_df)
    
df = pd.concat(lst)   # <-- concat once
df.to_csv('output.csv', index=False)