I have an Excel workbook with 35 worksheets, each worksheet has 100 rows and 30 columns. I am only wanting 9 rows (with all columns for those 9 rows) from each worksheet. I use to be able to get the rows using skiprows because there was a pattern. I used the following to read my required rows from each worksheet:
xl = pd.ExcelFile('group.xlsx')
locations = (list(xl.sheet_names))
df = pd.concat(pd.read_excel('group.xlsx', sheet_name= locations, skiprows = 62, nrows= 9, usecols= None), axis=1, ignore_index= True)
This does not work for me anymore as the information I need is no longer in the same rows for each worksheet (some worksheet it may be in row 62, or 73, or 147, or so on). Is there a way to skiprows to a specific string ("ABC 123") instead of a specific row? There is only one occurrence of the "ABC 123" string in each worksheet.
If there is no way to skiprows to a string, is there a way I could get the 9 rows after the "ABC 123" string for each worksheet by altering this:
dfa = pd.read_excel('group.xlsx', sheet_name=None)
data = {}
for sheet, df in dfa.items():
data[sheet] = df.loc[df['col_2'] == "ABC 123"]
df = pd.concat(data)
I feel like this is almost there, but I am unsure what it is that I need to add.
CodePudding user response:
I'm not sure what you're asking exactly cause you give us no further information about the data of the worksheet but this might help:
data = data[data['column'] == 'ABC']
CodePudding user response:
I achieved the results using another method to complete this. Concat all sheets into one, slice on defined indicies, and then concat.
df = pd.concat(pd.read_excel('group.xlsx', sheet_name=None), ignore_index=True)
indicies = df[df['col_2'] == 'ABC 123'].index #slices = [slice(i, i 9, -1) for i in indicies]
results= []
for i in indicies:
temp= (df.iloc[i 1:i 9])
results.append(temp)
df= pd.concat(results)
col_list= df.columns.values.tolist()
df.dropna(how='all', subset=col_list, inplace=True)
