After scraping data I have the following output:
['Revenue',
'365817',
'274515',
'260174',
'265595',
'229234',
'215639',
'Cost Of Goods Sold',
'212981',
'169559',
'161782',
'163756',
'141048',
'131376',
'Gross Profit',
'152836',
'104956',
'98392',
'101839',
'88186',
'84263',
'Research And Development Expenses',
'21914',
'18752',
'16217',
'14236',
'11581',
'10045',
'SG&A Expenses',
'21973',
'19916',
'18245',
'16705',
'15261',
'14194',
'Other Operating Income Or Expenses',
'-',
'-',
'-',
'-',
'-',
'-',
'Operating Expenses',
'256868',
'208227',
'196244',
'194697',
'167890',
'155615',
'Operating Income',
'108949',
'66288',
'63930',
'70898',
'61344',
'60024',
'Total Non-Operating Income/Expense',
'258',
'803',
'1807',
'2005',
'2745',
'1348',
'Pre-Tax Income',
'109207',
'67091',
'65737',
'72903',
'64089',
'61372',
'Income Taxes',
'14527',
'9680',
'10481',
'13372',
'15738',
'15685',
'Income After Taxes',
'94680',
'57411',
'55256',
'59531',
'48351',
'45687',
'Other Income',
'-',
'-',
'-',
'-',
'-',
'-',
'Income From Continuous Operations',
'94680',
'57411',
'55256',
'59531',
'48351',
'45687',
'Income From Discontinued Operations',
'-',
'-',
'-',
'-',
'-',
'-',
'Net Income',
'94680',
'57411',
'55256',
'59531',
'48351',
'45687',
'EBITDA',
'120233',
'77344',
'76477',
'81801',
'71501',
'70529',
'EBIT',
'108949',
'66288',
'63930',
'70898',
'61344',
'60024',
'Basic Shares Outstanding',
'16701',
'17352',
'18471',
'19822',
'20869',
'21883',
'Shares Outstanding',
'16865',
'17528',
'18596',
'20000',
'21007',
'22001',
'Basic EPS',
'567',
'331',
'299',
'300',
'232',
'209',
'EPS - Earnings Per Share',
'561',
'328',
'297',
'298',
'230',
'208']
When I try to create a DataFrame in pandas, I get only one column called "Revenue" with all the data below it. Is there any way to split those lines according to their titles? I would like this output:
0 Revenue Cost Of Goods Sold ...
1 365817 212981 ...
2 274515 169559 ...
3 260174 161782 ...
4 265595 163756 ...
I can't use a function that splits the list into a fixed number of elements, because the number of variables in the initial output changes.
CodePudding user response:
What if you build a dict from this list, with the title strings as keys and lists of the values that follow them as values? Every time you find a title, start a new key, then append the following items to it (my_list here is your scraped list):
last_key = None
my_dict = {}
for i in my_list:
    # Anything that is neither a number nor the '-' placeholder is a column title
    if not (i.isnumeric() or i == '-'):
        last_key = i
    elif last_key in my_dict:
        my_dict[last_key].append(i)
    else:
        my_dict[last_key] = [i]
print(my_dict)
Then just create the DataFrame (this works as long as every title is followed by the same number of values):
import pandas as pd
my_df = pd.DataFrame(my_dict)
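If the scraped list is less regular, the same idea can be made a little more defensive. The following is only a sketch, not part of the original answer: the helper name rows_to_frame and the shortened sample list are made up for illustration, and it assumes you want the '-' placeholders turned into NaN and the values converted to numbers. Wrapping each list in a pd.Series lets pandas pad shorter columns with NaN instead of raising an error.

```python
import pandas as pd


def rows_to_frame(items):
    """Group a flat list of titles and values into a numeric DataFrame.

    Hypothetical helper: a title is any string that is neither '-'
    nor an optionally signed run of digits.
    """
    columns = {}
    current = None
    for item in items:
        text = item.strip()
        is_value = text == "-" or text.lstrip("-").isdigit()
        if not is_value:
            # A new title starts a new column.
            current = text
            columns[current] = []
        elif current is not None:
            # Values seen before any title are skipped.
            columns[current].append(text)
    # pd.Series aligns on the index, so shorter columns are padded with NaN.
    frame = pd.DataFrame({key: pd.Series(vals) for key, vals in columns.items()})
    # errors="coerce" converts anything non-numeric (the '-' placeholders) to NaN.
    return frame.apply(pd.to_numeric, errors="coerce")


# Shortened, made-up sample in the same shape as the scraped list.
sample = ["Revenue", "365817", "274515", "Other Income", "-", "-", "EPS", "561"]
df = rows_to_frame(sample)
print(df)
```

With this, the columns hold numbers rather than strings, so you can do arithmetic on them directly. If you would rather keep the '-' strings as they are, drop the final apply step.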
