I have two data frames, and I would like to build the following new data frame from them.
Desired output:
dataFrameName  no.rows  no.cols
df1Name            100       34
df2Name            212       16
I have tried the following, but I get an error:
def dfFun(*alldfs):
    df = pd.DataFrame(columns=[['dataFrameName ','no.rows','no.cols']], index=[0])
    for i in alldfs:
        df['dataFrameName'] = i
        df['no.rows'] = i.shape[0] # getting error here
        df['no.cols'] = i.shape[1]
Calling the function:
dfFun('df1Name','df2Name')
Error:
AttributeError: 'str' object has no attribute 'shape'
I understand the error, but I have not been able to get the desired output.
Answer:
The error comes from the way you call your function:
dfFun('df1Name','df2Name')
Because of the quotation marks, you are passing two strings instead of two DataFrame variables. So when the loop reaches
df['no.rows'] = i.shape[0]
you get the error
AttributeError: 'str' object has no attribute 'shape'
because you are asking a string, not a DataFrame, for its shape.
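To see the difference concretely, here is a minimal sketch with a small made-up DataFrame: the DataFrame itself has a shape, while the quoted name is only a string and raises the same AttributeError.

```python
import pandas as pd

# A small made-up DataFrame (3 rows, 1 column) for illustration
df1 = pd.DataFrame({'a': [1, 2, 3]})
print(df1.shape)  # (3, 1): a DataFrame knows its dimensions

try:
    'df1'.shape  # the quoted name is just a str, which has no .shape attribute
except AttributeError as err:
    message = str(err)
    print(message)  # 'str' object has no attribute 'shape'
```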
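I understand you also want to store each DataFrame's variable name. One way to do this is the following tweak, based on [this answer by jfs][1], which looks the object up by identity in a namespace such as globals().
However, if you have many variables, this could add unwanted overhead, because every lookup scans the whole namespace. It also only finds names in the namespace you pass in (here, the module's globals), so a cleaner approach may be to keep track of the DataFrames' names yourself, for example by passing them in explicitly.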
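As a sketch of one such alternative (a suggestion of mine, not part of the linked answer), you can pass a dictionary that maps each name to its DataFrame, so no namespace search is needed. The function name df_summary is hypothetical, and the placeholder frames only mimic the shapes from the question.

```python
import pandas as pd

def df_summary(named_dfs):
    """Build one summary row per DataFrame from a {name: DataFrame} mapping (hypothetical helper)."""
    rows = [
        {'dataFrameName': name, 'no.rows': df.shape[0], 'no.cols': df.shape[1]}
        for name, df in named_dfs.items()
    ]
    return pd.DataFrame(rows, columns=['dataFrameName', 'no.rows', 'no.cols'])

# Placeholder data with the shapes from the question (every value is 0)
df1 = pd.DataFrame(0, index=range(100), columns=range(34))
df2 = pd.DataFrame(0, index=range(212), columns=range(16))

summary = df_summary({'df1Name': df1, 'df2Name': df2})
print(summary)
```

Because the names are supplied by the caller, they can be anything you like and do not have to match variable names.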
import pandas as pd

# get the variable name(s) bound to obj in the given namespace
def namestr(obj, namespace):
    return [name for name in namespace if namespace[name] is obj]

def dfFun(*alldfs):
    rows = []
    for i in alldfs:
        # one summary row per DataFrame; the name is looked up in the module's globals
        rows.append({'dataFrameName': namestr(i, globals())[0],
                     'no.rows': i.shape[0],
                     'no.cols': i.shape[1]})
    return pd.DataFrame(rows, columns=['dataFrameName', 'no.rows', 'no.cols'])
Call the function with the DataFrame variables themselves, without quotation marks:
dfFun(df1, df2)
where df1 and df2 are ordinary pandas DataFrames.
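Putting it together, a self-contained sketch of the corrected namestr approach might look like this. It assumes the DataFrames are global variables in the same module (or notebook) where dfFun is defined, since globals() inside a function refers to that module's namespace. The placeholder frames are named df1Name and df2Name so that the result matches the desired table.

```python
import pandas as pd

def namestr(obj, namespace):
    # every name in the namespace bound to this exact object (identity, not equality)
    return [name for name in namespace if namespace[name] is obj]

def dfFun(*alldfs):
    rows = []
    for i in alldfs:
        # take the first global name that refers to this DataFrame
        rows.append({'dataFrameName': namestr(i, globals())[0],
                     'no.rows': i.shape[0],
                     'no.cols': i.shape[1]})
    return pd.DataFrame(rows, columns=['dataFrameName', 'no.rows', 'no.cols'])

# Placeholder frames with the shapes from the question (every value is 0)
df1Name = pd.DataFrame(0, index=range(100), columns=range(34))
df2Name = pd.DataFrame(0, index=range(212), columns=range(16))

result = dfFun(df1Name, df2Name)
print(result)
```

Note that namestr returns every global name bound to the same object, so if a DataFrame is also stored under another name (for example alias = df1Name), [0] picks whichever of those names appears first in the globals dictionary.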
[1]: https://stackoverflow.com/a/592891/14517058
