Create a new dataframe with dataframe name, number of rows and columns from multiple dataframes usin

Time:01-18

I have 2 dataframes. I would like to build the new dataframe below from them.

Desired output 

dataFrameName  no.rows   no.cols
df1Name          100      34
df2Name          212      16

I have tried the code below, but I am getting an error.

def dfFun(*alldfs):
    df = pd.DataFrame(columns=[['dataFrameName ','no.rows','no.cols']], index=[0])
    for i in alldfs:
        df['dataFrameName'] = i
        df['no.rows'] = i.shape[0] # getting error here
        df['no.cols'] = i.shape[1]
        

Function calling

dfFun('df1Name','df2Name')

Error

AttributeError: 'str' object has no attribute 'shape'

I understand the error, but I have not been able to get the desired output.

CodePudding user response:

The error comes from the way you call your function.

dfFun('df1Name','df2Name')

The quotation marks mean you are passing two strings rather than two dataframe variables. Therefore, when the function reaches

df['no.rows'] = i.shape[0]

you get the error

AttributeError: 'str' object has no attribute 'shape'

because you are trying to get the shape of a string, not of a dataframe.
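To make the difference concrete, here is a minimal sketch. The dataframe `df1` and its contents are made up for illustration; the point is only that a string holding a name has no `shape` attribute, while the dataframe object does.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (3 rows, 2 columns)
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# A string is just text; it has no link to the dataframe of the same name
name_as_string = 'df1'

has_shape_str = hasattr(name_as_string, 'shape')  # str objects have no .shape
has_shape_df = hasattr(df1, 'shape')              # DataFrame objects do
df1_shape = df1.shape                             # (rows, columns) tuple

print(has_shape_str, has_shape_df, df1_shape)
```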

I understand you also want to store the variable name of each dataframe. You can do this with the tweak below, adapted from [this answer by jfs][1].

Note that the name lookup scans every variable in the namespace, so with many variables it could add unwanted overhead. There may be a nicer way to keep track of the dataframe names.

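import pandas as pd

# Get the variable name(s) bound to obj in the given namespace
def namestr(obj, namespace):
    return [name for name in namespace if namespace[name] is obj]

def dfFun(*alldfs):
    # Collect one row per dataframe: variable name, row count, column count
    rows = []
    for i in alldfs:
        rows.append({'dataFrameName': namestr(i, globals())[0],
                     'no.rows': i.shape[0],
                     'no.cols': i.shape[1]})
    # Build the summary once, and return it so the caller can use it
    return pd.DataFrame(rows, columns=['dataFrameName', 'no.rows', 'no.cols'])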

Call the function as follows:

dfFun(df1, df2)

where df1 and df2 are pandas dataframes.

[1]: https://stackoverflow.com/a/592891/14517058
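One possible "nicer way" to keep track of the names, offered only as a sketch, is to pass them explicitly as keyword arguments instead of searching `globals()`. The function name `summarize_frames` and the example frames are hypothetical and not part of the original answer.

```python
import pandas as pd

def summarize_frames(**named_dfs):
    """Return one row per dataframe: its given name, row count, column count."""
    rows = [
        {'dataFrameName': name, 'no.rows': frame.shape[0], 'no.cols': frame.shape[1]}
        for name, frame in named_dfs.items()
    ]
    return pd.DataFrame(rows, columns=['dataFrameName', 'no.rows', 'no.cols'])

# Hypothetical example frames, used only to exercise the function
df1 = pd.DataFrame({'a': range(5), 'b': range(5)})
df2 = pd.DataFrame({'x': range(3)})

# The keyword used for each argument becomes the name in the summary
summary = summarize_frames(df1Name=df1, df2Name=df2)
print(summary)
```

This avoids the namespace scan and the ambiguity that arises when two variables refer to the same dataframe, at the cost of typing each name once in the call.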
