Creating df from dict, but renaming the columns not working

Time:01-24

I want to populate a pandas DataFrame with the values from a dictionary and also change the column names, because by default the keys of the dict become the column names. I have the following code snippet:

import pandas as pd

mydict = {'Name': ['Anindita'], 'Year': [1993]}
df = pd.DataFrame(mydict, columns=['a', 'b'])
print(df)

Unfortunately, every time I run this, it creates an empty DataFrame, and the values in my dictionary are not carried over. I searched a lot and came up with a workaround like this:

# First create a list of tuples from the items of the dictionary,
# then use the from_records method to accomplish the same
df = pd.DataFrame.from_records(list(mydict.items()), columns=['a', 'b'])
print(df)

However, I have an instinct that this conversion from dict to list object isn't necessary and it could have been done with the dict object only.

All other ways to do it have failed.

I even went through the documentation for the pandas DataFrame constructor thoroughly but couldn't map my problem to what was written there.

Any help would be highly appreciated. Thanks!


Please note, I don't wish to only create `DataFrame` from the `dict` object, I have to change the column names of the `DataFrame` as well. 

CodePudding user response:

You are close. Use index instead of columns, then transpose your dataframe:

mydict = {'Name': ['Anindita', 'Marcel'], 'Year': [1993, 1970]}

df = pd.DataFrame(mydict.values(), index=['a', 'b']).T
print(df)

# Output
          a     b
0  Anindita  1993
1    Marcel  1970

If you pass a list of lists, each inner list becomes a row, so the dict's values are laid out along the index. Label the index instead of the columns, then transpose.

Step by step:

>>> mydict.values()
dict_values([['Anindita', 'Marcel'], [1993, 1970]])

>>> pd.DataFrame(mydict.values())
          0       1
0  Anindita  Marcel
1      1993    1970

>>> pd.DataFrame(mydict.values(), index=['a', 'b'])
          0       1
a  Anindita  Marcel
b      1993    1970

>>> pd.DataFrame(mydict.values(), index=['a', 'b']).T
          a     b
0  Anindita  1993
1    Marcel  1970

If you would rather change the orientation up front and skip the transpose, use zip:

df = pd.DataFrame(zip(*mydict.values()), columns=['a', 'b'])
print(df)

# Output
          a     b
0  Anindita  1993
1    Marcel  1970
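
To see why zip changes the orientation, here is a small sketch of what zip(*mydict.values()) hands to the constructor. The variable name rows is only there for illustration:

```python
import pandas as pd

mydict = {'Name': ['Anindita', 'Marcel'], 'Year': [1993, 1970]}

# zip(*...) pairs up the i-th element of every value list: one tuple per row
rows = list(zip(*mydict.values()))
print(rows)

# Each tuple is now a row, so `columns` labels the fields by position
df = pd.DataFrame(rows, columns=['a', 'b'])
print(df)
```

Note that this relies on the dict keeping its insertion order, so the new names must be listed in the same order as the keys.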

CodePudding user response:

After building the DataFrame from the dictionary, you can rename the columns by passing rename() another dictionary whose key-value pairs have the form original name: new name.

df = pd.DataFrame(data=mydict).rename(columns={"Name":'a',"Year":'b'})

Output:

          a     b
0  Anindita  1993

Let me know if this is what you were intending to happen.

Edit: If you wanted to achieve the same thing using a numpy array, it is possible like this:

import numpy as np

df = pd.DataFrame(np.array([['Anindita', 1993]]), columns=['a', 'b'])
print(df)

The difference is that the numpy array carries no column labels, so the DataFrame constructor assigns the provided names to the array's columns by position.
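
One caveat worth checking with the numpy version: an array holds a single dtype, so mixing a string and an integer in one array turns 1993 into the string '1993'. A plain list of rows keeps the integer. A small sketch to compare the two (df_np and df_list are illustrative names):

```python
import numpy as np
import pandas as pd

# numpy coerces the mixed row to a single string dtype
df_np = pd.DataFrame(np.array([['Anindita', 1993]]), columns=['a', 'b'])

# a list of rows lets pandas infer a dtype per column
df_list = pd.DataFrame([['Anindita', 1993]], columns=['a', 'b'])

print(type(df_np.loc[0, 'b']))
print(df_list.dtypes)
```

If the Year column needs to stay numeric, the list-of-rows form is probably the safer choice.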

From the comments, it is now clear to me that when the data carries column labels (as a dictionary does), the columns argument selects among those labels rather than overwriting them. Since neither 'a' nor 'b' is a key in the dictionary, nothing is selected and the DataFrame comes out empty. When the data has no column labels, the provided column labels are applied to the data.
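
A short sketch of that select-versus-label behaviour, using the question's dictionary. The variable names empty, selected and labeled are only for illustration:

```python
import pandas as pd

mydict = {'Name': ['Anindita'], 'Year': [1993]}

# With a dict, `columns` selects keys; 'a' and 'b' are not keys, so no data matches
empty = pd.DataFrame(mydict, columns=['a', 'b'])
print(empty.shape)

# Selecting keys that do exist works, and can also reorder them
selected = pd.DataFrame(mydict, columns=['Year', 'Name'])
print(selected)

# With unlabeled data (a list of rows), `columns` names the fields by position
labeled = pd.DataFrame([['Anindita', 1993]], columns=['a', 'b'])
print(labeled)
```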
