I want to populate a pandas DataFrame with the values from a dictionary and also change the DataFrame's column names, because by default the keys of the dict become the column names. I have the following code snippet:
```python
import pandas as pd

mydict = {'Name': ['Anindita'], 'Year': [1993]}
df = pd.DataFrame(mydict, columns=['a', 'b'])
print(df)
```
Unfortunately, every time I run this, it creates a blank DataFrame, and the values from my dictionary don't appear in it. After a lot of searching, I came up with a solution like this:
```python
# First create a list of tuples from the dictionary's items,
# then build the DataFrame with the from_records method
df = pd.DataFrame.from_records(list(mydict.items()), columns=['a', 'b'])
print(df)
```
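A quick check of what that `from_records` call builds (a minimal sketch reusing the `mydict` above) shows that each `(key, value)` pair becomes its own row, so the keys end up in column `a` and the value lists in column `b`:

```python
import pandas as pd

mydict = {'Name': ['Anindita'], 'Year': [1993]}

# list(mydict.items()) -> [('Name', ['Anindita']), ('Year', [1993])]
records = list(mydict.items())
df = pd.DataFrame.from_records(records, columns=['a', 'b'])

# Each (key, value) tuple is one row: keys go to 'a', the value lists go to 'b'
print(df)
print(df['a'].tolist())  # ['Name', 'Year']
```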
However, I have a feeling that this conversion from a dict to a list isn't necessary and that it could be done with the dict object alone.
All the other approaches I tried have failed.
I even read the pandas `DataFrame` constructor documentation thoroughly but couldn't relate my problem to what was written.
Any help would be highly appreciated. Thanks!
Please note that I don't only want to create a `DataFrame` from the `dict` object; I also need to change the column names of the `DataFrame`.
Answer:
You are close. Use `index` instead of `columns`, then transpose your DataFrame:
```python
import pandas as pd

mydict = {'Name': ['Anindita', 'Marcel'], 'Year': [1993, 1970]}
df = pd.DataFrame(mydict.values(), index=['a', 'b']).T
print(df)

# Output
          a     b
0  Anindita  1993
1    Marcel  1970
```
If you pass a list of lists, each inner list becomes a row (the orientation is "index"), so label the index instead of the columns and then transpose.
Step by step:
```python
>>> mydict.values()
dict_values([['Anindita', 'Marcel'], [1993, 1970]])
>>> pd.DataFrame(mydict.values())
          0       1
0  Anindita  Marcel
1      1993    1970
>>> pd.DataFrame(mydict.values(), index=['a', 'b'])
          0       1
a  Anindita  Marcel
b      1993    1970
>>> pd.DataFrame(mydict.values(), index=['a', 'b']).T
          a     b
0  Anindita  1993
1    Marcel  1970
```
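One side effect of the transpose is worth checking (a minimal sketch with the same two-row `mydict`): before `.T`, every column mixes a string and an integer, so after transposing both columns have `object` dtype. `DataFrame.infer_objects()` can then soft-convert `b` back to an integer column.

```python
import pandas as pd

mydict = {'Name': ['Anindita', 'Marcel'], 'Year': [1993, 1970]}

# Rows labelled 'a' and 'b', then transposed so they become columns
df = pd.DataFrame(mydict.values(), index=['a', 'b']).T
print(df.dtypes)  # both columns are object after the transpose

# Soft-convert object columns to more specific dtypes where possible
df = df.infer_objects()
print(df.dtypes)  # 'b' is now an integer column
```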
If you want to build the rows in the right orientation directly, use `zip`:
```python
df = pd.DataFrame(zip(*mydict.values()), columns=['a', 'b'])
print(df)

# Output
          a     b
0  Anindita  1993
1    Marcel  1970
```
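To see why `zip` works here (a minimal sketch): `zip(*mydict.values())` pairs up the i-th element of every value list, giving one tuple per row. The data then carries no column labels, so `columns` is applied by position, and each column keeps its own dtype.

```python
import pandas as pd

mydict = {'Name': ['Anindita', 'Marcel'], 'Year': [1993, 1970]}

# Unpack the value lists and pair their elements position by position
rows = list(zip(*mydict.values()))
print(rows)  # [('Anindita', 1993), ('Marcel', 1970)]

df = pd.DataFrame(rows, columns=['a', 'b'])
print(df.dtypes)  # 'a' holds strings, 'b' is inferred as an integer column
```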
Answer:
After passing in the dictionary, you can rename the columns with `rename`, passing another dictionary that maps each original name to its new name (`{original_name: new_name}`):
```python
import pandas as pd
df = pd.DataFrame(data=mydict).rename(columns={'Name': 'a', 'Year': 'b'})
```
Output:
```
          a     b
0  Anindita  1993
```
Let me know if this is what you intended.
Edit: If you want to achieve the same thing with a NumPy array, you can do this:
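If you would rather work with the `dict` object alone, as the question hoped, one option (a sketch, not part of the answer above) is to rebuild the dictionary with the new keys before constructing the frame. `new_names` is a hypothetical list of the labels you want; it is matched to the values by the dict's insertion order, which Python guarantees from 3.7 on.

```python
import pandas as pd

mydict = {'Name': ['Anindita'], 'Year': [1993]}
new_names = ['a', 'b']  # hypothetical target labels, in key order

# Pair each new label with the original value list (dicts keep insertion order)
renamed = dict(zip(new_names, mydict.values()))
df = pd.DataFrame(renamed)
print(df)
```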
```python
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['Anindita', 1993]]), columns=['a', 'b'])
print(df)
```
The difference is that a NumPy array carries no column labels, so pandas assigns the supplied `columns` to the array's columns by position.
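One caveat with the NumPy route, shown in a minimal sketch: a NumPy array holds a single dtype, so mixing `'Anindita'` and `1993` makes NumPy store everything as strings, and column `b` ends up holding the text `'1993'` rather than an integer until you convert it.

```python
import numpy as np
import pandas as pd

arr = np.array([['Anindita', 1993]])
print(arr.dtype.kind)  # 'U': a Unicode string array, so 1993 became '1993'

df = pd.DataFrame(arr, columns=['a', 'b'])
print(df.loc[0, 'b'] == '1993')  # True: the year is stored as text

# Convert the column explicitly if a numeric dtype is needed
df['b'] = df['b'].astype(int)
print(df.dtypes)
```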
From the comments, it is now clear to me that when the data already carries column labels (e.g. the keys of a dictionary), `columns` selects from those labels instead of renaming them. Since neither `'a'` nor `'b'` is a key, nothing is selected, which is what leads to the empty rows; nothing is overwritten. When the data has no column labels, the provided column labels are simply taken as the labels for the data.
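The selection behaviour can be checked directly (a minimal sketch with the question's `mydict`): passing an existing key as `columns` keeps that column, while labels that match no key select nothing, leaving a frame with the requested column names but no rows.

```python
import pandas as pd

mydict = {'Name': ['Anindita'], 'Year': [1993]}

# 'columns' picks keys out of the dict instead of renaming them
only_year = pd.DataFrame(mydict, columns=['Year'])
print(only_year)  # a single 'Year' column

# Neither 'a' nor 'b' is a key, so no data is selected
empty = pd.DataFrame(mydict, columns=['a', 'b'])
print(empty.empty)  # True
```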
