I have a data frame in which one column holds dictionary-like key-value pairs. However, each dictionary is stored as a string.
import pandas as pd
df1 = pd.DataFrame({'Name':['John', 'Jose'],
'Column2':['{"key": "John_value1", "key2": "John_value2"}','{"key": "Jose_value1", "key2": "Jose_value2"}'],
})
Since each value is a string, I can't access a particular element (e.g. "Jose_value2") with the code below:
df1["Column2"][1]["key2"]
TypeError: string indices must be integers
Can someone help me to do this?
CodePudding user response:
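The problem is the single quotes wrapped around each dictionary: they turn the value into a string, so you can't index into it by key. If you control how the data frame is built, drop the quotes and store real dictionaries: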
df1 = pd.DataFrame({'Name':['John', 'Jose'],
'Column2':[{"key": "John_value1", "key2": "John_value2"},{"key": "Jose_value1", "key2": "Jose_value2"}],
})
print(df1['Column2'][1]['key2'])
Outputs:
Jose_value2
CodePudding user response:
I just managed to do it with ast.literal_eval, which parses each string into a real dictionary:
import ast
df1["Column3"] = [ast.literal_eval(i) for i in df1["Column2"]]
df1["Column3"][1]["key2"]
'Jose_value2'
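For what it's worth, the same conversion can also be written with Series.apply, and one key can then be pulled out of every row at once. This is only a minimal sketch: it assumes every string in Column2 is a valid Python literal, and the column name key2_value is made up for illustration.

```python
import ast

import pandas as pd

df1 = pd.DataFrame({
    "Name": ["John", "Jose"],
    "Column2": [
        '{"key": "John_value1", "key2": "John_value2"}',
        '{"key": "Jose_value1", "key2": "Jose_value2"}',
    ],
})

# Parse each string into a real dict; literal_eval only accepts Python literals.
df1["Column3"] = df1["Column2"].apply(ast.literal_eval)

# Pull one key out of every row ("key2_value" is a hypothetical column name).
df1["key2_value"] = df1["Column3"].apply(lambda d: d["key2"])

print(df1["key2_value"].tolist())
```

This should print ['John_value2', 'Jose_value2']. If a row might be missing the key, d.get("key2") would return None instead of raising a KeyError.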
CodePudding user response:
Since the strings are valid JSON, you can parse them with json.loads. Add:
import json
key2_value = json.loads(df1["Column2"][1]).get("key2")
This converts the string into a dictionary and then gets the value you want.
You can read about json.loads() in detail in the Python documentation, although I prefer this link: https://www.geeksforgeeks.org/python-convert-string-dictionary-to-dictionary/
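If you need the values for every row rather than a single cell, the json.loads idea extends to the whole column. The sketch below is one possible way to do it, assuming every string in Column2 is valid JSON (double-quoted keys and values, as in the question); the names parsed, expanded and result are only illustrative.

```python
import json

import pandas as pd

df1 = pd.DataFrame({
    "Name": ["John", "Jose"],
    "Column2": [
        '{"key": "John_value1", "key2": "John_value2"}',
        '{"key": "Jose_value1", "key2": "Jose_value2"}',
    ],
})

# Parse every JSON string in the column into a dict.
parsed = df1["Column2"].apply(json.loads)

# Turn the dicts into their own columns, keeping the original row index.
expanded = pd.DataFrame(parsed.tolist(), index=df1.index)

# Attach the new "key" and "key2" columns next to the original data.
result = df1.join(expanded)

print(result.loc[1, "key2"])
```

This should print Jose_value2. Note that json.loads only accepts valid JSON, so strings that use single quotes around keys would need ast.literal_eval instead.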
