I have a JSON text file from Qualtrics that looks like this (for example, this is one element I pulled from the file):
{
    "SurveyID": "SV_8v79iA9BlgTnAnH",
    "Element": "SQ",
    "PrimaryAttribute": "QID7",
    "SecondaryAttribute": "Do you use similar websites or resources to accomplish the objectives you have in using Open Data...",
    "TertiaryAttribute": null,
    "Payload": {
        "QuestionText": "Do you use similar websites or resources to accomplish the objectives you have in using Open Data Flint?",
        "DefaultChoices": false,
        "DataExportTag": "Similar",
        "QuestionID": "QID7",
        "QuestionType": "MC",
        "Selector": "SAVR",
        "SubSelector": "TX",
        "DataVisibility": {
            "Private": false,
            "Hidden": false
        },
        "Configuration": {
            "QuestionDescriptionOption": "UseText"
        },
        "QuestionDescription": "Do you use similar websites or resources to accomplish the objectives you have in using Open Data...",
        "Choices": {
            "1": {
                "Display": "Yes"
            },
            "2": {
                "Display": "No"
            }
        },
        "ChoiceOrder": [
            1,
            2
        ],
        "Validation": {
            "Settings": {
                "ForceResponse": "ON",
                "ForceResponseType": "ON",
                "Type": "None"
            }
        },
        "GradingData": [],
        "Language": [],
        "NextChoiceId": 3,
        "NextAnswerId": 1
    }
},
I want to extract only the values of the QuestionText and QuestionID fields and produce output that looks exactly like this:
*
name = QID7
text =
Do you use similar websites or resources to accomplish the objectives you have in using Open Data Flint?
*
Here is my code so far, but I'm getting the error "TypeError: list indices must be integers or slices, not str":
import json
with open('flint.json', 'r') as myfile:
    data = myfile.read()
# parse file
obj = json.loads(data)
# this is the line that raises the error
print("name = " + str(obj["SurveyElements"]["Payload"]["QuestionID"]), "text = " + str(obj["SurveyElements"]["Payload"]["QuestionText"]))
How can I create a Python script that will extract the information I want and output the results in the format I need so that the asterisks, 'name =', 'text =', line breaks, and clean text replicate the above output? Will I need to use regex to get what I need? Or apply multiple conditions per line until the conditions are satisfied?
CodePudding user response:
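It definitely helped that you eventually added the structure of the JSON file.
After looking at your approach, I would just note that you need to account for the survey elements being in a list of dictionaries: obj["SurveyElements"] is a list, so indexing it with the string "Payload" raises the error you are seeing.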
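To make the cause of the error concrete, here is a minimal sketch. The obj dictionary is a made-up stand-in for the parsed file (an assumption shaped like the excerpt in the question), not your real data:

```python
# Hypothetical stand-in for the parsed file: "SurveyElements" is a list of dicts.
obj = {
    "SurveyElements": [
        {"Element": "SQ",
         "Payload": {"QuestionID": "QID7", "QuestionText": "Example question?"}},
    ]
}

elements = obj["SurveyElements"]

# Indexing the list with a string key reproduces the error from the question.
try:
    elements["Payload"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Selecting an element by position first yields a dict, which accepts string keys.
first_payload = elements[0]["Payload"]
print(first_payload["QuestionID"])
```

In the real file you would loop over the list rather than pick index 0, since there is one dictionary per survey element.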
Below is an example that gets what you want. I assumed you are only looking at survey questions, which is why I included the check if element["Element"] == "SQ" (only these elements include QuestionID and QuestionText).
import json

# open json file
with open('flint.json', 'r') as myfile:
    data = myfile.read()

# load json
obj = json.loads(data)

# create a list of dictionaries,
# that contains only the survey elements
survey_elements_list = obj["SurveyElements"]

# iterate through the list
# and only look at survey questions
# checking if element["Element"] == "SQ"
for element in survey_elements_list:
    if element["Element"] == "SQ":
        question_id = element["Payload"]["QuestionID"]
        question_text = element["Payload"]["QuestionText"]
        print("*")
        print(f"name = {question_id}")
        print(f"text =\n{question_text}")
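If you also need the closing asterisk from your sample output, one option is to build the text in a function and print it once. This is only a sketch: format_questions is a name I made up, the inline sample stands in for obj["SurveyElements"], the "XX" element is a made-up non-question entry, and I am assuming a single asterisk line separates consecutive questions.

```python
import json


def format_questions(survey_elements):
    """Return asterisk-delimited text for every survey question ("SQ") element."""
    lines = []
    for element in survey_elements:
        # Skip anything that is not a survey question.
        if element.get("Element") != "SQ":
            continue
        payload = element["Payload"]
        lines.append("*")
        lines.append(f"name = {payload['QuestionID']}")
        lines.append("text =")
        lines.append(payload["QuestionText"])
    if lines:
        lines.append("*")  # closing asterisk after the last question
    return "\n".join(lines)


# Inline sample standing in for the "SurveyElements" list of the real file.
sample = json.loads("""
[
  {"Element": "SQ",
   "Payload": {"QuestionID": "QID7",
               "QuestionText": "Do you use similar websites or resources to accomplish the objectives you have in using Open Data Flint?"}},
  {"Element": "XX", "Payload": {}}
]
""")

print(format_questions(sample))
```

With the real file you would pass obj["SurveyElements"] instead of sample. Returning a string rather than printing inside the loop also makes it easy to write the result to a file.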
CodePudding user response:
The QuestionText and QuestionID fields are nested within the Payload dictionary. You need to index into that dictionary before accessing those fields.
Your print should look like the following:
print(f"name = {obj['Payload']['QuestionID']}, text = {obj['Payload']['QuestionText']}")
Edit: The full JSON file shows more layers of nesting that we need to go through to access a specific field. Accessing these fields uses roughly the same format as above, but with a couple extra index accesses (I've also edited the response to use format-strings, rather than string concatenation):
for survey_element in obj["SurveyElements"]:
    survey_element_payload = survey_element["Payload"]
    if "QuestionID" in survey_element_payload and "QuestionText" in survey_element_payload:
        print(f"name = {survey_element_payload['QuestionID']}, text = {survey_element_payload['QuestionText']}")
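One caveat with the membership check: it raises a TypeError if an element's Payload is null. I have not verified that this happens in your file, so treat the following as a defensive sketch under that assumption. iter_questions is a hypothetical helper name, and the sample elements (including the "XX" and "YY" types) are made up for illustration.

```python
def iter_questions(survey_elements):
    """Yield (QuestionID, QuestionText) pairs, skipping elements without both fields."""
    for element in survey_elements:
        payload = element.get("Payload")
        # Assumption: some elements may carry a null or list payload,
        # so anything that is not a dict is skipped.
        if not isinstance(payload, dict):
            continue
        if "QuestionID" in payload and "QuestionText" in payload:
            yield payload["QuestionID"], payload["QuestionText"]


# Made-up sample: one question plus two elements with non-dict payloads.
sample_elements = [
    {"Element": "SQ",
     "Payload": {"QuestionID": "QID7", "QuestionText": "Example question?"}},
    {"Element": "XX", "Payload": None},
    {"Element": "YY", "Payload": [{"Type": "Default"}]},
]

questions = list(iter_questions(sample_elements))
for question_id, question_text in questions:
    print(f"name = {question_id}, text = {question_text}")
```

Using single quotes for the keys inside the f-strings keeps the code valid on Python versions before 3.12, which do not allow reusing the outer quote character inside the braces.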
