Extracting text from a JSON file into specified output using Python

I have a JSON text file from Qualtrics that looks like this (for example, this is one element I pulled from the file):

{
      "SurveyID": "SV_8v79iA9BlgTnAnH",
      "Element": "SQ",
      "PrimaryAttribute": "QID7",
      "SecondaryAttribute": "Do you use similar websites or resources to accomplish the objectives you have in using Open Data...",
      "TertiaryAttribute": null,
      "Payload": {
        "QuestionText": "Do you use similar websites or resources to accomplish the objectives you have in using Open Data Flint?",
        "DefaultChoices": false,
        "DataExportTag": "Similar",
        "QuestionID": "QID7",
        "QuestionType": "MC",
        "Selector": "SAVR",
        "SubSelector": "TX",
        "DataVisibility": {
          "Private": false,
          "Hidden": false
        },
        "Configuration": {
          "QuestionDescriptionOption": "UseText"
        },
        "QuestionDescription": "Do you use similar websites or resources to accomplish the objectives you have in using Open Data...",
        "Choices": {
          "1": {
            "Display": "Yes"
          },
          "2": {
            "Display": "No"
          }
        },
        "ChoiceOrder": [
          1,
          2
        ],
        "Validation": {
          "Settings": {
            "ForceResponse": "ON",
            "ForceResponseType": "ON",
            "Type": "None"
          }
        },
        "GradingData": [],
        "Language": [],
        "NextChoiceId": 3,
        "NextAnswerId": 1
      }
    },

I want to extract only the values of the QuestionText and QuestionID fields, so that the output looks exactly like this:

*
name = QID7
text = 
Do you use similar websites or resources to accomplish the objectives you have in using Open Data Flint?
*

Here is my code so far, but I get the error "list indices must be integers or slices, not str":

import json

with open('flint.json', 'r') as myfile:
    data=myfile.read()

# parse file
obj = json.loads(data)

print("name = " + str(obj["SurveyElements"]["Payload"]["QuestionID"]), "text = " + str(obj["SurveyElements"]["Payload"]["QuestionText"]))

How can I create a Python script that will extract the information I want and output the results in the format I need so that the asterisks, 'name =', 'text =', line breaks, and clean text replicate the above output? Will I need to use regex to get what I need? Or apply multiple conditions per line until the conditions are satisfied?

CodePudding user response：

It definitely helped that you eventually added the structure of the JSON file.

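After looking at your approach, I would just note that obj["SurveyElements"] is a list of dictionaries, not a single dictionary. A list can only be indexed with integers or slices, so obj["SurveyElements"]["Payload"] raises the error you are seeing. You need to loop over the list (or pick an item by position) before using string keys.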
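
To make the cause of the error concrete, here is a minimal sketch. The `sample` dictionary is a made-up miniature modelled on the excerpt in the question, not the real export, and it only assumes that "SurveyElements" holds a list of dictionaries:

```python
# Hypothetical miniature of the file: "SurveyElements" holds a list of dicts.
sample = {
    "SurveyElements": [
        {"Element": "SQ", "Payload": {"QuestionID": "QID7", "QuestionText": "Example?"}}
    ]
}

elements = sample["SurveyElements"]

try:
    # Same mistake as in the question: a string key applied to a list.
    elements["Payload"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Picking an item by position first gives a dict, which accepts string keys.
first_id = elements[0]["Payload"]["QuestionID"]

print(error_message)
print(first_id)
```

The first print shows the same TypeError message as in the question; the second shows that the lookup works once a list item has been selected.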

Below is an example that gets what you want. I assumed you are only interested in survey questions, which is why I included the check if element["Element"] == "SQ" (only these elements include QuestionID and QuestionText).

import json

# open json file
with open('flint.json', 'r') as myfile:
    data = myfile.read()

# load json
obj = json.loads(data)

# get the list of dictionaries
# that holds the survey elements
survey_elements_list = obj["SurveyElements"]

# iterate through the list
# and only look at survey questions
# checking if element["Element"] == "SQ"
for element in survey_elements_list:
    if element["Element"] == "SQ":
        question_id = element["Payload"]["QuestionID"]
        question_text = element["Payload"]["QuestionText"]
        print("*")
        print(f"name = {question_id}")
        print(f"text =\n{question_text}")
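
If the closing asterisk from the requested output matters, one option is to build the text first and print it afterwards. The following is only a sketch: `format_questions` is a hypothetical helper name, the sample data is invented, and I am assuming that a single "*" separates consecutive questions with one more "*" after the last one.

```python
def format_questions(survey_elements):
    """Return the question blocks as one string, each block opened by '*'."""
    lines = []
    for element in survey_elements:
        if element.get("Element") != "SQ":
            continue  # skip anything that is not a survey question
        payload = element["Payload"]
        lines.append("*")
        lines.append(f"name = {payload['QuestionID']}")
        lines.append("text =")
        lines.append(payload["QuestionText"])
    if lines:
        lines.append("*")  # closing asterisk after the last question
    return "\n".join(lines)


# Hypothetical sample shaped like the excerpt in the question.
sample_elements = [
    {"Element": "BL", "Payload": []},
    {"Element": "SQ", "Payload": {"QuestionID": "QID7",
                                  "QuestionText": "Do you use similar websites?"}},
]
report = format_questions(sample_elements)
print(report)
```

With the real file you would pass obj["SurveyElements"] instead of the sample list. Because the result is a plain string, it can be printed or written to a text file with an ordinary open(..., "w") call. No regex is needed, since json.loads already gives you the values as clean strings.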

CodePudding user response：

The QuestionText and QuestionID fields are nested within the Payload dictionary. You need to index into that dictionary before accessing those fields.

Your print should look like the following:

print(f"name = {obj['Payload']['QuestionID']}, text = {obj['Payload']['QuestionText']}")

Edit: The full JSON file shows more layers of nesting that we need to go through to access a specific field. Accessing these fields uses roughly the same format as above, but with a couple of extra index accesses (I've also edited the response to use f-strings rather than string concatenation):

for survey_element in obj["SurveyElements"]:
    survey_element_payload = survey_element["Payload"]
    if "QuestionID" in survey_element_payload and "QuestionText" in survey_element_payload:
        print(f"name = {survey_element_payload['QuestionID']}, text = {survey_element_payload['QuestionText']}")
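
One caveat about the loop above: the membership test only works when Payload holds something that supports `in`. I have not checked whether every element of a real export stores a dictionary there, so the sketch below is a defensive variant under the assumption that some payloads might be null or a list. `iter_questions` is a hypothetical helper name and the sample elements (including the "XX" and "YY" codes) are invented.

```python
def iter_questions(survey_elements):
    """Yield (QuestionID, QuestionText) pairs, skipping unusable payloads."""
    for element in survey_elements:
        payload = element.get("Payload")
        if not isinstance(payload, dict):
            continue  # e.g. None or a list: nothing to look up by key
        question_id = payload.get("QuestionID")
        question_text = payload.get("QuestionText")
        if question_id is not None and question_text is not None:
            yield question_id, question_text


# Hypothetical elements; only the last one carries a usable payload.
sample_elements = [
    {"Element": "XX", "Payload": None},
    {"Element": "YY", "Payload": ["not", "a", "dict"]},
    {"Element": "SQ", "Payload": {"QuestionID": "QID7", "QuestionText": "Example?"}},
]
pairs = list(iter_questions(sample_elements))
for question_id, question_text in pairs:
    print(f"name = {question_id}, text = {question_text}")
```

Using dict.get here means a missing key simply skips the element instead of raising a KeyError.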