CSV without unique headers to list of dicts with unique keys-CodePudding

I have a CSV file with headers on row 0. The headers are often unique but sometimes they are not, for "comments" in this example. For each of several comments, the header is "Comment".

The problem with my function that makes dicts from CSVs is that it only returns the last column of Comment.

def csv_to_list_with_dicts(csvfile):
    with open(csvfile) as f:
        list_of_issues = [{k: v for k, v in row.items()}
                          for row in csv.DictReader(f, skipinitialspace=True)]
    return list_of_issues

My CSV file columns are like this:

User;ID;Comment;Comment;Comment

If one of the headers is repeating, I need to add an index to make it unique (like Comment1;Comment2 without changing the CSV) in the dict or all comments included under just Comment.

CodePudding user response：

This did return just the way I wanted. Just tweaked yours a small bit Happy Ahmad! HUGE THANKS!!! <3

def csv_to_list_with_dicts(csvfile):
    with open(csvfile, "r") as file:

        keys = file.readline().split(",")
        alteredKeys = []
        for eachKey in keys:
            counter = 0
            while(eachKey in alteredKeys):
                counter  = 1
                eachKey = eachKey[:len(eachKey)-(0 if counter == 1 else 1)]   str(counter)
            alteredKeys.append(eachKey)

        list_of_issues = []
        reader = csv.reader(file, delimiter=',', skipinitialspace=True)
        for eachLine in reader:
            eachIssue = dict()
            columnIndex = 0
            for eachColumn in eachLine:
                if columnIndex < len(alteredKeys):
                    eachIssue[alteredKeys[columnIndex]] = eachColumn
                    columnIndex  = 1
            list_of_issues.append(eachIssue)
    return list_of_issues

CodePudding user response：

In this solution, I use an alterKey list that changes any repeated key in the header by adding an index at its end. Then, I iterate on the other lines of the CSV file and make a dictionary from each one.

def csv_to_list_with_dicts(csvfile):
    with open(csvfile, "r") as file:    

        keys = file.readline().split(";")
        alteredKeys = []
        for eachKey in keys:
            counter = 0
            while(eachKey in alteredKeys):
                counter  = 1
                eachKey = eachKey[:len(eachKey)-(0 if counter == 1 else 1)]   str(counter)
            alteredKeys.append(eachKey)

        list_of_issues = []
        for eachLine in file:
            eachIssue = dict()
            columnIndex = 0
            for eachColumn in eachLine.split(";")
                if columnIndex < len(alteredKeys):
                    eachIssue[alteredKeys[columnIndex]] = eachColumn
                columnIndex  = 1
            list_of_issues.append(eachIssue)
    return list_of_issues

CodePudding user response：

It woujld be fairly easy to write code that will automatically generate unique keys for you by simply keeping track of those already seen and generating a unique name for any encountered that conflicted with one before it. Checking for that would be quick if those seen were kept in a set which features fast membership testing.

For example, assume this was in a CSV file named non-unique.csv:

User;ID;Comment;Comment;Comment
Jose;1138;something1;something2;something3
Gene;2907;abc;def;ghi
Guido;6450;jkl;mno;pqr

Code:

import csv


def csv_to_list_with_dicts(csv_filename):
    # Read the first row of the csv file.
    with open(csv_filename, encoding='utf-8', newline='') as csv_file:
        reader = csv.reader(csv_file, delimiter=';', skipinitialspace=True)
        names = next(reader)  # Header row.

    # Create list of unique fieldnames for the namee in the header row.
    seen = set()
    fieldnames = []
    for i, name in enumerate(names):
        if name in seen:
            name = f'_{i}'
        else:
            seen.add(name)
        fieldnames.append(name)

    # Read entire file and make each row a dictionary with keys based on the fieldnames.
    with open(csv_filename, encoding='utf-8', newline='') as csv_file:
        reader = csv.DictReader(csv_file, fieldnames=fieldnames, delimiter=';', 
                                skipinitialspace=True)
        next(reader)  # Ignore header row.
        return list(reader)


results = csv_to_list_with_dicts('non-unique.csv')

from pprint import pprint
pprint(results, sort_dicts=False, width=120)

Results:

[{'User': 'Jose', 'ID': '1138', 'Comment': 'something1', '_3': 'something2', '_4': 'something3'},
 {'User': 'Gene', 'ID': '2907', 'Comment': 'abc', '_3': 'def', '_4': 'ghi'},
 {'User': 'Guido', 'ID': '6450', 'Comment': 'jkl', '_3': 'mno', '_4': 'pqr'}]