python - Sort a list into a nested list by string match without sorted function-CodePudding

I'm wondering how to sort a list as such:

list_1 = ['apple', 'grape', 'orange', 'orange', 'apple', 'grape', 'apple']

to this:

list_1 = [['apple', 'apple', 'apple'], ['grape', 'grape'], ['orange', 'orange']]

Only using a raw algorithm rather than any sort functions.

The reason why is that I'll use this algorithm in conjunction with a function that matches strings based on how similar the text is because the strings I'm sorting aren't the same, and relying on any sorting function wouldn't allow me to use that function

Any guidance and help is greatly appreciated!

CodePudding user response：

You can use this kind of algorithm. First sort the list using any sorting algorithms, let us use bubble sort for now. Then count the frequency of all elements and store it in a dictionary and then and convert it back into a nested list. So you can do something like this:

def bubble_sort(ls):
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(ls) - 1):
            if ls[i] > ls[i   1]:
                # Swap the elements
                ls[i], ls[i   1] = ls[i   1], ls[i]
                swapped = True
    return ls

def nest(ls):
    ls2 = {}
    res = []
    for i in ls:
        if i in ls2:
            ls2[i]  = 1
        else:
            ls2[i] = 1
    i = 0
    while i<len(ls):
        res.append([ls[i]]*ls2[ls[i]])
        i =ls2[ls[i]]
    return res

print(nest(bubble_sort(['apple', 'grape', 'orange', 'orange', 'apple', 'grape', 'apple'])))

CodePudding user response：

To sort this list you can run this code:

list_1 = ['apple', 'grape', 'orange', 'orange', 'apple', 'grape', 'apple']
list_1.sort()

dictionary = {}
for item in list_1:
    if item not in dictionary:
        dictionary[item] = 1
    else:
        dictionary[item]  = 1

list_1 = []
for word in dictionary:
    list_2 = [word]*dictionary[word]
    list_1.append(list_2)

print(list_1)

This outputs

[['apple', 'apple', 'apple'], ['grape', 'grape'], ['orange', 'orange']]

The second line of code is optional, you can remove it if you don't need the last list to be in alphabetical order.

CodePudding user response：

Assuming that your word similarity function is transitive, you can progressively add words to the group they belong to, adding new groups as you go for words that don't find any similarity:

def groupWords(L,isSimilar=lambda a,b:a==b):
    result = []
    for word in L:                         # for each word in list
        for group in result:               # find a similar word group
            if isSimilar(word,group[0]):   
                group.append(word)         # add to the group
                break
        else:
            result.append([word])          # new group if no match
    return result

output:

list_1 = ['apple', 'grape', 'orange', 'orange', 'apple', 'grape', 'apple']

list_2 = groupWords(list_1)

print(list_2)
[['apple', 'apple', 'apple'], ['grape', 'grape'], ['orange', 'orange']]

You can supply the function with your own similarity check function or make it part of the loop directly

Note that, if your similarity check is not transitive, words will go into the first matching group and the result will be dependent on the order of the input