Home > Mobile >  Can every solution of the edit distance problem be shown on the matrix obtained from the dynamic pro
Can every solution of the edit distance problem be shown on the matrix obtained from the dynamic pro

Time:01-09

I watched a video on Youtube explaining how to calculate which operations are used and how many times by going back to 1 from the last cell of the matrix obtained from the dynamic programming algorithm. Although I understood the example in the video, when I tried to do it with other examples, I couldn't. Is it possible to show every solution on the matrix? I also added the code that calculates the matrix at the end.

Let's assume our words are "apple" and "paper".

Matrix obtained from dynamic programming algorithm (I used Levensthein Distance operations.):

Minimum edit distance of "apple" and "paper" is 4.

One of the solutions is 4 replace operations.

1. apple -> ppple (replace "a" with "p")

2. ppple -> paple (replace "p" with "a")

3. paple -> papee (replace "l" with "e")

4. papee -> paper (replace "e" with "r")

Solution on matrix:

enter image description here

Is it possible to show other solutions on this matrix?

For example:

1. apple -> papple (insert "p")

2. papple -> paple (remove "p")

3. paple -> pape (remove "l")

4. pape -> paper (insert "r")

code:

def min_edit_count(word1, word2):

    word1 = ' '   word1     #
    word2 = ' '   word2     # add a space before original words

    len_w1 = len(word1)     #
    len_w2 = len(word2)     # calculate the lengths of new words

    edit_matrix = np.zeros((len_w2, len_w1), dtype = int)
    # create a matrix with all zeros

    edit_matrix[0, :] = range(len_w1)  
    # assign numbers from 0 to len_w1 in the first row of the edit_matrix 
    edit_matrix[:, 0] = range(len_w2)
    # assign numbers from 0 to len_w2 in the first column of the edit_matrix 

    for i in range(1, len_w2):
        for j in range(1, len_w1):
            # edit_matrix[i-1][j] --> remove
            # edit_matrix[i][j-1] --> insert
            # edit_matrix[i-1][j-1] --> replace

            temp1 = edit_matrix[i-1][j]   1
            temp2 = edit_matrix[i][j-1]   1
            # add 1 to edit_matrix[i-1][j] and edit_matrix[i][j-1]

            temp3 = edit_matrix[i-1][j-1]   1 if word1[j] != word2[i] else edit_matrix[i-1][j-1]
            # if last characters are same don't add 1 to edit_matrix[i-1][j-1].
            # no need to replace

            edit_count = min(temp1, temp2, temp3)
            # find min between three numbers
            edit_matrix[i][j] = edit_count

    min_edit = int(edit_matrix[len_w2 - 1, len_w1 - 1])
    # minimum edit count is the last number calculated
    
    return min_edit, edit_matrix

CodePudding user response:

Yes, you can backtrack from the end going over the cells that could contribute to the solution.
The complexity would be O((n m) * num_solutions).

def getSolutions(edit_matrix, word1, word2):
  pos = [] 
  def backtrack(i,j):
    pos.append((i,j))
    if i==0 and j==0:
      # This is a solution
      print(pos)
    if i>0 and edit_matrix[i-1,j]   1 == edit_matrix[i,j]:
      backtrack(i-1,j)
    if j>0 and edit_matrix[i,j-1]   1 == edit_matrix[i,j]:
      backtrack(i, j-1)
    if i>0 and j>0 and (edit_matrix[i-1][j-1]   1 if word1[j] != word2[i] else edit_matrix[i-1][j-1]) == edit_matrix[i,j]:
      backtrack(i,j)
    pos.pop()
  backtrack(len(word1) - 1,len(word2) - 1)

CodePudding user response:

Based on Sorin's good answer, here is a slightly enhanced version that is fixed to fit your indexing and also prints the edit operations that need to be done in each case:

def get_solutions(edit_matrix, word1, word2):
  pos = []
  sol = []
  def backtrack(i,j):
    pos.insert(0, (i,j))
    if i==0 and j==0:
      # This is a solution
      print(str(pos)   ": "   str(sol))
    if i>0 and edit_matrix[i-1,j]   1 == edit_matrix[i,j]:
      sol.insert(0, "insert "   word2[i])
      backtrack(i-1,j)
      sol.pop(0)
    if j>0 and edit_matrix[i,j-1]   1 == edit_matrix[i,j]:
      sol.insert(0, "remove "   word1[j])
      backtrack(i, j-1)
      sol.pop(0);
    if i>0 and j>0 and edit_matrix[i-1][j-1]   1 if word1[j] != word2[i] else edit_matrix[i-1][j-1] == edit_matrix[i,j]:
      if (word1[j] != word2[i]):
        sol.insert(0, "replace "   word1[j]   " with "   word2[i])
      else:
        sol.insert(0, "skip")
      backtrack(i-1,j-1)
      sol.pop(0)
    pos.pop(0)
  word1 = ' '   word1
  word2 = ' '   word2
  backtrack(len(word2) - 1,len(word1) - 1)

An example output for

count, matrix = min_edit_count("apple", "paper")
get_solutions(matrix, "apple", "paper")

then looks as follows:

[(0, 0), (1, 0), (2, 1), (3, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['insert p', 'skip', 'skip', 'remove p', 'remove l', 'skip', 'insert r']
[(0, 0), (0, 1), (1, 2), (2, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['remove a', 'skip', 'insert a', 'skip', 'remove l', 'skip', 'insert r']
[(0, 0), (1, 0), (2, 1), (2, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['insert p', 'skip', 'remove p', 'skip', 'remove l', 'skip', 'insert r']
[(0, 0), (1, 1), (2, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['replace a with p', 'replace p with a', 'skip', 'remove l', 'skip', 'insert r']
[(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 5)]: ['remove a', 'skip', 'replace p with a', 'replace l with p', 'skip', 'insert r']
[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (5, 5)]: ['insert p', 'skip', 'skip', 'replace p with e', 'replace l with r', 'remove e']
[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (4, 4), (5, 5)]: ['insert p', 'skip', 'skip', 'replace p with e', 'remove l', 'replace e with r']
[(0, 0), (1, 0), (2, 1), (3, 2), (3, 3), (4, 4), (5, 5)]: ['insert p', 'skip', 'skip', 'remove p', 'replace l with e', 'replace e with r']
[(0, 0), (0, 1), (1, 2), (2, 2), (3, 3), (4, 4), (5, 5)]: ['remove a', 'skip', 'insert a', 'skip', 'replace l with e', 'replace e with r']
[(0, 0), (1, 0), (2, 1), (2, 2), (3, 3), (4, 4), (5, 5)]: ['insert p', 'skip', 'remove p', 'skip', 'replace l with e', 'replace e with r']
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]: ['replace a with p', 'replace p with a', 'skip', 'replace l with e', 'replace e with r']

//fun: you can try to see why every line has the same length :D

  •  Tags:  
  • Related