Home > Mobile >  API to get edits that transform string A to string B in java
API to get edits that transform string A to string B in java

Time:02-02

What is the best API to get the edits required to transform string A to string B

A: This is a sentence
B: This was a sentences

Something like this
Edits:

  1. { start: 5, end: 7, action: "replace" text: "was" }
  2. { start: 18, action: "insert", text: "s" }

I know that I can do this on my own using Levenshtein distance algo, but I want to go with an industry-standard If someone knows about any NLP/String Util or API that I can use in java, please help me out.

Spacey errant is an option, but that is for python. I am looking for something similar to that in java.

CodePudding user response:

Option 1

Use Diff Match Patch library. It offers the following features:

  1. Diff: Compare two blocks of plain text and efficiently return a list of differences
  2. Match: Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
  3. Patch: Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.

Example based on this:

import java.util.LinkedList;
import name.fraser.neil.plaintext.diff_match_patch;

public class Test {
  public static void main(String args[]) {
    diff_match_patch dmp = new diff_match_patch();
    LinkedList<diff_match_patch.Diff> diff = dmp.diff_main("This is a sentence", "This was a sentences");
    dmp.diff_cleanupSemantic(diff);
    System.out.println(diff);
  }
}

Output:

[Diff(EQUAL,"This "), Diff(DELETE,"i"), Diff(INSERT,"wa"), Diff(EQUAL,"s a sentence"), Diff(INSERT,"s")]

Option 2

Use Diff Utils library, which is an actively maintained fork of java-diff-utils from Google Code Archive. Diff Utils allows comparison operations between texts: computing diffs, applying patches, etc.

Example (for more examples have a look here):

import com.github.difflib.text.*;
import java.util.*;

public class Main {

    public static void main(String[] args) {
        // create a configured DiffRowGenerator
        DiffRowGenerator generator = DiffRowGenerator.create().showInlineDiffs(true).mergeOriginalRevised(true)
                .inlineDiffByWord(true).oldTag(f -> "~") // introduce markdown style for strikethrough
                .newTag(f -> "**") // introduce markdown style for bold
                .build();

        // compute the differences for two test texts.
        List<DiffRow> rows = generator.generateDiffRows(Arrays.asList("This is a sentence"),
                Arrays.asList("This was a sentences"));

        System.out.println(rows.get(0).getOldLine());
    }
}

Output:

This ~is~**was** a ~sentence~**sentences**

Option 3

Create your own Diff utility, such as the one described here. The code below is as given in the above source, and uses Longest Common Subsequence (LCS) - which is a variation of Edit distance (Levenshtein distance) - to solve this problem.

Example:

public class Main {
    // Function to display the differences between two strings
    public static void diff(String X, String Y, int m, int n, int[][] lookup) {
        // if the last character of `X` and `Y` matches
        if (m > 0 && n > 0 && X.charAt(m - 1) == Y.charAt(n - 1)) {
            diff(X, Y, m - 1, n - 1, lookup);
            System.out.print(" "   X.charAt(m - 1));
        }

        // if the current character of `Y` is not present in `X`
        else if (n > 0 && (m == 0 || lookup[m][n - 1] >= lookup[m - 1][n])) {
            diff(X, Y, m, n - 1, lookup);
            System.out.print("  "   Y.charAt(n - 1));
        }

        // if the current character of `X` is not present in `Y`
        else if (m > 0 && (n == 0 || lookup[m][n - 1] < lookup[m - 1][n])) {
            diff(X, Y, m - 1, n, lookup);
            System.out.print(" -"   X.charAt(m - 1));
        }
    }

    // Function to fill the lookup table by finding the length of LCS
    // of substring X[0…m-1] and Y[0…n-1]
    public static int[][] findLCS(String X, String Y, int m, int n) {
        // lookup[i][j] stores the length of LCS of substring X[0…i-1] and Y[0…j-1]
        int[][] lookup = new int[X.length()   1][Y.length()   1];

        // first column of the lookup table will be all 0
        for (int i = 0; i <= m; i  ) {
            lookup[i][0] = 0;
        }

        // first row of the lookup table will be all 0
        for (int j = 0; j <= n; j  ) {
            lookup[0][j] = 0;
        }

        // fill the lookup table in a bottom-up manner
        for (int i = 1; i <= m; i  ) {
            for (int j = 1; j <= n; j  ) {
                // if current character of `X` and `Y` matches
                if (X.charAt(i - 1) == Y.charAt(j - 1)) {
                    lookup[i][j] = lookup[i - 1][j - 1]   1;
                }
                // otherwise, if the current character of `X` and `Y` don't match
                else {
                    lookup[i][j] = Integer.max(lookup[i - 1][j], lookup[i][j - 1]);
                }
            }
        }

        return lookup;
    }

    // Implement diff utility in Java
    public static void main(String[] args) {
        String X = "This is a sentence";
        String Y = "This was a sentences";

        // lookup[i][j] stores the length of LCS of substring X[0…i-1] and Y[0…j-1]
        int[][] lookup = findLCS(X, Y, X.length(), Y.length());

        // find the difference
        diff(X, Y, X.length(), Y.length(), lookup);
    }
}

Output:

 T h i s   -i  w  a s   a   s e n t e n c e  s
  •  Tags:  
  • Related