Home > Blockchain >  Fuzzy matching in Google Sheets
Fuzzy matching in Google Sheets

Time:02-05

Trying to compare two columns in GoogleSheets with this formula in Column C:

=if(A1=B1,"","Mismatch")

Works fine, but I'm getting a lot of false positives:

A. B C
MARY JO Mary Jo
JAY, TIM TIM JAY Mismatch
Sam Ron Sam Ron Mismatch
Jack *Ma Jack MA Mismatch

Any ideas how to work this?

CodePudding user response:

try:

=ARRAYFORMULA(IFERROR(IF(LEN(
 REGEXREPLACE(REGEXREPLACE(LOWER(A1:A), "[^a-z ]", ), 
 LOWER("["&B1:B&"]"), ))>0, "mismatch", )))

enter image description here

CodePudding user response:

Implementing fuzzy matching via Google Sheets formula would be difficult. I would recommend using a custom formula for this one or a full blown script (both via Google Apps Script) if you want to populate all rows at once.

Custom Formula:

function fuzzyMatch(string1, string2) {
  string1 = string1.toLowerCase()
  string2 = string2.toLowerCase();
  var n = -1;

  for(i = 0; char = string2[i]; i  )
    if (!~(n = string1.indexOf(char, n   1))) 
      return 'Mismatch';
};

What this does is compare if the 2nd string's characters order is found in the same order as the first string. See sample data below for the case where it will return mismatch.

Output:

output

Note:

  • Last row is a mismatch as 2nd string have r in it that isn't found at the first string thus correct order is not met.
  • If this didn't meet your test cases, add a more definitive list that will show the expected output of the formula/function so this can be adjusted, or see player0's answer which solely uses Google Sheets formula and is less stricter with the conditions.

Reference:

  •  Tags:  
  • Related