Measuring the identicality of strings (in JavaScript)

In principle this question can be answered independently of any language, but specifically I am looking for a JavaScript implementation.

Is there a library that lets me measure the "identicality" of two strings? More generally, is there an algorithm for this that I could implement (in JavaScript)?

Take the following string as an example:

Abnormal Elasticity of Single-Crystal Magnesiosiderite across the Spin
Transition in Earth’s Lower Mantle

Now consider the following, slightly altered string. Note the parts that differ:

bnormal Elasticity of Single Crystal Magnesio-Siderite across the Spin-Transition in Earths Lower Mantle.

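JavaScript's native equality operators won't tell you much about how these strings relate to each other. In this particular case you could match the strings with a regular expression, but in general that only works when you know which differences to expect. If the input strings are arbitrary, the approach quickly stops being general.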

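Approach... I can imagine writing an algorithm that splits the input string into some number N of substrings, matches the target string against all of those substrings, and uses the number of matches as a measure of identicality. But that feels like an unattractive approach, and I don't even want to think about how the big-O cost would depend on N.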
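One common concrete form of the substring idea is to fix the substring length at 2 and compare character bigrams with the Dice coefficient. The sketch below is only an illustration of that idea, not something taken from the question; the names bigramCounts and bigramSimilarity are made up here. With a fixed substring length the work grows roughly linearly with the string lengths, which sidesteps the worry about N.

```javascript
// Hypothetical helper: count the character bigrams of a string.
function bigramCounts(str) {
  var counts = new Map();
  for (var i = 0; i < str.length - 1; i++) {
    var gram = str.substring(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient over bigrams: 2 * shared / (bigrams in a + bigrams in b).
// Returns a value between 0 (nothing shared) and 1 (identical).
function bigramSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0; // no bigrams to compare
  var countsA = bigramCounts(a);
  var countsB = bigramCounts(b);
  var shared = 0;
  countsA.forEach(function (countA, gram) {
    shared += Math.min(countA, countsB.get(gram) || 0);
  });
  return (2 * shared) / ((a.length - 1) + (b.length - 1));
}

console.log(bigramSimilarity("night", "nacht")); // 0.25 (only "ht" is shared)
```

Note that this measure is case sensitive and ignores where in the string a bigram occurs, so it answers a slightly different question than an edit distance does.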

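It seems to me that such an algorithm has a lot of free parameters. For example, whether a difference in character case should contribute as much as, more than, or less than a difference in character order looks like an arbitrary choice for the designer to make, i.e.: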

identicality("Abxy","bAxy") versus identicality("Abxy","aBxy")
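To make that free parameter tangible, here is a minimal sketch of a Levenshtein-style distance in which a case-only substitution has its own cost. The function name weightedDistance and the caseCost parameter are assumptions for illustration; nothing in the question prescribes them.

```javascript
// Levenshtein-style distance where a case-only substitution costs `caseCost`
// (hypothetical tuning knob: 0 ignores case, 1 treats it like any other substitution).
function weightedDistance(s, t, caseCost) {
  var prev = [];
  for (var j = 0; j <= t.length; j++) {
    prev[j] = j; // distance from the empty prefix of s
  }
  for (var i = 1; i <= s.length; i++) {
    var curr = [i];
    for (var k = 1; k <= t.length; k++) {
      var a = s.charAt(i - 1);
      var b = t.charAt(k - 1);
      var cost = 1;
      if (a === b) {
        cost = 0;
      } else if (a.toLowerCase() === b.toLowerCase()) {
        cost = caseCost;
      }
      curr[k] = Math.min(curr[k - 1] + 1,   // insertion
                         prev[k] + 1,       // deletion
                         prev[k - 1] + cost); // substitution
    }
    prev = curr;
  }
  return prev[t.length];
}

console.log(weightedDistance("Abxy", "bAxy", 0.5)); // 2 (order changed)
console.log(weightedDistance("Abxy", "aBxy", 0.5)); // 1 (two case-only changes)
```

With caseCost set to 0.5 the case-swapped pair counts as closer than the reordered pair; with caseCost set to 1 both come out at 2. Which of those is "right" is exactly the design decision described above.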

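Defining the requirements more specifically...
The first example is a scenario in which I could use this. I am loading a bunch of strings (titles of academic papers) and checking whether I already have them in my database. However, the source may contain typos, differences in conventions, errors and so on, which makes matching hard. In this particular scenario there may be an easier way to match titles: since you can anticipate what is likely to go wrong, you could write down some beast of a regular expression.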
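For the title-matching scenario, a cheap first step before any fuzzy comparison is to normalize both titles into a comparison key. The sketch below is one possible normalization, assuming plain ASCII titles; normalizeTitle is a hypothetical helper, and the sample titles are shortened variants of the example above.

```javascript
// Hypothetical helper: reduce a title to a comparison key.
// Assumes plain ASCII titles; accented letters would be stripped as well.
function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ""); // drop spaces, hyphens, apostrophes, periods
}

var titleA = normalizeTitle("Single-Crystal Magnesiosiderite in Earth’s Lower Mantle");
var titleB = normalizeTitle("Single Crystal Magnesio-Siderite in Earths Lower Mantle.");
console.log(titleA === titleB); // true
```

Normalization only removes differences in convention. A real typo, such as a missing letter, still needs an edit-distance style comparison on the normalized keys.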

Solution

You can implement Hirschberg's algorithm and distinguish between delete and insert operations (or adapt Levenshtein).
For Hirschberg("Abxy","bAxy") the results are:
It was 2 edit operations:
keep: 3
insert: 1
delete: 1

And for Hirschberg("Abxy","aBxy") the results are:

It was 2 edit operations:
keep: 2
replace: 2
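To show where such per-operation counts can come from, here is a minimal sketch that fills a plain Levenshtein table and then walks it backwards, tallying the operations. This is not Hirschberg's algorithm (which computes an alignment in linear space) and not the implementation linked in this answer; the name editOperations and the tie-breaking order (keep, then delete, then insert, then replace) are assumptions made here.

```javascript
// Count keep/replace/insert/delete operations along one optimal Levenshtein alignment.
function editOperations(s, t) {
  var m = s.length;
  var n = t.length;
  var d = [];
  var i, j;
  for (i = 0; i <= m; i++) {
    d[i] = [i];
  }
  for (j = 0; j <= n; j++) {
    d[0][j] = j;
  }
  for (i = 1; i <= m; i++) {
    for (j = 1; j <= n; j++) {
      var cost = s.charAt(i - 1) === t.charAt(j - 1) ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1,         // deletion
                         d[i][j - 1] + 1,         // insertion
                         d[i - 1][j - 1] + cost); // keep or replace
    }
  }

  // Walk back from d[m][n] to d[0][0], recording which step was taken.
  var ops = { keep: 0, replace: 0, insert: 0, delete: 0 };
  i = m;
  j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && s.charAt(i - 1) === t.charAt(j - 1) &&
        d[i][j] === d[i - 1][j - 1]) {
      ops.keep++; i--; j--;
    } else if (i > 0 && d[i][j] === d[i - 1][j] + 1) {
      ops.delete++; i--;
    } else if (j > 0 && d[i][j] === d[i][j - 1] + 1) {
      ops.insert++; j--;
    } else {
      ops.replace++; i--; j--;
    }
  }
  return ops;
}

console.log(editOperations("Abxy", "bAxy")); // { keep: 3, replace: 0, insert: 1, delete: 1 }
console.log(editOperations("Abxy", "aBxy")); // { keep: 2, replace: 2, insert: 0, delete: 0 }
```

When several alignments have the same total cost, a different tie-breaking order can report a different mix of operations, so the counts are only meaningful together with that choice.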

You can see a JavaScript implementation on this page.

"Optimal" string alignment distance

function optimalStringAlignmentDistance(s,t) {
  // Determine the "optimal" string-alignment distance between s and t
  if (!s || !t) {
    return 99;
  }
  var m = s.length;
  var n = t.length;
  
  /* For all i and j,d[i][j] holds the string-alignment distance
   * between the first i characters of s and the first j characters of t.
   * Note that the array has (m+1)x(n+1) values.
   */
  var d = new Array();
  for (var i = 0; i <= m; i++) {
    d[i] = new Array();
    d[i][0] = i;
  }
  for (var j = 0; j <= n; j++) {
    d[0][j] = j;
  }
        
  // Determine substring distances
  var cost = 0;
  for (var j = 1; j <= n; j++) {
    for (var i = 1; i <= m; i++) {
      cost = (s.charAt(i-1) == t.charAt(j-1)) ? 0 : 1;   // Subtract one to start at strings' index zero instead of index one
      d[i][j] = Math.min(d[i][j-1] + 1,// insertion
                         Math.min(d[i-1][j] + 1,// deletion
                                  d[i-1][j-1] + cost));  // substitution
                        
      if(i > 1 && j > 1 && s.charAt(i-1) == t.charAt(j-2) && s.charAt(i-2) == t.charAt(j-1)) {
        d[i][j] = Math.min(d[i][j],d[i-2][j-2] + cost); // transposition
      }
    }
  }
  
  // Return the strings' distance
  return d[m][n];
}

alert(optimalStringAlignmentDistance("Abxy","bAxy"))
alert(optimalStringAlignmentDistance("Abxy","aBxy"))

Damerau-Levenshtein distance

function damerauLevenshteinDistance(s,t) {
  // Determine the Damerau-Levenshtein distance between s and t
  if (!s || !t) {
    return 99;
  }
  var m = s.length;
  var n = t.length;      
  var charDictionary = new Object();
  
  /* For all i and j,d[i][j] holds the Damerau-Levenshtein distance
   * between the first i characters of s and the first j characters of t.
   * Note that the array has (m+1)x(n+1) values.
   */
  var d = new Array();
  for (var i = 0; i <= m; i++) {
    d[i] = new Array();
    d[i][0] = i;
  }
  for (var j = 0; j <= n; j++) {
    d[0][j] = j;
  }
  
  // Populate a dictionary with the alphabet of the two strings
  for (var i = 0; i < m; i++) {
    charDictionary[s.charAt(i)] = 0;
  }
  for (var j = 0; j < n; j++) {
    charDictionary[t.charAt(j)] = 0;
  }
  
  // Determine substring distances
  for (var i = 1; i <= m; i++) {
    var db = 0;
    for (var j = 1; j <= n; j++) {
      var i1 = charDictionary[t.charAt(j-1)];
      var j1 = db;
      var cost = 0;
      
      if (s.charAt(i-1) == t.charAt(j-1)) { // Subtract one to start at strings' index zero instead of index one
        db = j;
      } else {
        cost = 1;
      }
      d[i][j] = Math.min(d[i][j-1] + 1,// insertion
                         Math.min(d[i-1][j] + 1,// deletion
                                  d[i-1][j-1] + cost)); // substitution
      if(i1 > 0 && j1 > 0) {
        d[i][j] = Math.min(d[i][j],d[i1-1][j1-1] + (i-i1-1) + (j-j1-1) + 1); //transposition
      }
    }
    charDictionary[s.charAt(i-1)] = i;
  }
        
  // Return the strings' distance
  return d[m][n];
}

alert(damerauLevenshteinDistance("Abxy","aBxy"))
alert(damerauLevenshteinDistance("Abxy","bAxy"))
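Since the question asks for a measure of identicality rather than a raw edit count, one way to use these distances is to scale them by the length of the longer string. The sketch below assumes that scaling; similarityFromDistance is a hypothetical helper, and the distances 1 and 2 passed in are the values the two example pairs should produce with the functions above.

```javascript
// Hypothetical helper: map an edit distance onto a 0..1 similarity score.
function similarityFromDistance(distance, s, t) {
  var longest = Math.max(s.length, t.length);
  if (longest === 0) {
    return 1; // two empty strings are identical
  }
  // Clamp at 0 so that sentinel values such as 99 cannot go negative.
  return Math.max(0, 1 - distance / longest);
}

console.log(similarityFromDistance(1, "Abxy", "bAxy")); // 0.75
console.log(similarityFromDistance(2, "Abxy", "aBxy")); // 0.5
```

A threshold on this score (say, treating anything above some cut-off as the same title) is then a tuning decision that depends on the data.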

Optimal String Alignment has better performance:

Optimal String Alignment Distance: 0.20-0.30 ms
Damerau-Levenshtein Distance: 0.40-0.50 ms
