在第二篇文章中的代码完成之后,照例还是对代码测试了一番。还是用两个相似的字符串,长度分别为1500和1507,结果能出来,但是效率差了点。在笔者的电脑上用了6秒中左右。仅仅是比较文本,就要6秒钟,比较难以接受,而且从代码看时间复杂度和空间复杂度都是O(n2)。
必须得改进!!!
在看了代码之后,发现代码运行速度慢可能出现在两个地方。一个是mDic对象,用的是Dictionary对象,在运行中反复读取和存储可能会影响速度,如果改为用数组可能效果会好点。哪位对这个有研究的同道,望不吝赐教。一个是String对象的Chars(Index)的方法。可能在每次执行到这一步时,会先把字符串转化为字符数组再返回一个字符,或者是遍历这个字符串,返回一个字符。对于本例中,大约需要执行1500×1500次,等于反复遍历,时间就浪费了。建议一开始就转化为字符数组,等到比较时就不需要遍历或转化了。
按照这两个思路对代码进行了修改,然后测试。效果很满意,本例测试几乎就是一瞬间。
现将代码赋予其后,用的是VB2005
Public Class clsCalculateStringDistanceEx2
Implements IDistance
Private mStringA() As Char
Private mStringB() As Char
Private mLenA As Integer
Private mLenB As Integer
Private mIsSame As Boolean
Private mDic(,) As Integer
Private Function CalculateStringDistance( _
ByVal StartLower As Integer,_
ByVal EndUper As Integer) As Integer
Dim i As Integer,j As Integer
Dim T1 As Integer,T2 As Integer,T3 As Integer
Dim jA As Integer = mLenA - StartLower - EndUper + 2
Dim jB As Integer = mLenB - StartLower - EndUper + 2
ReDim mDic(jA,jB)
mDic(jA,jB) = 0
For i = jA - 1 To 0 Step -1
mDic(i,jB) = jA - i
Next
For i = jB - 1 To 0 Step -1
mDic(jA,i) = jB - i
Next
For i = jA - 1 To 0 Step -1
For j = jB - 1 To 0 Step -1
If mStringA(i - 1 + StartLower) = _
mStringB(j - 1 + StartLower) Then
mDic(i,j) = mDic(i + 1,j + 1)
Else
T1 = mDic(i + 1,j)
T2 = mDic(i,j + 1)
T3 = mDic(i + 1,j + 1)
mDic(i,j) = Min(T1,T2,T3) + 1
End If
Next
Next
Return mDic(0,0)
End Function
Private Function Min(ByVal ParamArray M() As Integer) As Integer
Dim i As Integer,J As Integer
J = M(0)
For i = 1 To M.GetUpperBound(0)
If M(i) < J Then J = M(i)
Next
Return J
End Function
Public Function CalculateStringDistance() As Integer _
Implements IDistance.CalculateStringDistance
If mLenA = 0 Then Return mLenB
If mLenB = 0 Then Return mLenA
mIsSame = True
Dim i As Integer,j1 As Integer,j2 As Integer
For i = 1 To Min(mLenA,mLenB)
If mStringA(i - 1) <> mStringB(i - 1) Then
mIsSame = False
j1 = i
Exit For
End If
Next
If mIsSame = True Then Return Math.Abs(mLenA - mLenB)
For i = 1 To Min(mLenA,mLenB)
If mStringA(mLenA - i) <> mStringB(mLenB - i) Then
mIsSame = False
j2 = i
Exit For
End If
Next
If mIsSame = True Then Return Math.Abs(mLenA - mLenB)
Return CalculateStringDistance(j1,j2)
End Function
Public ReadOnly Property DicCount() As Integer _
Implements IDistance.DicCount
Get
Return mDic.Length
End Get
End Property
Public Sub SetString(ByVal S1 As String,ByVal S2 As String) _
Implements IDistance.SetString
mStringA = S1.tocharArray
mStringB = S2.tocharArray
mLenA = S1.Length
mLenB = S2.Length
End Sub
End Class