暴力匹配(brute-force substring search)就是用模式的字符串去和目标文本的字符串,一个一个字符的匹配过去。时间复杂性为 O(m×n)。
这里两种 Python 实现方法,思路大同小异,都是沿着目标文本做遍历,同时有一个小循环对模式字符串做遍历。如果模式字符串能和目标文本的子字符串一一对应,就返回目标文本的这个位置的下标。两种方法都使用了两个针对目标文本字符串的指针i,以及模式字符串的指针j。区别是外层指针是否随着内层指针一起前进。
方法1
两个 while 循环嵌套,参考Algorithms 4th, Robert Sedgewick, page 760。在内层指针前进时,外层指针不动,直到内从指针循环完以后才前进一位。
def brute_force_search(txt, pattern):
N = len(txt)
M = len(pattern)
i = 0 # pointer into the text
while i <= (N - M):
j = 0 # pointer into the patter
while j < M:
if txt[i+j] != pattern[j]:
break
j += 1
if j == M:
return i
i += 1
return -1
def main():
txt = "abcde"
pat = "cde"
result = brute_force_search(txt, pat)
print "result:", result
txt_another = "abcde"
pat_another = "123"
result_another = brute_force_search(txt_another, pat_another)
print "result_another:", result_another
if __name__ == '__main__':
main()
结果是
result: 2
result_another: -1
方法2
只有一个 while,但也是控制两个指针。如果不匹配,目标文本字符串的指针向前走一位,而模式字符串的指针会到下标 0 的位置。这种方法相当于上一种方法的改进型,特点是两个指针一起前进。
参考数据结构与算法:Python语言描述,第114页。但是这里做了修改,查找到以后立即就做了返回,而不是书上遍历完以后才返回。
def naive_search(txt, pattern):
N = len(txt)
M = len(pattern)
i = 0 # pointer into text
j = 0 # pointer into pattern
while i < N and j < M:
if txt[i] == pattern[j]:
i += 1
j += 1
if j == M:
return (i-j)
else:
i = i - j + 1
j = 0
return -1
def main():
txt = "abcde"
pat = "cde"
naive_result = naive_search(txt, pat)
print "naive_result:", naive_result
txt_another = "abcde"
pat_another = "123"
naive_result_another = naive_search(txt_another, pat_another)
print "naive_result_another:", naive_result_another
if __name__ == '__main__':
main()
结果是
naive_result: 2
naive_result_another: -1
书上的示例代码也是一样的效果,区别是把 if i == m:
放在循环外面。实际上,当匹配到字符串的时候,程序会跳出 while 循环,所以结果一致。
def naive_matching(t, p):
j, i = 0, 0
n, m = len(t), len(p)
while j < n and i < m: # i==m means a matching
if t[j] == p[i]: # ok! consider next char in p
j, i = j+1, i+1
else: # no! consider next position in t
j, i = j-i+1, 0
if i == m: # find a matching, return its index
return j-i
return -1 # no matching, return special value
def main():
txt = "abcde"
pat = "cde"
naive_matching_result = naive_matching(txt, pat)
print "naive_matching_result:", naive_matching_result
txt_another = "abcde"
pat_another = "123"
naive_matching_result_another = naive_matching(txt_another, pat_another)
print "naive_matching_result_another:", naive_matching_result_another
if __name__ == '__main__':
main()