编辑距离,又称为Levenshtein距离,由俄罗斯的数学家Vladimir Levenshtein在1965年提出。是指利用字符操作,把字符串A转换成字符串B所需要的最少操作数。
最小编辑距离涉及三种操作:
- 删除字符
- 插入字符
- 替换字符
状态转移方程:
实现代码:
# coding: utf-8
import numpy as np
"""
最小编辑距离
@author:liuchongming
@date: 2018-05-19
"""
def minimum_edit_distance(str1, str2):
row_count = len(str1) + 1
col_count = len(str2) + 1
dp = np.zeros(shape=[row_count, col_count], dtype=np.int)
for i in range(col_count):
dp[0][i] = i
for i in range(row_count):
dp[i][0] = i
for i in range(1, row_count):
for j in range(1, col_count):
if str1[i - 1] == str2[j - 1]:
flag = 0
else:
flag = 1
dp[i][j] = min(dp[i - 1][j - 1] + flag, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
# 打印dp数组
for i in range(row_count):
for j in range(col_count):
print("%d " % dp[i][j], end='')
print()
return dp[row_count - 1][col_count - 1]
if __name__ == "__main__":
str1 = "coffee"
str2 = "cafe"
print(minimum_edit_distance(str1, str2))