Cosine Similarity on HarmonyOS 4.0 / HarmonyOS NEXT

Cosine similarity formula

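$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$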

1. Tokenization (for now, the text is only split into single characters)

[你,好]
[你,好,不,好]
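
The tokenization step above can be sketched on its own as follows. The helper name `tokenize` is hypothetical and not part of the original code; it simply mirrors the per-character split the article uses, which works for the characters shown here but would cut characters outside the Basic Multilingual Plane (such as many emoji) in half.

```typescript
// Hypothetical helper (name assumed for illustration): split text into
// single-character tokens, the same way the article's code does.
function tokenize(text: string): string[] {
  // split("") cuts at UTF-16 code units; fine for the characters used here.
  return text.split("");
}

console.log(JSON.stringify(tokenize("你好")));     // ["你","好"]
console.log(JSON.stringify(tokenize("你好不好"))); // ["你","好","不","好"]
```

A real word segmenter could replace this function later without touching the remaining steps, since they only depend on receiving an array of tokens.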

2. Term frequency calculation

"你好" [你1, 好1]
"你好不好" [你1, 好2, 不1]

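As a rough sketch of the counting step, the function below builds a frequency table from a token array. The name `termFrequency` is an assumption made for this example; the article's own code does the same thing inline with a `Map`.

```typescript
// Hypothetical helper (name assumed for illustration): count how often
// each token occurs.
function termFrequency(tokens: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  tokens.forEach((token: string) => {
    // A token seen for the first time starts from 0.
    counts.set(token, (counts.get(token) ?? 0) + 1);
  });
  return counts;
}

const tf = termFrequency(["你", "好", "不", "好"]);
console.log(tf.get("你"), tf.get("好"), tf.get("不")); // 1 2 1
```
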
3. Union of terms

"你好" [你1, 好1, 不0]
"你好不好" [你1, 好2, 不1]
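
To make the alignment explicit, here is a minimal sketch that turns two frequency tables into vectors over the union of their terms. The function name `toVectors` and its return shape are assumptions for illustration; the article's code performs the same lookup inside its final loop instead of materializing the vectors.

```typescript
// Hypothetical helper (name and return shape assumed for illustration):
// align two frequency maps on the union of their terms.
function toVectors(
  a: Map<string, number>,
  b: Map<string, number>
): { terms: string[]; va: number[]; vb: number[] } {
  // A Set keeps insertion order and drops duplicates.
  const union = new Set<string>();
  a.forEach((_count: number, term: string) => { union.add(term); });
  b.forEach((_count: number, term: string) => { union.add(term); });

  const terms: string[] = [];
  const va: number[] = [];
  const vb: number[] = [];
  union.forEach((term: string) => {
    terms.push(term);
    va.push(a.get(term) ?? 0); // 0 when the term is absent from the text
    vb.push(b.get(term) ?? 0);
  });
  return { terms, va, vb };
}

const first = new Map<string, number>([["你", 1], ["好", 1]]);
const second = new Map<string, number>([["你", 1], ["好", 2], ["不", 1]]);
const aligned = toVectors(first, second);
console.log(JSON.stringify(aligned.terms)); // ["你","好","不"]
console.log(JSON.stringify(aligned.va));    // [1,1,0]
console.log(JSON.stringify(aligned.vb));    // [1,2,1]
```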

4. Computing the cosine

["你","好"]
["你","好","不","好"]
Similarity 1: 0.8660254037844387
["你","好","不"]
["你","好","不","好"]
Similarity 2: 0.9428090415820635
["你","好","不","好"]
["你","好","不","好"]
Similarity 3: 1
["你","你","你","你"]
["你","好","不","好"]
Similarity 4: 0.4082482904638631
["你"]
["你","好","不","好"]
Similarity 5: 0.4082482904638631
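
As a check on the numbers above, the first result can be worked out by hand. The vectors are written in the term order (你, 好, 不); the symbols A and B are just labels chosen for this derivation. The last two lines also suggest why results 4 and 5 coincide: the count vectors for "你你你你" and "你" point in the same direction, and cosine similarity ignores vector length.

```latex
\begin{aligned}
A &= (1, 1, 0), \quad B = (1, 2, 1) \\
A \cdot B &= 1 \cdot 1 + 1 \cdot 2 + 0 \cdot 1 = 3 \\
\|A\| &= \sqrt{1^2 + 1^2 + 0^2} = \sqrt{2}, \quad
\|B\| = \sqrt{1^2 + 2^2 + 1^2} = \sqrt{6} \\
\cos\theta &= \frac{3}{\sqrt{2}\,\sqrt{6}} = \frac{3}{\sqrt{12}} = \frac{\sqrt{3}}{2} \approx 0.8660 \\[6pt]
\text{Result 4: } & \frac{(4,0,0) \cdot (1,2,1)}{\sqrt{16}\,\sqrt{6}} = \frac{4}{4\sqrt{6}} = \frac{1}{\sqrt{6}} \approx 0.4082 \\
\text{Result 5: } & \frac{(1,0,0) \cdot (1,2,1)}{\sqrt{1}\,\sqrt{6}} = \frac{1}{\sqrt{6}} \approx 0.4082
\end{aligned}
```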

The ArkTS code below can be run as-is. Writing this up took some effort, so please credit the source if you repost it.

export function cosTextSimilarity(simple: string, target: string): number {
  // Tokenize: for now, split into single characters (UTF-16 code units)
  let simples: string[] = simple.split("")
  //Log.d("zb", JSON.stringify(simples))
  let targets: string[] = target.split("")
  //Log.d("zb", JSON.stringify(targets))

  // Count and store the term frequencies of each text
  let simpleMap: Map<string, number> = new Map<string, number>()
  simples.forEach((c: string) => {
    let value: number = simpleMap.get(c) ?? 0
    simpleMap.set(c, value + 1)
  })

  let targetMap: Map<string, number> = new Map<string, number>()
  targets.forEach((c: string) => {
    let value: number = targetMap.get(c) ?? 0
    targetMap.set(c, value + 1)
  })

  // Union of the terms from both texts, without duplicates
  let merge = simples.concat(targets)
  let collectionSet = new Set(merge)
  let collectionArr = new Array<string>()
  collectionSet.forEach((c: string) => {
    collectionArr.push(c)
  })

  // p3: dot product; p1, p2: squared lengths of the two frequency vectors
  let p3 = 0
  let p1 = 0
  let p2 = 0
  collectionArr.forEach((c: string) => {
    // A term missing from one text counts as frequency 0
    let frequencyS: number = simpleMap.get(c) ?? 0
    let frequencyT: number = targetMap.get(c) ?? 0

    p3 += frequencyS * frequencyT
    p1 += frequencyS * frequencyS
    p2 += frequencyT * frequencyT
  })

  // An empty text gives a zero-length vector; return 0 instead of NaN
  if (p1 === 0 || p2 === 0) {
    return 0
  }
  return p3 / Math.sqrt(p1 * p2)
}