Clustering

We perform clustering because we believe the underlying cluster labels are meaningful, will lead to a more efficient description of our data, and will help us choose better actions.

K-means Clustering

The K-means algorithm puts N data points in an I-dimensional space into K clusters. Each cluster is parameterized by a vector m(k) called its mean.

Each data point is denoted by x(n) and consists of I components.

The algorithm requires a distance between data points, for example the squared Euclidean distance.
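
The distance formula itself is not shown in the text at this point. One common choice, assumed here rather than taken from the original, is half the squared Euclidean distance between two points x and y with I components each:

```latex
d(\mathbf{x}, \mathbf{y}) = \frac{1}{2} \sum_{i=1}^{I} (x_i - y_i)^2
```

The factor 1/2 is a convention and does not change which mean is nearest.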

The algorithm simply alternates between two steps:

  • assignment step: Each data point x(n) is assigned to the nearest mean.
  • update step: The means are adjusted to match the sample means of the data points that they are responsible for.
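
The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `kmeans`, the random choice of initial means, and the `max_iters` and `seed` parameters are all assumptions made for the example.

```python
import random

def squared_distance(x, y):
    # Squared Euclidean distance between two points of equal length.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def kmeans(points, k, max_iters=100, seed=0):
    # Assumption: initial means are k distinct data points chosen at random.
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest mean.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(x, means[j]))
            for x in points
        ]
        # If no assignment changed, the means would not move either: stop.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: each mean becomes the sample mean of its points.
        for j in range(k):
            members = [x for x, a in zip(points, assignments) if a == j]
            if members:  # a mean with no points is left where it is
                means[j] = [sum(c) / len(members) for c in zip(*members)]
    return means, assignments

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    final_means, labels = kmeans(data, k=2)
    print(final_means, labels)
```

Leaving an empty cluster's mean unchanged is one possible design choice; other treatments, such as re-initializing that mean, are equally plausible.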

Repeating these two steps is guaranteed to converge. (This can be proved.) Convergence is indicated by the means no longer moving when updated.

Drawback of K-means: it is a 'hard' algorithm. 'Hard' means that it assigns each data point to exactly one cluster, and all data points in a cluster count equally when updating the mean. Arguably, points on the borderline between two or more clusters should have less of a vote in the update step.

Soft K-means clustering

The drawbacks of the 'hard' K-means algorithm motivate the soft K-means algorithm.
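
The text does not spell out the soft variant, so the following is only a sketch of one common formulation. Each point gets a degree of responsibility under every mean, proportional to exp(-beta * distance), and each mean is updated as the responsibility-weighted average of all points. The stiffness parameter `beta`, the function names, and the fixed iteration count are assumptions made for the example.

```python
import math

def squared_distance(x, y):
    # Squared Euclidean distance between two points of equal length.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def soft_kmeans(points, initial_means, beta, iters=50):
    # Assumption: the caller supplies the initial means and the stiffness beta.
    means = [list(m) for m in initial_means]
    k = len(means)
    dim = len(points[0])
    resp = []
    for _ in range(iters):
        # Soft assignment step: responsibilities of each mean for each point.
        resp = []
        for x in points:
            d = [squared_distance(x, m) for m in means]
            d_min = min(d)  # subtracted for numerical stability only
            w = [math.exp(-beta * (dj - d_min)) for dj in d]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # Update step: responsibility-weighted mean of all points.
        for j in range(k):
            weight = sum(r[j] for r in resp)
            if weight > 0:
                means[j] = [
                    sum(r[j] * x[i] for r, x in zip(resp, points)) / weight
                    for i in range(dim)
                ]
    # resp holds the responsibilities from the last assignment step.
    return means, resp

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    final_means, r = soft_kmeans(data, [(0.0, 0.0), (10.0, 10.0)], beta=1.0)
    print(final_means)
```

In this formulation a borderline point has responsibilities close to equal across the nearby means, so it pulls on each of them only weakly. As beta grows, the responsibilities approach 0 or 1 and the behavior approaches hard K-means.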

This algorithm still has some flaws, which a maximum-likelihood approach may help to address.
