Principal Component Analysis (PCA) - Part 1
Mean: the average value of the data
Variance: a measure of the spread of the data
Covariance: a measure of the co-dependence of two random variables
Eigenvalues and eigenvectors of a matrix
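The four quantities above can be sketched in plain Python. This is a minimal sketch that assumes sample estimates (dividing by n - 1) for variance and covariance. All function names are hypothetical. The dominant eigenvalue and eigenvector of the covariance matrix are found here by power iteration, which is only one simple method among several.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance (assumption: divides by n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(columns):
    """C[i][j] = covariance of variable i with variable j."""
    return [[covariance(a, b) for b in columns] for a in columns]

def power_iteration(A, iters=200):
    """Dominant eigenvalue and unit eigenvector of a square matrix A.

    Assumes the all-ones start vector is not orthogonal to the dominant
    eigenvector and that A @ v never becomes the zero vector.
    """
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v, the Rayleigh quotient v^T A v is the eigenvalue.
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))
    return lam, v

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
C = covariance_matrix([xs, ys])
lam, v = power_iteration(C)
print(mean(xs), variance(xs), covariance(xs, ys))
print(lam, v)
```

Because ys is exactly 2 * xs here, the covariance matrix has one non-zero eigenvalue. Its eigenvector points along the direction (1, 2), which is the line the data lies on.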
Principal Component Analysis (PCA) - Part 2
▶PCA is a method of revealing underlying trends in large amounts of data
▶A new coordinate system is constructed by rotating the axes
▶The first coordinate is the direction in which the data varies most, the second is the orthogonal direction with the next-largest variation, and so on…
▶Select a few of these new variables that together contain most of the variation in the data and can be visualized
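For two variables, the rotation described above can be computed in closed form. The following is a minimal sketch (the function name `pca_2d` is hypothetical) that centers 2-D points, builds the sample covariance matrix [[a, b], [b, c]], takes its largest eigenvalue and eigenvector as the first principal component, and projects each point onto it. It also reports the fraction of total variance that component explains. For more than two variables, a general eigensolver or MATLAB's `pca` function would be used instead.

```python
import math

def pca_2d(points):
    """PCA of 2-D points via the closed-form eigendecomposition of the
    2x2 sample covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / (n - 1)  # var(x)
    c = sum(y * y for _, y in centered) / (n - 1)  # var(y)
    b = sum(x * y for x, y in centered) / (n - 1)  # cov(x, y)
    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2
    r = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + r, half_trace - r
    # An eigenvector for lam1 is (lam1 - c, b) whenever b != 0.
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    # Scores: each centered point's coordinate along the new (rotated) axis.
    scores = [x * pc1[0] + y * pc1[1] for x, y in centered]
    total = lam1 + lam2
    explained = lam1 / total if total > 0 else 0.0
    return pc1, scores, explained

pc1, scores, explained = pca_2d([(1, 1), (2, 2), (3, 3)])
print(pc1, scores, explained)
```

Points lying on the line y = x give a first component along (1, 1)/√2 that explains all of the variance. The sign of an eigenvector is arbitrary, so other tools may report the same component as (-1, -1)/√2.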
Principal Component Analysis (PCA) Plotting in MATLAB
A MATLAB license is available until 2017.11.27. An example of PCA follows.
Software download: https://www.mathworks.com/licensecenter/classroom/netsysbio
Learning the software: https://matlabacademy.mathworks.com/
Clustergram in MATLAB
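Hierarchical Clustering
Clusters are formed according to distance. Distance measures include Euclidean, correlation, and Hamming distance, of which Euclidean is the most common.
In the figure below, the three subtypes (subtypes 1 to 3) are separated, and red marks regions of high expression. Genes in the upper part are highly expressed in the upper-left region, genes in the middle are highly expressed in the central region, and genes in the lower part are highly expressed in the lower-right region.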
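The three distance measures named above can be sketched as follows. The function names are hypothetical. Correlation distance is taken as 1 minus the Pearson correlation, and Hamming distance as the fraction of positions that differ. Both are common conventions, but they are assumptions here.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones. Undefined for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)

def hamming(x, y):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

print(euclidean((1, 2, 3), (2, 4, 6)))
print(correlation_distance((1, 2, 3), (2, 4, 6)))
print(hamming("10110", "11100"))
```

The profiles (1, 2, 3) and (2, 4, 6) are √14 ≈ 3.74 apart in Euclidean distance, but their correlation distance is essentially 0 because they have the same shape. Correlation distance is useful when genes with similar expression patterns but different magnitudes should be grouped together.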
Linkage Function
The following linkage methods are available:
Average
Median
Single
Complete
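Three of the linkage rules listed above can be sketched as follows. Single linkage uses the nearest pair of points, complete linkage the farthest pair, and average linkage the mean over all pairs. They are plugged into a naive bottom-up clustering loop. Median linkage is omitted because it works on weighted cluster centroids that depend on the merge history. The names `cluster_distance` and `agglomerative` are hypothetical, and the loop is far slower than a real implementation. It is for illustration only.

```python
import math

def cluster_distance(a, b, method):
    """Distance between clusters a and b (lists of points) under a linkage rule."""
    d = [math.dist(p, q) for p in a for q in b]
    if method == "single":      # nearest pair of points
        return min(d)
    if method == "complete":    # farthest pair of points
        return max(d)
    if method == "average":     # mean over all pairs
        return sum(d) / len(d)
    raise ValueError(f"unsupported linkage: {method}")

def agglomerative(points, n_clusters, method="average"):
    """Start with one cluster per point and repeatedly merge the closest
    pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], method)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # j > i, so index i stays valid
    return clusters

print(agglomerative([(0,), (1,), (10,), (11,)], 2, "single"))
```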
Standardization
▶Standardization converts data into standardized z-scores.
▶Standardization is a normalization process that rescales the values (to mean 0 and standard deviation 1) so that they fall in the range best suited to visualization in a clustergram.
▶There are two standardization options: row standardization and column standardization.
▶For gene expression data, where rows are typically genes, we generally use row standardization.
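Row and column standardization can be sketched as below. Each row (or column) is converted to z-scores: (value - mean) / standard deviation. This sketch assumes the sample standard deviation (n - 1 denominator), and the source does not say which one the clustergram uses. It also maps rows with zero spread to all zeros. The function names and the example matrix are hypothetical.

```python
import statistics

def standardize_rows(matrix):
    """Convert each row to z-scores: (value - row mean) / row std dev.

    Uses the sample standard deviation; rows with zero spread become
    all zeros to avoid dividing by zero.
    """
    result = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)
        result.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return result

def standardize_columns(matrix):
    """Column standardization is row standardization of the transpose."""
    transposed = [list(col) for col in zip(*matrix)]
    return [list(row) for row in zip(*standardize_rows(transposed))]

# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [
    [1.0, 2.0, 3.0],
    [20.0, 40.0, 60.0],
]
z = standardize_rows(expression)
print(z)
```

The two genes differ 20-fold in scale but share the same pattern across samples. After row standardization they get identical z-scores, so they map to the same colors in a clustergram.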
A clustering example in MATLAB
Self-Organizing Maps
K-means
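Find one or more points m_k (the cluster centers) such that the total squared distance from each data point to its nearest center is minimized.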
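A minimal sketch of k-means using Lloyd's algorithm. It alternates between two steps. First, each point is assigned to its nearest center m_k. Second, each center moves to the mean of its assigned points. This repeats until the centers stop changing. The initial centers are passed in explicitly so that the run is deterministic. The function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]

def kmeans(points, init_centers, max_iter=100):
    """Lloyd's k-means; init_centers fixes both k and the starting centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    # The objective k-means minimizes: total squared distance to assigned centers.
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse

centers, labels, sse = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(centers, labels, sse)
```

Lloyd's algorithm only reaches a local minimum, so in practice it is run from several different starting centers and the result with the lowest objective is kept.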
Self-Organizing Maps
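Unlike k-means, the nodes (cluster centers) of a self-organizing map are arranged on a low-dimensional grid, commonly two-dimensional, and neighboring nodes are updated together. This preserves the topology of the data and makes the method suitable for nonlinear data.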
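The difference from k-means can be seen in a minimal online SOM sketch. For simplicity it uses a one-dimensional chain of nodes rather than a 2-D grid, and the learning rate, radius, and initialization scheme are all assumptions. Each sample pulls its best-matching unit (BMU) toward it, as in online k-means. It also pulls the BMU's grid neighbors, weighted by a Gaussian of their distance on the grid. Both the learning rate and the neighborhood radius shrink over the epochs. The names `bmu` and `train_som` are hypothetical.

```python
import math

def bmu(nodes, x):
    """Index of the best-matching unit: the node whose weights are closest to x."""
    return min(range(len(nodes)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(nodes[j], x)))

def train_som(data, n_nodes, epochs=50, lr0=0.5, radius0=0.5):
    """Online SOM on a 1-D chain of nodes with linearly decaying
    learning rate and neighbourhood radius."""
    # Deterministic initialisation: copy evenly spaced data points.
    nodes = [list(data[i * len(data) // n_nodes]) for i in range(n_nodes)]
    for t in range(epochs):
        frac = 1 - t / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)
        for x in data:
            b = bmu(nodes, x)
            for j, w in enumerate(nodes):
                # Neighbourhood weight: 1 for the BMU, smaller for nodes farther along the grid.
                h = math.exp(-((j - b) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes

data = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
nodes = train_som(data, n_nodes=2)
print(nodes, bmu(nodes, (0, 0)), bmu(nodes, (5, 5)))
```

When the radius shrinks to near zero, the neighbor term vanishes and the update reduces to online k-means. The neighborhood coupling in the early epochs is what makes nearby nodes on the grid represent nearby regions of the data.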
Network-Based Clustering
Cluster the nodes of a network into groups
Gephi can be used to visualize networks.
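Many network clustering methods, such as the Louvain method, search for a partition with high modularity. Modularity compares the edges inside each cluster with the number expected at random given the node degrees. The sketch below only scores a given partition and does not search for one. It assumes an undirected, unweighted graph without self-loops, and the function name is hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; communities: dict node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where L_c is the
    number of edges inside c, d_c the total degree of its nodes, m the edge count.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    inside = {}
    for u, v in edges:
        if communities[u] == communities[v]:
            c = communities[u]
            inside[c] = inside.get(c, 0) + 1
    total_degree = {}
    for node, k in degree.items():
        c = communities[node]
        total_degree[c] = total_degree.get(c, 0) + k
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in total_degree.items())

# Two triangles joined by a single edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
together = {n: "A" for n in range(6)}
print(modularity(edges, split), modularity(edges, together))
```

Splitting the graph into its two triangles gives Q = 5/14 ≈ 0.36. Putting every node in one cluster gives Q = 0, so the split is preferred.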
Popular Network Clustering Methods