What are eigengenes and gene modules?

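While learning WGCNA recently, I ran into a few terms I did not fully understand. I searched online and found an answer, but the page cannot be reached over a normal internet connection, so I have copied it here for anyone who needs it.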

First answer

I am not sure in what context you are referring to the terms 'eigengene' and 'gene module', but my best guess is that you mean WGCNA (Weighted Gene Co-expression Network Analysis).

If you want to run a WGCNA analysis on a gene expression dataset, the general principle is as follows. First, you build a correlation network from the genes' co-expression: each gene is a node, and in the simplest (unweighted) version you put an edge between two genes if their co-expression strength passes a set threshold. WGCNA itself keeps every edge and weights it by the correlation raised to a power (soft thresholding), but the idea is the same. Sometimes people build a Topological Overlap Matrix (TOM)[1] on top of the correlation network, but you do not need to worry about that at the moment. Once you have a network, you run hierarchical clustering[2] on the most connected genes. This is an unsupervised learning method that builds a tree from the bottom up by repeatedly joining the two nearest genes (or clusters), using a distance that you choose. When the tree is complete, tightly connected genes sit together on the same branches.
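
The network-building step can be sketched in a few lines. This is only an illustration on made-up data, not the WGCNA R package itself: the toy matrix `expr`, the cutoff of 0.8, and the power `beta = 6` are all assumptions chosen for the example. It shows the hard-threshold network described above next to the weighted (soft-threshold) variant that gives WGCNA its name.

```python
import numpy as np

# Toy expression matrix (made-up data): 20 samples x 6 genes.
# Genes 0-2 follow a sine pattern, genes 3-5 a cosine pattern.
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
expr = np.column_stack(
    [np.sin(t) + 0.1 * np.sin(k * t) for k in (2, 3, 4)]
    + [np.cos(t) + 0.1 * np.cos(k * t) for k in (2, 3, 4)]
)

# Pairwise Pearson correlation between genes (columns of expr).
cor = np.corrcoef(expr, rowvar=False)

# Unweighted network: an edge exists if |correlation| passes a cutoff.
cutoff = 0.8  # illustrative value, an assumption
unweighted = (np.abs(cor) > cutoff).astype(int)
np.fill_diagonal(unweighted, 0)

# Weighted network: keep every edge, weight it by |correlation| ** beta.
beta = 6  # illustrative soft-threshold power, an assumption
weighted = np.abs(cor) ** beta
np.fill_diagonal(weighted, 0.0)

# Connectivity of a gene = sum of the weights of its edges.
connectivity = weighted.sum(axis=0)

print(unweighted)
print(np.round(connectivity, 3))
```

In the toy data the two groups of genes are connected within themselves and not to each other, which is the structure the clustering step then has to recover.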

After building the tree, you cut it at a certain height; why and how to do that is explained well in reference [2]. Cutting the tree gives you a number of groups in which the genes are highly connected and which may provide biological insight. These groups are called "gene modules".
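
As a rough sketch of the clustering and tree-cutting steps, the example below uses SciPy on made-up data. Several things here are assumptions for illustration: the dissimilarity is taken as 1 - |correlation| rather than a TOM-based one, the linkage method is average linkage, and the tree is cut at a fixed height of 0.5.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy expression matrix (made-up data): 20 samples x 6 genes.
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
expr = np.column_stack(
    [np.sin(t) + 0.1 * np.sin(k * t) for k in (2, 3, 4)]
    + [np.cos(t) + 0.1 * np.cos(k * t) for k in (2, 3, 4)]
)

# Gene-gene dissimilarity: 1 - |Pearson correlation| (a simplification).
cor = np.corrcoef(expr, rowvar=False)
dissim = 1.0 - np.abs(cor)
dissim = (dissim + dissim.T) / 2.0  # remove floating-point asymmetry
np.fill_diagonal(dissim, 0.0)

# Build the tree bottom-up; linkage expects the condensed distance vector.
tree = linkage(squareform(dissim, checks=False), method="average")

# Cut the tree at a fixed height; genes joined below it share a module.
cut_height = 0.5  # illustrative value, an assumption
modules = fcluster(tree, t=cut_height, criterion="distance")

print(modules)  # one module label per gene
```

Raising `cut_height` merges modules and lowering it splits them, so the choice of height directly controls how many gene modules you end up with.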

When you want to compare one gene module against another, it can be advantageous to use a single representative of the module rather than all of its genes. To get one, you run a Principal Component Analysis[3] on the module's expression data, which reduces the data meaningfully, and take the first principal component as a summary of the module. In this context, the first principal component is called the "eigengene".
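
To make the eigengene concrete, here is a minimal sketch that computes the first principal component of one module with NumPy's SVD. The module data are made up, and the sign convention (flip the component so it correlates positively with the module's average expression) is an assumption of this example.

```python
import numpy as np

# Toy module (made-up data): 20 samples x 3 co-expressed genes.
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
module_expr = np.column_stack(
    [np.sin(t) + 0.1 * np.sin(k * t) for k in (2, 3, 4)]
)

# Standardize each gene so that no single gene dominates the PCA.
z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0, ddof=1)

# PCA via SVD: the first left singular vector gives the first principal
# component's value in each sample (up to scale).
u, s, vt = np.linalg.svd(z, full_matrices=False)
eigengene = u[:, 0]

# The sign of a principal component is arbitrary; orient it so that it
# correlates positively with the average expression of the module.
if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
    eigengene = -eigengene

# Fraction of the module's variance captured by the eigengene.
var_explained = s[0] ** 2 / np.sum(s ** 2)

print(np.round(eigengene, 3))
print(round(float(var_explained), 3))
```

The eigengene has one value per sample, so two modules can be compared by correlating two short vectors instead of two whole gene sets.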

You can find all of the necessary WGCNA terminology here[4]. A brilliant tutorial covering every step of a WGCNA analysis can be found here[5]; it is written by the authors of the WGCNA R package.

Second answer

Genomic data such as gene expression data and variant data are very high-dimensional, i.e. there are many variables and few data points. When you have a gene expression dataset, you may be interested in identifying groups of genes that show similar expression patterns.

One way to do this is WGCNA, or weighted gene co-expression network analysis. In simple terms, you are trying to identify genes that show similar expression patterns across samples or conditions. These gene groups are called modules. WGCNA summarizes each module using principal component analysis (PCA): each module is represented by a single expression profile, the module 'eigengene', which is obtained from the PCA. None of the genes in the module needs to actually have this expression profile.

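Since each eigengene represents a module, the distance of a gene from the eigengene, and therefore from the centre of the module, can be calculated. In WGCNA this closeness is typically measured as the correlation between the gene's expression profile and the eigengene. This tells us which module each gene lies in.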
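
The idea of measuring how close each gene is to each module's eigengene can be sketched as follows. The data, the starting module labels, and the helper `module_eigengene` are all hypothetical, and closeness is taken here to be the Pearson correlation between a gene and an eigengene.

```python
import numpy as np

def module_eigengene(module_expr):
    """First principal component of a samples x genes block (hypothetical helper)."""
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    pc1 = u[:, 0]
    # Orient the component to agree with the module's average expression.
    if np.corrcoef(pc1, z.mean(axis=1))[0, 1] < 0:
        pc1 = -pc1
    return pc1

# Toy expression matrix (made-up data): 20 samples x 6 genes.
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
expr = np.column_stack(
    [np.sin(t) + 0.1 * np.sin(k * t) for k in (2, 3, 4)]
    + [np.cos(t) + 0.1 * np.cos(k * t) for k in (2, 3, 4)]
)
labels = np.array([0, 0, 0, 1, 1, 1])  # assumed module assignment

# One eigengene per module.
eigengenes = [module_eigengene(expr[:, labels == m]) for m in (0, 1)]

# Correlation of every gene with every eigengene (genes x modules).
membership = np.array(
    [[np.corrcoef(expr[:, g], eg)[0, 1] for eg in eigengenes]
     for g in range(expr.shape[1])]
)

# Each gene is closest to the eigengene it correlates with most strongly.
closest = np.abs(membership).argmax(axis=1)

print(np.round(membership, 3))
print(closest)
```

Genes whose correlation with their own module's eigengene is low are the ones worth inspecting, since they sit far from the centre of the module.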
