fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets

Note: only the parts of the paper that I consider important are excerpted here, each followed by my paraphrase.


Choice of K

One question that arises when applying admixture models in practice is how to select the model complexity, or number of populations, K.
Paraphrase: When admixture models are applied in practice, one question is how to choose the model complexity, that is, the number of populations K.
It is important to note that in practice there will generally be no “true” value of K, because samples from real populations will never conform exactly to the assumptions of the model.
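Paraphrase: The key point is that, in practice, there is usually no "true" value of K, because samples from real populations never fit the model's assumptions exactly. (Put more accurately: the model's assumptions never match what real populations are actually like.)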
Further, inferred values of K could be influenced by sampling ascertainment schemes (Engelhardt and Stephens 2010) (imagine sampling from g distinct locations in a continuous habitat exhibiting isolation by distance; any automated approach to select K will be influenced by g), and by the number of typed loci (as more loci are typed, more subtle structure can be picked up, and inferred values of K may increase).
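Paraphrase: The sampling scheme also affects the inferred K (picture sampling at g distinct locations in a continuous habitat where isolation by distance holds, so that genetic differentiation grows with geographic distance; any automated method for choosing K will be affected by g), and so does the number of genotyped loci (as more loci are typed, finer structure can be detected and the inferred K may increase).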
Nonetheless, it can be helpful to have automated heuristic rules to help guide the analyst in making the appropriate choice for K, even if the resulting inferences need to be carefully interpreted within the context of prior knowledge about the data and sampling scheme.
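Paraphrase: Even so, automated heuristic rules that suggest a suitable K are still useful to the analyst, although the results must be interpreted carefully in light of prior knowledge about the data and the sampling scheme.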
Therefore, we here used simulation to assess several different heuristics for selecting K.
Paraphrase: The authors therefore use simulated data to evaluate several heuristics for choosing K.
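
To make "a heuristic for selecting K" concrete, here is a minimal sketch of one generic rule: fit the model once per candidate K, record a score for each fit, and keep the K with the best score. This is an illustration only, not necessarily one of the heuristics the paper evaluates. The function name `choose_k_by_score` and the scores are hypothetical; the numbers are made up and are not results from any data set.

```python
from typing import Mapping


def choose_k_by_score(scores: Mapping[int, float]) -> int:
    """Return the K with the highest score; ties go to the smaller K."""
    if not scores:
        raise ValueError("scores must contain at least one K")
    # Order by descending score, then ascending K, so ties prefer the simpler model.
    return min(scores, key=lambda k: (-scores[k], k))


# Hypothetical per-K scores (for example, a held-out log-likelihood per fit).
# These values are invented for illustration only.
example_scores = {1: -1.250, 2: -1.100, 3: -1.050, 4: -1.060, 5: -1.080}

best_k = choose_k_by_score(example_scores)
print(best_k)
```

As the excerpt stresses, a K chosen this way is only a guide: it depends on the sampling scheme and on the number of loci, so it still has to be interpreted against what is known about the data.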
