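I have recently been analyzing the direction of artificial selection between wild and improved rat strains, which involves computing FST from pooled sequencing (Pool-seq) data. A literature search turned up several published tools for analyzing Pool-seq data, such as PoPoolation2, QTLseqr, MutMap, MutMap+, and QTL-seq. PoPoolation2 is the classic pipeline for comparing allele frequencies between two pooled populations; here I recommend instead an R package published in 2018, Poolfstat.
Using Poolfstat
A recent Nature Communications (NC) paper, "A large and diverse autosomal haplotype is associated with sex-linked colour polymorphism in the guppy", mentions Poolfstat, citing an article published in November 2021 (cited article). The package implements FST estimation for Pool-seq data and is available on CRAN: Poolfstat. Reading further, I found that the authors had first published it back in 2018 (article link), and that Poolfstat came into wide use from 2020 onward. For example: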
Kraft DW, Conklin EE, Barba EW, et al. Genomics versus mtDNA for resolving stock structure in the silky shark (Carcharhinus falciformis). PeerJ. 2020;8:e10186. Published 2020 Oct 21. doi:10.7717/peerj.10186
AssessPool then sends filtered SNPs to either PoPoolation2 (Kofler, Pandey & Schlötterer, 2011) or poolfstat (Hivert et al., 2018). PoPoolation2 calculates mean pairwise Fst values and significance in the form of p-values obtained using Fisher’s Exact Test and combined using Fisher’s method (as described in Ryman et al. (2006)). Poolfstat (Hivert et al., 2018) takes a different approach, calculating FST values based on an analysis-of-variance framework (sensu Weir & Cockerham, 1984) to eliminate biases associated with varying pool sizes. AssessPool then organizes, summarizes, and creates visualizations of the data using RStudio (RStudio Team, 2020).
Extensive Recombination Suppression and Epistatic Selection Causes Chromosome-Wide Differentiation of a Selfish Sex Chromosome in Drosophila pseudoobscura
To estimate population differentiation (FST), we used the R package poolfstat which implements the methods-of-moments estimator developed by Hivert et al. (2018) and includes a correction for pooled sequencing.
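The AssessPool excerpt above mentions per-SNP p-values from Fisher's Exact Test that are then combined with Fisher's method. The following is a minimal base-R sketch of that idea only; it is not PoPoolation2's implementation, and the allele counts are made-up values for illustration.

```r
# Hypothetical read counts (reference allele, alternate allele) for three SNPs in two pools.
pool1 <- matrix(c(30, 10,
                  25, 15,
                  12, 28), ncol = 2, byrow = TRUE)
pool2 <- matrix(c(18, 22,
                  24, 16,
                  27, 13), ncol = 2, byrow = TRUE)

# Fisher's exact test on the 2x2 table (pool x allele) of each SNP.
p.values <- sapply(seq_len(nrow(pool1)), function(i) {
  fisher.test(rbind(pool1[i, ], pool2[i, ]))$p.value
})

# Fisher's method: -2 * sum(log(p)) follows a chi-square distribution with
# 2k degrees of freedom under the null hypothesis, where k is the number of tests.
statistic <- -2 * sum(log(p.values))
combined.p <- pchisq(statistic, df = 2 * length(p.values), lower.tail = FALSE)
```

Fisher's method assumes the individual tests are independent, which linked SNPs may violate, so the combined p-value should be read with that caveat in mind.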
Poolfstat manual
https://cran.r-project.org/web/packages/poolfstat/poolfstat.pdf
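Notes
In the VCF file, 149, 155, 260, 270, and 691 each correspond to one population and serve as the population IDs (the original illustration was taken from the web).
Assume that each population's ID is also the number of samples in that population.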
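One point worth checking against the manual: it describes the `poolsizes` argument as haploid pool sizes. If the IDs are counts of diploid individuals (an assumption here, since the source only says "number of samples"), the haploid sizes would be twice those counts. A small sketch, with `n.individuals`, `ploidy`, and `haploid.poolsizes` as illustrative names:

```r
# Assumption: each ID equals the number of diploid individuals in that pool.
n.individuals <- c(149, 155, 260, 270, 691)
ploidy <- 2  # assumed diploid organism

# Haploid pool size = ploidy x number of individuals.
haploid.poolsizes <- ploidy * n.individuals
```

If the IDs already count chromosomes rather than individuals, the values can be passed to `poolsizes` unchanged.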
# Multitasking and parallel computation (optional)
# library(foreach)
# library(doParallel)
# library(iterators)
# library(parallel)
library(poolfstat)
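pooldata <- vcf2pooldata(vcf.file = 'snp.vcf', poolsizes = c(149, 155, 260, 270, 691)) # convert the VCF file into a pooldata object
res.fst <- computeFST(pooldata,
  # method = "Anova", nsnp.per.bjack.block = 60, sliding.window.size = 60,
  verbose = TRUE) # compute FST across all populations from Pool-seq or count data
# nsnp.per.bjack.block: compute over blocks; sets the number of consecutive SNPs per block-jackknife block
# sliding.window.size: compute over sliding windows; sets the number of consecutive SNPs per window
res.pairwise <- compute.pairwiseFST(pooldata, method = "Anova", min.cov.per.pool = -1, max.cov.per.pool = 1e+06, min.indgeno.per.pop = -1, min.maf = -1, output.snp.values = FALSE, nsnp.per.bjack.block = 0, verbose = TRUE) # FST for every pair of populations
# res.fst is a list; inspect it with str(res.fst) first, since its elements may differ between poolfstat versions
write.csv(res.fst, file = "single.snp.fst", quote = FALSE)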
Note: POS values in the output may be written in scientific notation.
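One way to avoid scientific notation is to convert the positions to plain decimal strings before writing. The sketch below uses base R only; the table and its column names (`CHR`, `POS`, `FST`) are hypothetical stand-ins, not the actual structure returned by Poolfstat.

```r
# Hypothetical per-SNP table; column names and values are illustrative only.
snp.table <- data.frame(
  CHR = c("chr1", "chr1", "chr2"),
  POS = c(100000, 2500000, 30000000),
  FST = c(0.012, 0.250, 0.081)
)

# Convert positions to fixed-notation strings so that values such as 1e+05
# are written out in full.
snp.table$POS <- format(snp.table$POS, scientific = FALSE, trim = TRUE)

# Write to a temporary file and read it back to check the result.
out.file <- tempfile(fileext = ".csv")
write.csv(snp.table, file = out.file, quote = FALSE, row.names = FALSE)
written <- readLines(out.file)
```

Setting `options(scipen = 999)` before calling `write.csv` is an alternative that changes number formatting for the whole session.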