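I have recently been working on an analysis of the direction of artificial selection between wild and improved rat strains, which involves computing Fst from pooled sequencing (Pool-seq) data. A literature search turned up several published tools for analysing Pool-seq data, such as PoPoolation2, QTLseqr, MutMap, MutMap+ and QTL-seq. PoPoolation2 is the classic pipeline and works by comparing allele frequencies between two pooled populations; here I recommend poolfstat, an R package published in 2018.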

Using poolfstat

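A recent Nature Communications paper, "A large and diverse autosomal haplotype is associated with sex-linked colour polymorphism in the guppy", mentions poolfstat, which was published in November 2021 (cited article) and implements FST estimation for Pool-seq data. CRAN location: poolfstat. Reading further, I found that the authors had first published it back in 2018 (article link), and that poolfstat started to be widely used in 2020. For example: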

Kraft DW, Conklin EE, Barba EW, et al. Genomics versus mtDNA for resolving stock structure in the silky shark (Carcharhinus falciformis). PeerJ. 2020;8:e10186. Published 2020 Oct 21. doi:10.7717/peerj.10186
AssessPool then sends filtered SNPs to either PoPoolation2 (Kofler, Pandey & Schlötterer, 2011) or poolfstat (Hivert et al., 2018). PoPoolation2 calculates mean pairwise F_st values and significance in the form of p-values obtained using Fisher’s Exact Test and combined using Fisher’s method (as described in Ryman et al. (2006)). Poolfstat (Hivert et al., 2018) takes a different approach, calculating F_ST values based on an analysis-of-variance framework (sensu Weir & Cockerham, 1984) to eliminate biases associated with varying pool sizes. AssessPool then organizes, summarizes, and creates visualizations of the data using RStudio (RStudio Team, 2020).

Extensive Recombination Suppression and Epistatic Selection Causes Chromosome-Wide Differentiation of a Selfish Sex Chromosome in Drosophila pseudoobscura
To estimate population differentiation (F_ST), we used the R package poolfstat which implements the methods-of-moments estimator developed by Hivert et al. (2018) and includes a correction for pooled sequencing.
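
For readers new to FST, the quantity these tools estimate can be illustrated with a toy calculation. The sketch below uses a naive textbook definition, FST = (H_T - H_S) / H_T, on made-up allele frequencies for two populations. It is not the ANOVA-based estimator that poolfstat implements, and it ignores the pool-size and sequencing corrections mentioned in the excerpts above; the variable names are only for illustration.

```r
# Made-up allele frequencies at one SNP in two populations (illustration only)
p <- c(pop1 = 0.2, pop2 = 0.8)

# Mean expected heterozygosity within populations
h.s <- mean(2 * p * (1 - p))

# Expected heterozygosity of the combined population (equal weights assumed)
p.bar <- mean(p)
h.t <- 2 * p.bar * (1 - p.bar)

# Naive FST: share of total heterozygosity explained by between-population differences
fst.naive <- (h.t - h.s) / h.t
fst.naive  # 0.36
```

Estimators designed for Pool-seq additionally have to account for the sampling of individuals into the pool and of reads from the pool, which is what the quoted papers rely on poolfstat for.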

poolfstat manual

https://cran.r-project.org/web/packages/poolfstat/poolfstat.pdf

Notes

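In the VCF file, 149, 155, 260, 270 and 691 each correspond to one population (image taken from the web); they are the population IDs.
Assume that each population's ID equals the number of samples in that population.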

[Image: VCF file with the sample columns 149, 155, 260, 270 and 691]
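
The poolfstat manual describes the poolsizes argument as the haploid size of each pool. If the five IDs really are counts of diploid individuals (that is my assumption above, not something the VCF confirms), the values passed to vcf2pooldata would need to be doubled. A minimal sketch; the names n.ind, ploidy and haploid.sizes are only for illustration:

```r
# Assumed number of individuals per pool, taken from the population IDs
n.ind <- c(149, 155, 260, 270, 691)

# Rats are diploid, so each individual contributes two haploid genomes
ploidy <- 2

# Haploid pool sizes, the unit the poolsizes argument is documented to expect
haploid.sizes <- ploidy * n.ind
haploid.sizes  # 298 310 520 540 1382
```

If the IDs turn out to be already haploid sizes, or not sample counts at all, the real pool sizes should be used instead.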

# Multitasking and parallel computation (optional)
# library(foreach)
# library(doParallel)
# library(iterators)
# library(parallel)

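library(poolfstat)
# Convert the VCF file into a pooldata object.
# Note: poolsizes is documented as the haploid size of each pool; the values here follow the assumption stated above.
pooldata <- vcf2pooldata(vcf.file = "snp.vcf", poolsizes = c(149, 155, 260, 270, 691))
# Compute FST across all populations from the Pool-seq read counts (or count data)
res.fst <- computeFST(pooldata,
  # method = "Anova", nsnp.per.bjack.block = 60, sliding.window.size = 60,
  verbose = TRUE)
# nsnp.per.bjack.block: compute over blocks of consecutive SNPs (block-jackknife); sets the number of SNPs per block
# sliding.window.size: compute over sliding windows; sets the number of SNPs per window
# Pairwise FST between populations; the result is stored so it can be inspected later
res.pairwise <- compute.pairwiseFST(pooldata, method = "Anova", min.cov.per.pool = -1, max.cov.per.pool = 1e+06, min.indgeno.per.pop = -1, min.maf = -1, output.snp.values = FALSE, nsnp.per.bjack.block = 0, verbose = TRUE)

# res.fst is a list; write.csv coerces it to a data frame before writing
write.csv(res.fst, file = "single.snp.fst", quote = FALSE)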
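
computeFST returns a list, so passing it straight to write.csv relies on R coercing that list to a data frame. A more explicit option is to bind the SNP coordinates to the per-SNP values first. The helper below is hypothetical and not part of poolfstat. It assumes the coordinates live in the snp.info slot of the pooldata object and that the per-SNP element of the result is named something like snp.FST; since I am not sure that name is the same in every version, it is matched case-insensitively and should be checked with names(res.fst). Mock inputs are used so the sketch runs without poolfstat or a VCF file.

```r
# Hypothetical helper: bind SNP coordinates to per-SNP FST values.
# snp.info: one row per SNP (for real data, e.g. pooldata@snp.info)
# res:      the list returned by computeFST()
assemble_snp_table <- function(snp.info, res) {
  # The element name is an assumption; match it loosely and fail loudly otherwise
  el <- grep("^snp\\.fst$", names(res), ignore.case = TRUE, value = TRUE)
  if (length(el) != 1) stop("no unique per-SNP FST element; see names(res)")
  fst <- res[[el]]
  if (length(fst) != nrow(snp.info)) stop("SNP count mismatch")
  data.frame(snp.info, FST = fst, stringsAsFactors = FALSE)
}

# Mock inputs standing in for pooldata@snp.info and res.fst
mock.info <- data.frame(Chromosome = c("chr1", "chr1", "chr2"),
                        Position = c(1200, 5300, 880))
mock.res <- list(FST = 0.1, snp.FST = c(0.02, 0.31, NA))

snp.table <- assemble_snp_table(mock.info, mock.res)
snp.table
```

With real data the call would look like assemble_snp_table(pooldata@snp.info, res.fst), and the resulting table can then be written with write.csv(..., row.names = FALSE).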

Per-SNP results from computeFST

Note: POS values in the output may be written in scientific notation.
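
One way to avoid this is to convert the positions to fixed-notation strings before writing the file. A minimal sketch with a made-up table; the column names CHR, POS and FST are only assumed here and may differ from the real output:

```r
# Made-up per-SNP table; large round positions are the ones R tends to write as 1e+05 etc.
snp <- data.frame(
  CHR = c("chr1", "chr1", "chr2"),
  POS = c(100000, 2500000, 30000000),
  FST = c(0.012, 0.340, 0.105)
)

# Force fixed notation; trim = TRUE drops the padding that format() adds by default
snp$POS <- format(snp$POS, scientific = FALSE, trim = TRUE)

# Write to a temporary file and read it back to check the result
out.file <- tempfile(fileext = ".csv")
write.csv(snp, file = out.file, quote = FALSE, row.names = FALSE)
readLines(out.file)
```

Setting options(scipen = 999) before calling write.csv is a commonly used alternative that changes the behaviour for the whole session rather than for one column.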

[Image: per-SNP output of computeFST]
