Hands-on | RRBLUP with the rrBLUP package

Data preparation

Converting VCF to the rrBLUP {-1,0,1} format

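rrBLUP reads genotypes coded as {-1,0,1}, with one row per sample and one column per marker (the same layout that vcftools --012 writes), so the VCF data has to be converted first.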

When computing the G matrix, genotypes can be coded in different ways:

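  • 0, 1, 2: 0 is AA (homozygous for the major allele), 1 is heterozygous, and 2 is aa (homozygous for the minor allele).
  • -1, 0, 1: -1 is AA (homozygous for the major allele), 0 is heterozygous, and 1 is aa (homozygous for the minor allele).
## Generate the {0,1,2} matrix with vcftools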
vcftools --vcf test.genotypes_no_missing_IDs.vcf --012 --out snp_matrix 
  • --012
    This option outputs the genotypes as a large matrix. Three files are produced. The first, with suffix ".012", contains the genotypes of each individual on a separate line. Genotypes are represented as 0, 1 and 2, where the number represent that number of non-reference alleles. Missing genotypes are represented by -1. The second file, with suffix ".012.indv" details the individuals included in the main file. The third file, with suffix ".012.pos" details the site locations included in the main file.
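##R
data <- as.matrix(read.table("snp_matrix.012", header = FALSE))
data1 <- data[, -1]  # drop the first column: it is the individual index written by vcftools, not a genotype
data2 <- data1 - 1   # 0,1,2 -> -1,0,1; missing calls (-1) become -2
write.table(data2, file = "SNP_TMP.txt", row.names = FALSE, col.names = FALSE)  # save as a plain numeric text file
##shell
sed 's/-2/NA/g' SNP_TMP.txt > snp.txt  # -2 marks a missing genotype after the shift; recode it as NA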
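
The recoding can also be done entirely in R, which avoids the intermediate -2 value and the sed step. The sketch below uses a small made-up matrix in the vcftools --012 coding; the names geno012 and recode_012 are hypothetical and not part of vcftools or rrBLUP.

```r
# Toy genotype matrix in vcftools --012 coding:
# 0/1/2 = number of non-reference alleles, -1 = missing.
# Rows are individuals, columns are markers.
geno012 <- matrix(c(0,  1,  2,
                    2, -1,  0,
                    1,  1, -1),
                  nrow = 3, byrow = TRUE)

# Hypothetical helper: mark missing calls as NA first, then shift the coding.
recode_012 <- function(g) {
  g[g == -1] <- NA  # missing genotypes must be flagged before subtracting 1
  g - 1             # 0/1/2 -> -1/0/1; NA stays NA
}

geno_rr <- recode_012(geno012)
print(geno_rr)
```

A matrix recoded this way can be written with write.table(), which prints missing values as NA by default.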

Loading the input files

Example files:
traits.txt: https://pbgworks.org/sites/pbgworks.org/files/traits.txt
snp.txt: https://pbgworks.org/sites/pbgworks.org/files/snp.txt

library(rrBLUP)
Pheno <- as.matrix(read.table(file = "/data4/ykzhang/chip_207/7GS/rrblup/format/sheep207_mvp.txt", header = TRUE))
Markers <- as.matrix(read.table(file = "/data4/ykzhang/chip_207/7GS/rrblup/format/snp.txt", header = FALSE))

Filtering and imputation

impute = A.mat(Markers, max.missing = 0.5, impute.method = "mean", return.imputed = TRUE)  # drop markers with more than 50% missing data, then fill the remaining gaps with the marker mean
Markers_impute2 = impute$imputed
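
To show what these two options ask for, here is a rough base-R sketch of a missing-rate filter followed by mean imputation. It is an illustration under simplifying assumptions, not the A.mat source; filter_and_impute and the toy matrix are hypothetical.

```r
# Hypothetical helper: drop markers (columns) whose missing fraction exceeds
# max_missing, then replace each remaining NA with that marker's mean.
filter_and_impute <- function(M, max_missing = 0.5) {
  miss_frac <- colMeans(is.na(M))            # fraction of NA per marker
  M <- M[, miss_frac <= max_missing, drop = FALSE]
  col_means <- colMeans(M, na.rm = TRUE)     # marker means from observed calls
  na_idx <- which(is.na(M), arr.ind = TRUE)  # (row, col) position of each NA
  M[na_idx] <- col_means[na_idx[, "col"]]
  M
}

# Toy data: 4 individuals x 3 markers in -1/0/1 coding.
M <- cbind(m1 = c(-1, NA, 1, 0),    # 25% missing: kept, NA becomes 0
           m2 = c(NA, NA, NA, 1),   # 75% missing: dropped
           m3 = c(1, 1, 0, -1))     # complete
imputed <- filter_and_impute(M, max_missing = 0.5)
print(imputed)
```

Because filtering can remove markers, the imputed matrix may have fewer columns than the input, so later steps should use the imputed matrix consistently.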

Simple cross-validation

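traits = 1
cycles = 300
accuracy = matrix(nrow = cycles, ncol = traits)
for (r in 1:cycles) {
  train <- sample(1:207, 180)    # 180 of the 207 individuals for training
  test <- setdiff(1:207, train)  # the remaining 27 for validation
  Pheno_train = Pheno[train, ]
  m_train = Markers_impute2[train, ]
  Pheno_valid = Pheno[test, ]
  m_valid = Markers_impute2[test, ]

  yield = as.numeric(Pheno_train[, 7])  # trait in column 7; as.numeric() guards against a character matrix
  yield_answer <- mixed.solve(yield, Z = m_train, K = NULL, SE = FALSE, return.Hinv = FALSE)
  pred_yield_valid = m_valid %*% as.matrix(yield_answer$u)           # marker-effect part of the prediction
  pred_yield = pred_yield_valid[, 1] + as.vector(yield_answer$beta)  # add the intercept
  yield_valid = as.numeric(Pheno_valid[, 7])
  accuracy[r, 1] <- cor(pred_yield, yield_valid, use = "complete.obs")
}
mean(accuracy)  # mean prediction accuracy over all cycles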
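
To make explicit what one cycle of the loop above computes, the following base-R sketch fits marker effects by ridge regression with a fixed penalty and assembles the prediction as intercept plus marker effects. This is a simplification: mixed.solve estimates the variance components (and hence the penalty Ve/Vu) by REML, whereas here lambda = 50 is an arbitrary assumption, the intercept is taken as the training mean, and all data are simulated. ridge_fit is a hypothetical helper.

```r
set.seed(1)
n <- 60   # individuals
m <- 200  # markers
Z <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), nrow = n)
u_true <- rnorm(m, sd = 0.1)                       # simulated marker effects
y <- as.vector(5 + Z %*% u_true + rnorm(n, sd = 0.5))

train <- sample(seq_len(n), 45)
test <- setdiff(seq_len(n), train)

# Ridge solution in its n x n form: u = Z' (ZZ' + lambda I)^-1 (y - beta)
ridge_fit <- function(y, Z, lambda) {
  beta <- mean(y)                                  # intercept: training mean
  K <- tcrossprod(Z)                               # Z %*% t(Z)
  u <- crossprod(Z, solve(K + lambda * diag(nrow(Z)), y - beta))
  list(beta = beta, u = as.vector(u))
}

fit <- ridge_fit(y[train], Z[train, ], lambda = 50)
pred <- as.vector(fit$beta + Z[test, ] %*% fit$u)  # intercept + marker effects
acc <- cor(pred, y[test])                          # prediction accuracy
print(acc)
```

Adding the intercept shifts every prediction by the same constant, so it changes the predicted values but not the correlation used as the accuracy measure.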
Automating the calculation for multiple traits
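
One possible way to automate this is to wrap the single-trait loop in a second loop over the trait columns. The sketch below is an assumption about how that could look, not the author's script: the data are simulated, the trait names trait_a and trait_b are hypothetical, and only 5 cycles are run to keep it quick. It requires the rrBLUP package.

```r
library(rrBLUP)  # provides mixed.solve

set.seed(42)
n <- 50   # individuals
m <- 100  # markers
M <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), nrow = n)

# Simulated phenotypes for two hypothetical traits, one column per trait.
pheno <- cbind(
  trait_a = as.vector(M %*% rnorm(m, sd = 0.2) + rnorm(n)),
  trait_b = as.vector(M %*% rnorm(m, sd = 0.2) + rnorm(n))
)

cycles <- 5
n_train <- 40
accuracy <- matrix(NA_real_, nrow = cycles, ncol = ncol(pheno),
                   dimnames = list(NULL, colnames(pheno)))

for (r in seq_len(cycles)) {
  train <- sample(seq_len(n), n_train)
  test <- setdiff(seq_len(n), train)
  for (tr in seq_len(ncol(pheno))) {   # same split, every trait
    fit <- mixed.solve(pheno[train, tr], Z = M[train, ])
    pred <- as.vector(M[test, ] %*% as.vector(fit$u)) + as.vector(fit$beta)
    accuracy[r, tr] <- cor(pred, pheno[test, tr], use = "complete.obs")
  }
}

print(colMeans(accuracy))  # mean accuracy per trait
```

With real data, pheno would hold the numeric trait columns of the phenotype file and M the imputed marker matrix, with rows in the same individual order.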

References:

Introduction to Genomic Selection in R using the rrBLUP Package
【GS专栏】8-全基因组选择实战之RRBLUP ([GS Column] 8: Genomic selection in practice with RRBLUP)
