进行GS前,需要将标记格式转成012格式,0表示低频allele的个数,即M矩阵。在R中有包
synbreed
和SNPassoc
能够完成这项工作。
library(tidyverse)
library(synbreed)
library(SNPassoc)
# 准备数据
data("SNPs")
pheno <- SNPs %>% select(id, blood.pre, protein)
genos <- SNPs[,-c(1:5)] %>% `rownames<-`(pheno$id)
pheno$id <- factor(pheno$id)
# synbreed
G_syn <- create.gpData(geno = as.matrix(genos)) %>% codeGeno() %>% magrittr::extract2("geno")
# snp10001 snp100010 snp100011 snp100012 snp100013
# 1 0 0 0 0 0
# 2 0 0 0 1 0
# 3 0 0 2 0 0
# 4 1 0 0 0 0
# 5 0 0 0 0 0
# SNPassoc
G_snp <- setupSNP(geno,1:ncol(geno),sep="")%>% apply(2,additive)
# snp10001 snp10002 snp10003 snp10004 snp10005
# [1,] 2 2 0 0 2
# [2,] 2 1 0 0 1
# [3,] 2 2 0 0 2
# [4,] 1 2 0 0 2
# [5,] 2 1 0 0 2
这2个包的列的排列方式不同。转换模型也不同:SNPassoc
得到的不是M矩阵,而是相反的高频allele矩阵。这2zhong矩阵得到的加性关系矩阵也不同,因此随机效应值也是不一样的。我尝试用2减去G_snp,应该是可以,但分子关系矩阵依然有差别,不再往下细究,建议还是用synbreed。
自编函数
genoTo012 <- function(genotype) {
allel_all <- genotype %>% str_split("") %>% unlist
allel_uni=unique(allel_all)
allel_tab=table(allel_all)
allel_min=names(allel_tab[which.min(allel_tab)])
allel_max=allel_uni[allel_uni!=allel_min]
geno_2=paste0(allel_min,allel_min)
geno_0=paste0(allel_max,allel_max)
genotype[which(genotype==geno_0)]=0
genotype[which(genotype==geno_2)]=2
genotype[which(genotype!=0&genotype!=2)]=1
genotype=as.numeric(genotype)
return(genotype)
}
G_self=apply(genos,2,genoTo012)
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.4
[7] tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1 synbreed_0.12-14
loaded via a namespace (and not attached):
[1] httr_1.4.2 pkgload_1.2.4 jsonlite_1.7.2 foreach_1.5.1
[5] modelr_0.1.8 microbenchmark_1.4-7 assertthat_0.2.1 BiocManager_1.30.16
[9] cellranger_1.1.0 doBy_4.6.11 remotes_2.4.2 sessioninfo_1.2.2
[13] pillar_1.6.4 backports_1.4.1 lattice_0.20-38 glue_1.6.0
[17] rvest_1.0.2 colorspace_2.0-1 Matrix_1.2-17 pkgconfig_2.0.3
[21] devtools_2.4.3 broom_0.7.11 curry_0.1.1 haven_2.4.1
[25] scales_1.1.1 processx_3.5.2 qtl_1.48-1 generics_0.1.1
[29] usethis_2.1.5 ellipsis_0.3.2 cachem_1.0.4 regress_1.3-21
[33] withr_2.4.3 cli_3.1.0 readxl_1.3.1 magrittr_2.0.1
[37] crayon_1.4.2 memoise_2.0.1 ps_1.6.0 fs_1.5.0
[41] fansi_0.5.0 doParallel_1.0.14 MASS_7.3-51.4 xml2_1.3.2
[45] LDheatmap_0.99-7 pkgbuild_1.3.1 tools_3.6.0 prettyunits_1.1.1
[49] hms_1.1.1 lifecycle_1.0.1 reprex_2.0.1 munsell_0.5.0
[53] callr_3.7.0 Deriv_4.1.3 compiler_3.6.0 rlang_0.4.12
[57] grid_3.6.0 rstudioapi_0.13 iterators_1.0.11 igraph_1.2.6
[61] testthat_3.0.2 gtable_0.3.0 codetools_0.2-16 abind_1.4-7
[65] DBI_1.1.2 curl_4.3.1 R6_2.5.1 lubridate_1.7.10
[69] BGLR_1.0.4 fastmap_1.1.0 utf8_1.2.2 rprojroot_2.0.2
[73] desc_1.4.0 stringi_1.6.1 parallel_3.6.0 Rcpp_1.0.6
[77] vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.1