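This section covers association analysis, the step that produces the single-marker regression results (P-values). Its output can be combined with what I covered in Part 1, which closes the loop. Let's run through the standard workflow.
Association analysis can be used in many ways, for example: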
The basic association test is for a disease trait and is based on comparing allele frequencies between cases and controls (asymptotic and empirical p-values are available). Also implemented are the Cochran-Armitage trend test, Fisher's exact test, different genetic models (dominant, recessive and general), tests for stratified samples (e.g. Cochran-Mantel-Haenszel, Breslow-Day tests), a test for a quantitative trait; a test for differences in missing genotype rate between cases and controls; multilocus tests, using either Hotelling's T² statistic or a sum-statistic approach (evaluated by permutation) as well as haplotype tests. The basic tests can be performed with permutation, described in the following section to provide empirical p-values, and allow for different designs (e.g. by use of structured, within-cluster permutation).
Here I mainly introduce one of them: linear and logistic models.
These two features allow for multiple covariates when testing for both quantitative trait and disease trait SNP association, and for interactions with those covariates. The covariates can either be continuous or binary (i.e. for categorical covariates, you must first make a set of binary dummy variables).
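The main advantage is that you can add covariates as controls, which makes this approach very flexible, although it may run a bit slower.
The documentation shows the most basic usage like this:
But here I ran into a problem.
My bed/bim/fam files do not contain phenotype data, so I have to define a separate phenotype file myself.
I use a quantitative trait as the example here.
In general, you just prepare a file yourself
and then point PLINK to it with --pheno.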
--pheno causes phenotype values to be read from the 3rd column of the specified space- or tab-delimited file, instead of the .fam or .ped file. The first and second columns of that file must contain family and within-family IDs, respectively.
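So the file has three columns in total:
the first two are the family and within-family IDs, and the third is the phenotype.
Here I use the first principal component as the phenotype.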
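To make the three-column layout concrete, here is a minimal Python sketch that pulls FID, IID and one principal component out of .eigenvec-style text. It assumes the whitespace-delimited layout FID, IID, PC1, PC2, ... with an optional header row; the function name `make_pheno_lines` and the sample values are made up for illustration. A file whose third column is already PC1 satisfies the --pheno layout as it is, so this is mostly useful when you want a different component or an explicit phenotype file.

```python
def make_pheno_lines(eigenvec_text, pc_index=1):
    """Return 'FID IID value' lines taken from .eigenvec-style text.

    pc_index=1 selects PC1, pc_index=2 selects PC2, and so on.
    """
    out = []
    for line in eigenvec_text.splitlines():
        fields = line.split()
        # Skip blank lines and an optional header row.
        if not fields or fields[0] in ("FID", "#FID"):
            continue
        fid, iid = fields[0], fields[1]
        # Columns are FID, IID, PC1, PC2, ... so PC k is at index 1 + k.
        value = fields[1 + pc_index]
        out.append(f"{fid} {iid} {value}")
    return out


# Toy input (made-up values, not real data).
example = """fam1 ind1 0.0123 -0.0456
fam2 ind2 -0.0789 0.0011
"""

pheno_lines = make_pheno_lines(example)
print("\n".join(pheno_lines))
```

Writing `pheno_lines` to a text file, one entry per line, gives a file in the layout that --pheno expects.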
Let's give it a try.
It failed. PLINK skipped the analysis with this warning:
Warning: Skipping --linear since # variables >= # samples.
Remember to add --allow-no-sex.
--allow-no-sex is now required if you want to retain phenotype values for missing-sex samples. This is a change from PLINK 1.07; we believe it would be more confusing to continue treating regular and --pheno phenotypes differently, and apologize for any temporary inconvenience we've caused.
That is simply how the program is designed.
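One plausible reading of the warning, based on the note quoted above: if the samples have no sex recorded, their phenotypes are dropped unless --allow-no-sex is given, which leaves no usable samples for the regression. A quick way to check whether this applies is to count unknown-sex samples in the .fam file. The sketch below assumes the standard six-column .fam layout (FID, IID, father, mother, sex, phenotype) with sex coded 1 = male, 2 = female and anything else unknown; the function name and the toy rows are made up for illustration.

```python
def count_missing_sex(fam_text):
    """Return (samples with unknown sex, total samples) for .fam-style text."""
    missing = 0
    total = 0
    for line in fam_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        total += 1
        # Column 5 is the sex code: 1 = male, 2 = female, otherwise unknown.
        if fields[4] not in ("1", "2"):
            missing += 1
    return missing, total


# Toy .fam content (made-up samples): two of the three have unknown sex.
fam_example = """fam1 ind1 0 0 0 -9
fam2 ind2 0 0 1 -9
fam3 ind3 0 0 0 -9
"""

missing, total = count_missing_sex(fam_example)
print(f"{missing} of {total} samples have unknown sex")
```

If most or all samples come back as unknown sex, that would be consistent with the warning disappearing once --allow-no-sex is added.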
plink --bfile clean --linear --pheno clean_one.eigenvec --allow-no-sex
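Success!
It produces an output file with the .assoc.linear extension (plink.assoc.linear with the default output prefix).
This file can be used for plotting.
For the plotting itself, go back to Part 1.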
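Let me explain what each column means.
- CHR: chromosome
- SNP: SNP name
- BP: base-pair physical position
- A1: tested allele (minor allele by default)
- TEST: code for the test, presumably the model term (for example ADD for the additive effect)
- NMISS: number of non-missing individuals included in the analysis
- BETA/OR: regression coefficient (--linear) or odds ratio (--logistic); for --linear this is the beta value
- STAT: coefficient t-statistic (beta divided by its standard error; the larger its absolute value, the more significant)
- P: asymptotic p-value for the t-statistic; this is the column to check for significance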
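As a rough illustration of how these columns can be read before plotting, here is a small Python sketch that parses .assoc.linear-style text, keeps only the ADD rows, and drops rows whose P-value is NA. It assumes a whitespace-delimited file with the header CHR SNP BP A1 TEST NMISS BETA STAT P; the function name `read_assoc_linear` and the toy rows are made up for illustration, not real results.

```python
def read_assoc_linear(text):
    """Parse .assoc.linear-style text into a list of dicts (ADD rows only)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    records = []
    for row in rows:
        rec = dict(zip(header, row))
        # Keep the additive SNP effect; skip covariate rows and failed tests.
        if rec["TEST"] != "ADD" or rec["P"] == "NA":
            continue
        records.append({
            "CHR": rec["CHR"],          # kept as text, since it may be X or Y
            "SNP": rec["SNP"],
            "BP": int(rec["BP"]),
            "BETA": float(rec["BETA"]),
            "STAT": float(rec["STAT"]),
            "P": float(rec["P"]),
        })
    return records


# Toy content (made-up numbers).
assoc_example = """ CHR  SNP    BP  A1  TEST  NMISS  BETA  STAT      P
   1  rs1  1000   A   ADD    100   0.5   2.5  0.014
   1  rs2  2000   G   ADD    100    NA    NA     NA
   1  rs3  3000   T   ADD     98  -0.1  -0.4   0.69
"""

records = read_assoc_linear(assoc_example)
for r in records:
    print(r["SNP"], r["BP"], r["P"])
```

Filtering on TEST matters once covariates are added, because each covariate then gets its own row per SNP.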
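I'll keep this introduction brief and stop here.
I also ran into some practical problems along the way.
For example, for some data the result is so significant that the P-value comes out as exactly 0. Then -log10(P) is infinite, and the plot fails because ylim cannot be infinite. On top of that, my plot inexplicably came out narrower than before, which is odd.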
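One common workaround for the P = 0 problem, sketched here in Python under the assumption that the plot uses -log10(P): clamp every P-value to a small positive floor before taking the logarithm, so the y-axis stays finite. The function name `neg_log10_p` and the floor of 1e-300 are my own choices for illustration; any small positive value works.

```python
import math


def neg_log10_p(pvalues, floor=1e-300):
    """Return -log10(p) for each p, clamping p to `floor` so p == 0 stays finite."""
    return [-math.log10(max(p, floor)) for p in pvalues]


# Toy P-values (made up): the first one underflowed to exactly 0.
pvals = [0.0, 0.01, 1.0]
scores = neg_log10_p(pvals)

# The upper y-axis limit can now be computed safely from finite values.
y_max = max(scores) * 1.05
print(scores, y_max)
```

Clamped points understate the true significance, so it is worth marking them on the plot or mentioning the floor in the caption.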