这次准备介绍的是一个使用简便的Python分析包:ipyrad。(在此特别感谢虞米儿~)
ipyrad 官方文档: https://ipyrad.readthedocs.io/en/latest/API-analysis/cookbook-abba-baba.html
对我来说比较有用的是下面这几个模块:
vcf_to_hdf5
treemix
PCA
RAxML
STRUCTURE
abba-baba
先来介绍一下PCA模块~
下载
作者推荐使用conda下载,最好下载比较新的版本(python v 3.7)
conda install ipyrad=0.9.63 -c bioconda
到这主体部分就下好了。
PCA分析
下载额外用到的模块
conda install scikit-learn -c bioconda
conda install toyplot -c eaton-lab
将VCF文件转换为haf5格式
vcf2hdf5.py:
### name : output_file_name data : input_vcf ld_block_size : window_size
import ipyrad.analysis as ipa
import pandas as pd
converter = ipa.vcf_to_hdf5(
name="test_LD10k",
data="all_snp_1.vcf.gz",
ld_block_size=10000,
)
converter.run()
运行结果在 analysis-vcf2hdf5 文件夹下
运行PCA分析
ipyrad_pca.py :
import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf
data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"
imap = {
"AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
"ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
}
minmap = {i: 0.5 for i in imap}
pca = ipa.pca(data = data,
imap = imap,
minmap = minmap,
mincov = 0.75,
impute_method = "sample",
)
pca.run(nreplicates=25,seed=12345)
df = pd.DataFrame(pca.pcaxes[0], index=pca.names)
df.to_csv("pca_analysis.csv")
canvas, axes = pca.draw(0,2)
toyplot.pdf.render(canvas, "PCA-out.pdf")
这里产生两个结果
pca_analysis.csv PCA分析结果文件:
PCA-out.pdf 结果图:
这里运用的是pca.run(nreplicates=25,seed=12345) 方法,可以根据说明书自行修改。
tSNE分析
tSNE是降维分析的一种,具体原理还不是很懂,这个方法也提供了tSNE方法。
ipyrad_tsne.py :
import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf
data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"
imap = {
"AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
"ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
}
minmap = {i: 0.5 for i in imap}
pca = ipa.pca(data = data,
imap = imap,
minmap = minmap,
mincov = 0.75,
impute_method = "sample",
)
pca.run_tsne(subsample=True, perplexity=4.0, n_iter=100000, seed=123)
canvas, axes = pca.draw()
toyplot.pdf.render(canvas, "tsne-out.pdf")
tsne-out.pdf :
PCA这里就写这么多,只是编写了我的做法,更详细的内容还是去看官方文档吧~
ipyrad后续内容,且听下回。