[Genome] Determining the ploidy of a species

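Methods for determining genome ploidy:

1.smudgeplot
2.PloidyFrost
3.Survey software: GenomeScope2 (judge from the peaks of the survey k-mer spectrum)
4.Comparison with a closely related diploid: genome size, genome synteny and gene synteny. For plants (e.g. embryophytes), the BUSCO D (duplicated) value can also be examined, together with the karyotype and the Hi-C heatmap.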
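For the BUSCO check in item 4, the value of interest is D, the share of BUSCO genes found complete in more than one copy. The helper below is a minimal, hypothetical sketch (not part of BUSCO) that assumes the standard one-line BUSCO summary string; the numbers in the example are made up. A D value far above what a diploid relative shows hints at retained homeologs or several assembled haplotypes, which still has to be checked against the karyotype and Hi-C.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BUSCO): print the duplicated (D) percentage
# from a BUSCO one-line summary such as
#   C:95.2%[S:60.1%,D:35.1%],F:1.2%,M:3.6%,n:1614
busco_dup() {
  printf '%s\n' "$1" | sed -n 's/.*D:\([0-9.]*\)%.*/\1/p'
}

busco_dup "C:95.2%[S:60.1%,D:35.1%],F:1.2%,M:3.6%,n:1614"   # prints 35.1
```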

1.smudgeplot

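Download: https://github.com/KamilSJaron/smudgeplot
Install with conda; it has a very large number of dependencies.
Note: the current version is v0.4.0, which differs substantially from the older 0.1.2.5 release. The k-mer counting software has changed, memory consumption is high, runs are slow for large genomes, and jobs on very large genomes may abort.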
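The idea behind smudgeplot can be illustrated without the tool itself: it takes pairs of k-mers that differ by one base (putative heterozygous pairs) and places each pair by its total coverage and by the minor k-mer's share of that total. The total is roughly ploidy times the 1n coverage, and the minor share is about m/p, so smudges appear at AB (1/2), AAB (1/3), AAAB (1/4), AABB (1/2 at twice the diploid coverage) and so on. The function below is a hypothetical sketch that labels a single pair under an assumed 1n coverage; it is not smudgeplot's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of smudgeplot): label one k-mer pair by the
# smudge it would fall into.
#   $1, $2 = coverages of the two k-mers in the pair
#   $3     = assumed 1n (monoploid) k-mer coverage
smudge_label() {
  awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
    a += 0; b += 0; n += 0                # force numeric values
    if (b > a) { t = a; a = b; b = t }    # b is now the minor k-mer
    total = a + b
    p = int(total / n + 0.5)              # ploidy ~ total coverage / 1n coverage
    if (p < 2) { print "NA"; exit }       # too little coverage for a het pair
    m = int(b / total * p + 0.5)          # copies carrying the minor allele
    if (m < 1) m = 1
    if (m > int(p / 2)) m = int(p / 2)    # the minor allele never exceeds half
    label = ""
    for (i = 0; i < p - m; i++) label = label "A"
    for (i = 0; i < m; i++) label = label "B"
    print label
  }'
}

smudge_label 30 15 15   # prints AAB
smudge_label 20 20 10   # prints AABB
```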

2.PloidyFrost

Download: https://github.com/CMB-BNU/PloidyFrost
PloidyFrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs

NGS data analysis:

mkdir -p kmc_tmp
# Put KMC, PloidyFrost and Bifrost on PATH (installation paths from the original setup)
PF_HOME=/share/nas1/pengzw/software/PloidyFrost
PF_MAIN=${PF_HOME}/main/PloidyFrost-main
export PATH=${PF_HOME}/KMC/bin/:${PF_MAIN}/bin/:${PF_MAIN}/bifrost/bin/:$PATH

thread=20
# fq.list: a plain-text file listing one read file per line, e.g.
#/share/nas1/pengzw/project/01.data/Unknown_good_1.fq
#/share/nas1/pengzw/project/01.data/Unknown_good_2.fq
fq=fq.list
# Count 25-mers (-ci1 keeps every k-mer, -cs10000 caps the counter) and write the histogram
kmc -ci1 -cs10000 -k25 -t${thread} @${fq} kmc_db kmc_tmp
kmc_tools transform kmc_db histogram hist
# Lower coverage cutoff estimated by PloidyFrost from the histogram
lower_threshold=$(PloidyFrost cutoffL hist)
# Filter the reads with that cutoff (-hm: hard-mask invalid k-mers in each read)
kmc_tools -t${thread} filter -hm kmc_db @${fq} -ci${lower_threshold} sample_filtered.fq

# Build the de Bruijn graph; -r keeps all k-mers of the input, since the reads are already filtered
Bifrost build -i -d -k 25 -v -r sample_filtered.fq -o dbg -t ${thread}

# Compute allele frequencies on the graph, then filter, plot and model them
PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o multi -h hist
cd PloidyFrost_output
Rscript ${PF_MAIN}/script/Filter.R -i multi -o multi-filtered -n 6 -s 11 -q 0.05
Rscript ${PF_MAIN}/script/Drawfreq.R -f multi-filtered_allele_frequency.txt -t title -p 2 -o histogram
${PF_MAIN}/bin/PloidyFrost model -g multi-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm
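`PloidyFrost cutoffL hist` chooses the lower threshold from the k-mer histogram. How it does so is not described here; a common heuristic for separating sequencing-error k-mers from real ones is the first valley of the spectrum. The hypothetical helper below (not PloidyFrost's own algorithm) finds that valley in the two-column `hist` file written by `kmc_tools transform ... histogram` (coverage, number of k-mers), which can be used to sanity-check the value passed to `kmc_tools filter`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the coverage at the first local minimum of a
# two-column k-mer histogram (column 1 = coverage, column 2 = k-mer count).
# Prints nothing if the counts never rise again.
first_valley() {
  awk 'NR > 1 && $2 + 0 > prev { print prev_cov; exit }
       { prev = $2 + 0; prev_cov = $1 }' "$1"
}

# Usage, next to the pipeline above:
#   echo "first valley: $(first_valley hist), cutoffL: ${lower_threshold}"
```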

TGS data analysis:

mkdir -p kmc_tmp
# Put KMC, PloidyFrost and Bifrost on PATH (installation paths from the original setup)
PF_HOME=/share/nas1/pengzw/software/PloidyFrost
PF_MAIN=${PF_HOME}/main/PloidyFrost-main
export PATH=${PF_HOME}/KMC/bin/:${PF_MAIN}/bin/:${PF_MAIN}/bifrost/bin/:$PATH

thread=20
# HiFi/CCS reads in gzipped FASTA format
fq=/share/nas1/pengzw/project/01.rawdata/ccs/.ccs.fasta.gz
# Same steps as for NGS data; -fa tells KMC that the input is FASTA
kmc -ci1 -cs10000 -k25 -t${thread} -fa "$fq" kmc_db kmc_tmp
kmc_tools transform kmc_db histogram hist
lower_threshold=$(PloidyFrost cutoffL hist)
# kmc_tools filter reads FASTQ by default, so the FASTA read set also needs -fa;
# the filtered output keeps the input format (FASTA)
kmc_tools -t${thread} filter -hm kmc_db "$fq" -ci${lower_threshold} -fa sample_filtered.fa

Bifrost build -i -d -k 25 -v -r sample_filtered.fa -o dbg -t ${thread}

PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o single -h hist
cd PloidyFrost_output
Rscript ${PF_MAIN}/script/Filter.R -i single -o single-filtered -n 6 -s 11 -q 0.05
Rscript ${PF_MAIN}/script/Drawfreq.R -f single-filtered_allele_frequency.txt -t title -p 2 -o histogram
${PF_MAIN}/bin/PloidyFrost model -g single-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm

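Interpret the output by comparing it with the results reported in the PloidyFrost paper:

[Figure: reference results from the PloidyFrost paper]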
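The reasoning behind the expected patterns is simple: at a biallelic site in a p-ploid genome, an allele present in m of the p copies has an expected frequency of m/p. A diploid therefore gives a single peak at 0.5, a triploid peaks at 1/3 and 2/3, and a tetraploid at 1/4, 1/2 and 3/4. The helper below is hypothetical (not part of PloidyFrost) and only lists these positions so the `Drawfreq.R` histogram and the `gmm` fits can be compared against them; whether a given plot shows both m/p and 1 - m/p depends on how the frequencies were computed, so check that against the paper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected allele-frequency peaks (m/p, m = 1..p-1)
# for a biallelic site in a genome of ploidy p.
expected_af_peaks() {
  awk -v p="$1" 'BEGIN {
    out = ""
    for (m = 1; m < p; m++) {
      sep = (m > 1) ? " " : ""
      out = out sep sprintf("%.2f", m / p)
    }
    print out
  }'
}

for p in 2 3 4 6; do
  echo "ploidy ${p}: $(expected_af_peaks "$p")"
done
```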

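3.Survey results

A polyploid genome shows several peaks in the k-mer spectrum; autopolyploids and allopolyploids produce different profiles, and haploids look different again.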
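A quick way to read the survey histogram alongside GenomeScope2 is to list its peaks and express each one as a multiple of the first: in a polyploid, peaks tend to fall near integer multiples of the monoploid coverage. The hypothetical awk helper below works on the same two-column histogram format (coverage, k-mer count); the minimum coverage used to skip the error peak is an assumption you choose, for example the first valley. Raw histograms are noisy, so treat its output as a hint and confirm it with GenomeScope2 models fitted at different ploidy settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print local maxima of a two-column k-mer histogram
# (coverage, k-mer count) at or above a minimum coverage, with each peak's
# position as a multiple of the first reported peak.
#   $1 = histogram file, $2 = minimum coverage (skips the error peak)
list_peaks() {
  awk -v min="$2" '
    { cov[NR] = $1 + 0; cnt[NR] = $2 + 0 }
    END {
      first = 0
      for (i = 2; i < NR; i++) {
        if (cov[i] < min + 0) continue
        if (cnt[i] > cnt[i - 1] && cnt[i] >= cnt[i + 1]) {
          if (first == 0) first = cov[i]
          printf "%d\t%d\t%.2f\n", cov[i], cnt[i], cov[i] / first
        }
      }
    }' "$1"
}

# Usage: list_peaks hist 10   # columns: coverage, k-mer count, multiple of first peak
```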

4.Comparison with a diploid

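http://www.genomesize.com/search.php Animal Genome Size Database (C-values; chromosome numbers can also be looked up)
https://cvalues.science.kew.org/ Plant DNA C-values Database (plant genome sizes; also lists chromosome numbers)
https://taux.evolseq.net/CCDB_web Chromosome Counts Database (CCDB), plant karyotype site 1
http://ploidb.tau.ac.il/ plant karyotype/ploidy site 2
http://legacy.tropicos.org/Project/IPCN Index to Plant Chromosome Numbers (IPCN), plant karyotype site 3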
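When comparing genome size with a diploid relative, values from these databases are typically C-values in picograms, while survey estimates are in bases. A minimal sketch of the arithmetic, using the commonly cited conversion 1 pg ≈ 978 Mb; the helper names are hypothetical and the example numbers are made up. Make sure both values refer to the same quantity (for example 1C against 1C): a ratio near 2 against a diploid relative with the same base chromosome number is consistent with a tetraploid, but repeat expansion can also change genome size without any change in ploidy.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for genome-size comparison.
# Convert a C-value in picograms to megabases (1 pg ~ 978 Mb).
pg_to_mb() { awk -v pg="$1" 'BEGIN { printf "%.0f\n", pg * 978 }'; }
# Ratio of the target genome size to the diploid relative's (same units).
size_ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

# Made-up example: a 1C value of 2.5 pg against a 1250 Mb diploid relative.
size_ratio "$(pg_to_mb 2.5)" 1250   # prints 1.96
```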

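Run genome synteny against the diploid and check whether each diploid region matches several groups in the target genome.
When the karyotype is known, check whether the Hi-C heatmap splits into several groups of chromosomes linked by interaction signal; this pattern usually means that several haplotype sets of a highly heterozygous genome were assembled. Examples are the genome papers on Chuanxiong (Ligusticum chuanxiong) and Epimedium.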
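One way to make "does it split into several groups" concrete is syntenic depth: for each gene of the diploid reference, count how many syntenic copies the target genome has, then tabulate. A dominant 1:2 pattern against a diploid suggests two sets per diploid region (e.g. a tetraploid, or two assembled haplotypes), 1:3 suggests three, and so on. The helper below is a hypothetical sketch that assumes a tab- or space-separated anchor list with the reference gene in column 1 and the target gene in column 2, skipping lines that start with `#` (as in JCVI `.anchors` files); MCScanX or JCVI output may need to be reduced to that form first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: syntenic depth distribution.
# Input: column 1 = reference (diploid) gene, column 2 = syntenic target gene;
# lines starting with '#' are ignored.
# Output: depth <TAB> number of reference genes with that depth.
synteny_depth() {
  awk '!/^#/ && NF >= 2 {
         if (!seen[$1 SUBSEP $2]++) depth[$1]++   # count each gene pair once
       }
       END {
         for (g in depth) hist[depth[g]]++
         for (d in hist) print d "\t" hist[d]
       }' "$1" | sort -n
}

# Usage: synteny_depth ref.target.anchors
```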

SubPhaser validation

Once the genome has been split into groups, use SubPhaser to verify the grouping.

Other ploidy-estimation methods (not tried yet, noted for reference):
HMMploidy:
https://peercommunityjournal.org/articles/10.24072/pcjournal.178/

ploidyNGS:
https://pubmed.ncbi.nlm.nih.gov/28383704/

nQuire:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2128-z

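© Copyright belongs to the author. For reposting or content cooperation, please contact the author.
Platform statement: the content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and provides only information storage services.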
