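Methods for determining genome ploidy:
1.smudgeplot
2.PloidyFrost
3.Genome survey software: GenomeScope2 (judge from the k-mer peak profile of the survey)
4.Comparison with a closely related diploid: genome size, genome synteny, and gene collinearity. For plants (e.g. embryophytes), also look at the BUSCO duplicated (D) value, together with the karyotype and the Hi-C map.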
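The BUSCO D value mentioned in item 4 can be pulled out of a BUSCO summary with a one-liner. This is a minimal sketch, assuming the usual one-line summary format "C:..%[S:..%,D:..%],F:..%,M:..%,n:..."; the file name and the numbers in it are made up for illustration, and busco_dup is a hypothetical helper, not part of BUSCO.

```shell
# Extract the duplicated-BUSCO percentage (D) from a BUSCO summary file.
# Assumes the one-line summary format "C:..%[S:..%,D:..%],F:..%,M:..%,n:...".
busco_dup() {
    sed -n 's/.*D:\([0-9.]*\)%.*/\1/p' "$1" | head -n 1
}

# Hypothetical summary file for illustration only (values are invented).
printf '\tC:98.6%%[S:45.2%%,D:53.4%%],F:0.6%%,M:0.8%%,n:1614\n' > short_summary.example.txt
busco_dup short_summary.example.txt
```

A high D value in an otherwise complete assembly is only a hint: it fits a polyploid, but it can also come from uncollapsed haplotigs in a highly heterozygous diploid, so read it together with the karyotype and Hi-C evidence.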
1.smudgeplot
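Download: https://github.com/KamilSJaron/smudgeplot
Install with conda; it pulls in a very large number of dependencies.
Note: the current release is v0.4.0, which differs substantially from older releases such as 0.1.2.5. The k-mer counting software has changed, memory use is high, large genomes run slowly, and runs on large genomes may be interrupted.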
2.PloidyFrost
Download: https://github.com/CMB-BNU/PloidyFrost
PloidyFrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs
1.NGS data analysis:
mkdir kmc_tmp
export PATH=/share/nas1/pengzw/software/PloidyFrost/KMC/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bifrost/bin/:$PATH
thread=20
fq=fq.list
# fq.list holds one FASTQ path per line, e.g.:
#/share/nas1/pengzw/project/01.data/Unknown_good_1.fq
#/share/nas1/pengzw/project/01.data/Unknown_good_2.fq
out_fq=out_fq.list
kmc -ci1 -cs10000 -k25 -t${thread} @${fq} kmc_db kmc_tmp
kmc_tools transform kmc_db histogram hist
lower_threshold=$(PloidyFrost cutoffL hist)
kmc_tools -t${thread} filter -hm kmc_db @${fq} -ci${lower_threshold} sample_filtered.fq
Bifrost build -i -d -k 25 -v -r sample_filtered.fq -o dbg -t ${thread}
PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o multi -h hist
cd PloidyFrost_output
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Filter.R -i multi -o multi-filtered -n 6 -s 11 -q 0.05
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Drawfreq.R -f multi-filtered_allele_frequency.txt -t title -p 2 -o histogram
/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/PloidyFrost model -g multi-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm
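The NGS commands above read their input through "@fq.list", a plain text file with one FASTQ path per line. A minimal sketch for generating it, assuming paired files named *_1.fq and *_2.fq in one directory; make_fq_list and the demo directory are hypothetical and only for illustration.

```shell
# Build the read list that KMC reads via "@fq.list": one FASTQ path per line.
# The *_1.fq / *_2.fq naming scheme is an assumption; adjust the globs as needed.
make_fq_list() {
    data_dir=$1
    out=$2
    : > "$out"                                  # start from an empty list
    for f in "$data_dir"/*_1.fq "$data_dir"/*_2.fq; do
        [ -e "$f" ] && printf '%s\n' "$f" >> "$out"
    done
    return 0
}

# Demo on empty placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/Unknown_good_1.fq" "$demo_dir/Unknown_good_2.fq"
make_fq_list "$demo_dir" "$demo_dir/fq.list"
cat "$demo_dir/fq.list"
```

Absolute paths in the list keep the later kmc and kmc_tools calls independent of the working directory.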
2.TGS data analysis:
mkdir kmc_tmp
export PATH=/share/nas1/pengzw/software/PloidyFrost/KMC/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/:$PATH
export PATH=/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bifrost/bin/:$PATH
thread=20
fq=/share/nas1/pengzw/project/01.rawdata/ccs/.ccs.fasta.gz
kmc -ci1 -cs10000 -k25 -t${thread} -fa $fq kmc_db kmc_tmp
kmc_tools transform kmc_db histogram hist
lower_threshold=$(PloidyFrost cutoffL hist)
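# NOTE: $fq is FASTA here, so kmc_tools filter may need the input format stated explicitly (check the kmc_tools filter help); the .fq output name is kept only to match the NGS script.
kmc_tools -t${thread} filter -hm kmc_db $fq -ci${lower_threshold} sample_filtered.fq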
Bifrost build -i -d -k 25 -v -r sample_filtered.fq -o dbg -t ${thread}
PloidyFrost -g dbg.gfa -d kmc_db -t ${thread} -v -o single -h hist
cd PloidyFrost_output
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Filter.R -i single -o single-filtered -n 6 -s 11 -q 0.05
Rscript /share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/script/Drawfreq.R -f single-filtered_allele_frequency.txt -t title -p 2 -o histogram
/share/nas1/pengzw/software/PloidyFrost/main/PloidyFrost-main/bin/PloidyFrost model -g single-filtered_allele_frequency.txt -l 1 -u 10 -q 0.05 -o gmm
Interpret the output by comparing it with the results shown in the PloidyFrost paper:

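3.Survey results
Polyploids show multiple peaks in the k-mer profile. Autopolyploids and allopolyploids give different profiles, and haploids and other cases look different again.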
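As a quick check before running GenomeScope2, the peaks can be listed directly from a k-mer histogram. This is a rough sketch, assuming a two-column histogram (coverage, number of k-mers) such as the hist file written by kmc_tools above. hist_peaks is a hypothetical helper, the default min_cov of 5 is an arbitrary cut to skip the error peak, and the demo histogram is synthetic.

```shell
# Report local maxima in a k-mer histogram (column 1: coverage, column 2: number of k-mers).
# min_cov skips the low-coverage error peak; its default of 5 is an arbitrary assumption.
hist_peaks() {
    awk -v min_cov="${2:-5}" '
        { cov[NR] = $1; n[NR] = $2 }
        END {
            for (i = 2; i < NR; i++)
                if (cov[i] >= min_cov && n[i] > n[i-1] && n[i] >= n[i+1])
                    print cov[i]
        }' "$1"
}

# Synthetic histogram for illustration only: error peak at 1, peaks at 6 and 12.
printf '%s\t%s\n' 1 900 2 300 3 100 4 50 5 80 6 200 7 120 \
                  8 90 9 100 10 150 11 300 12 400 13 250 14 60 > demo.hist
hist_peaks demo.hist
```

Real histograms are noisy, so this will also report small bumps. Treat it as a first look, and rely on the fitted GenomeScope2 model for the actual call.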
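4.Comparison with a diploid
http://www.genomesize.com/search.php Animal Genome Size Database: genome sizes (C-values, many measured by flow cytometry); chromosome numbers are also listed. This site covers animals; for plants use the Kew database below.
https://cvalues.science.kew.org/ Plant DNA C-values Database (Kew): plant genome sizes
https://taux.evolseq.net/CCDB_web Chromosome Counts Database (CCDB): plant chromosome numbers
http://ploidb.tau.ac.il/ PloiDB: plant ploidy database
http://legacy.tropicos.org/Project/IPCN Index to Plant Chromosome Numbers (IPCN)
Run genome synteny against the diploid and check whether the target genome splits into multiple groups.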
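One way to make "splits into multiple groups" concrete is to count how many target chromosomes are syntenic to each chromosome of the diploid. The sketch below assumes a hypothetical two-column, tab-separated table (diploid chromosome, target chromosome) with one row per syntenic block, exported from whichever synteny tool is in use. The file name, chromosome names, and synteny_depth helper are illustrative assumptions.

```shell
# Count distinct target chromosomes syntenic to each diploid chromosome.
# Input: tab-separated, column 1 = diploid chromosome, column 2 = target chromosome.
synteny_depth() {
    awk -F'\t' '
        !seen[$1 FS $2]++ { depth[$1]++ }       # count each chromosome pair once
        END { for (c in depth) print c "\t" depth[c] }
    ' "$1" | sort
}

# Hypothetical block table for illustration only.
printf '%s\t%s\n' refChr1 tgtChr1 refChr1 tgtChr2 refChr1 tgtChr1 refChr1 tgtChr3 \
                  refChr2 tgtChr4 refChr2 tgtChr5 refChr2 tgtChr6 > blocks.example.tsv
synteny_depth blocks.example.tsv
```

A consistent 1:N pattern across all diploid chromosomes supports N chromosome sets in the assembly. Short blocks left over from ancient duplications can inflate the counts, so it is worth filtering blocks by size first.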
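When the karyotype is known, check whether the Hi-C heatmap splits into multiple groups linked by off-diagonal signal lines. This pattern usually suggests a highly heterozygous genome assembled as multiple haplotype sets. Examples: the Chuanxiong (Ligusticum chuanxiong) and Epimedium genome papers.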
SubPhaser validation
Once the chromosomes fall into multiple groups, use SubPhaser to validate the grouping.
Other ploidy analysis methods (not tried yet; noted here for reference):
HMMploidy:
https://peercommunityjournal.org/articles/10.24072/pcjournal.178/
ploidyNGS:
https://pubmed.ncbi.nlm.nih.gov/28383704/
nQuire:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2128-z