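This article is translated from: https://davidemms.github.io/orthofinder_tutorials/running-an-example-orthofinder-analysis.html
1. Install OrthoFinder
Installing with conda is enough (OrthoFinder is distributed through the bioconda channel, so that channel needs to be enabled):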
conda install orthofinder
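2. Run OrthoFinder
Put the downloaded protein (pep) sequence files into a single folder.
Proteome files downloaded from databases usually contain several transcripts per gene. Running OrthoFinder on the raw files can take ten times longer and may also reduce accuracy, so first use the script bundled with OrthoFinder to extract the longest transcript of each gene.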
# With a conda install, the script lives in the bin directory (adjust the path to your own installation)
for f in *fa ; do python ~/miniconda/bin/primary_transcript.py "$f" ; done
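To make the idea of "longest transcript per gene" concrete, here is a minimal sketch in shell and awk. It is not the bundled script's implementation: it assumes Ensembl-style headers that carry a `gene:ID` field, and the function name `longest_per_gene` is hypothetical. For real analyses, the bundled primary_transcript.py remains the tool to use.

```shell
# longest_per_gene FILE: print, for each gene, the longest protein record.
# Assumption: every header contains a field of the form "gene:ID".
longest_per_gene() {
  awk '
    function keep_if_longest() {
      # Keep the record if it is the longest seen so far for its gene.
      if (gene != "" && length(seq) > best_len[gene]) {
        best_len[gene] = length(seq)
        best_hdr[gene] = hdr
        best_seq[gene] = seq
      }
    }
    /^>/ {
      keep_if_longest()
      hdr = $0; seq = ""; gene = ""
      # Pull the gene ID out of the "gene:ID" field in the header.
      for (i = 1; i <= NF; i++)
        if ($i ~ /^gene:/) gene = substr($i, 6)
      # Remember the order in which genes first appear.
      if (gene != "" && !(gene in best_len)) order[++n] = gene
      next
    }
    { seq = seq $0 }   # join wrapped sequence lines
    END {
      keep_if_longest()
      for (i = 1; i <= n; i++)
        print best_hdr[order[i]] "\n" best_seq[order[i]]
    }
  ' "$1"
}

# Usage (hypothetical file name):
#   longest_per_gene species.pep.fa > species.longest.fa
```

Records whose header has no `gene:` field are dropped by this sketch, which is one reason it should be treated only as an illustration.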
After extraction, the pep files containing only the longest transcripts are all in the primary_transcripts directory.
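Before starting the run, it can be worth checking that the extraction actually reduced the number of sequences. The sketch below counts FASTA header lines per file; the function name `count_fasta_records` is hypothetical, and it assumes each record starts with a line beginning with `>`.

```shell
# count_fasta_records FILE...: print "<file><TAB><number of sequences>" per file.
count_fasta_records() {
  for f in "$@"; do
    # Every FASTA record starts with a header line beginning with ">".
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
  done
}

# Usage: compare the raw files with the extracted ones.
#   count_fasta_records *fa
#   count_fasta_records primary_transcripts/*fa
```

If a file in primary_transcripts has the same count as its raw counterpart, the input probably contained one transcript per gene already, or the headers were not in a format the extraction could interpret.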
orthofinder -f primary_transcripts
The run takes anywhere from 20 minutes to several hours. If it completes normally, the last few lines of OrthoFinder's output look like this:
Results:
/primary_transcripts/OrthoFinder/Results_Nov26/
OrthoFinder assigned 121743 genes (92.9% of total) to 17981 orthogroups. Fifty percent of all genes were in orthogroups with 7 or more genes (G50 was 7) and were contained in the largest 5076 orthogroups (O50 was 5076). There were 5485 orthogroups with all species present and 1755 of these consisted entirely of single-copy genes.
CITATION:
When publishing work that uses OrthoFinder please cite:
Emms D.M. & Kelly S. (2019), Genome Biology 20:238
If you use the species tree in your work then please also cite:
Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914
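The G50 and O50 figures in the summary above can be easier to read with a small worked example. The sketch below follows the wording of that summary: sort orthogroups from largest to smallest and walk down until half of the genes are covered; the size of the orthogroup reached is G50 and the number of orthogroups used is O50. The function name `g50_o50` and the sizes are made up for illustration, and the sketch only considers the sizes passed to it (how OrthoFinder accounts for unassigned genes is not modelled here).

```shell
# g50_o50 SIZE...: compute G50 and O50 from a list of orthogroup sizes.
g50_o50() {
  printf '%s\n' "$@" | sort -nr | awk '
    { size[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        running += size[i]
        # Stop at the first orthogroup where at least half of all genes are covered.
        if (2 * running >= total) {
          printf "G50=%d O50=%d\n", size[i], i
          exit
        }
      }
    }'
}

# Six hypothetical orthogroups containing 26 genes in total.
g50_o50 10 6 4 3 2 1
# prints: G50=6 O50=2
```

Here the two largest orthogroups (10 + 6 = 16 genes) already hold more than half of the 26 genes, so O50 is 2 and G50 is the size of the second one, 6.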
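3. Analysing the results
For help interpreting the output, see the 简书 (jianshu.com) article "Orthofinder运行结果文件解读" (a guide to OrthoFinder's result files).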