Today I'd like to use BUSCO again to assess the quality of a fungal genome assembly. Like most bioinformatics tools, it runs on Unix-like platforms such as Ubuntu. My computer dual-boots Windows and Ubuntu. I have no way to type Chinese characters on Ubuntu, so this post was finished in English, building on the earlier version I wrote when I first used the tool.
Unfortunately, I ran into a handful of tough problems this time and they cost me nearly 6 hours; I cannot imagine how smoothly things went last time. Anyway, the job got done, and I wrote this post to save myself the next time, although I do not know when that will be.
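Installing BUSCO
Download: https://gitlab.com/ezlab/busco
cd to the busco folder; use the absolute path, a relative path is not accepted
Install: sudo python setup.py install
(How did everything end up in Traditional Chinese characters?)
Install the following three programs. The last two need to be installed; see each one's README file for instructions. Yes, we have to install them instead of just downloading them.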
NCBI BLAST+ [NB: please see release note 2.0.1 below]
https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST (this package contains the various tools needed for alignment)
It seems this tool does not need to be installed; I nearly forgot.
HMMER (HMMER v3.1b2)
Install from http://hmmer.org/
tar xf hmmer-3.2.1.tar.gz
cd hmmer-3.2.1
./configure
make
Augustus (> 3.2.1) (only used for assessing genomes)
http://bioinf.uni-greifswald.de/augustus/
Augustus requires a lot of dynamic libraries. I do not know what each one is for; I just installed them, either from the Ubuntu repositories or from GitHub.
From Ubuntu: sudo apt-get install xx. When you hit an error, most of the time the message includes a hint; just follow its instructions.
tar -xzf augustus-3.3.1.tar.gz
Edit common.mk and uncomment the line ZIPINPUT = true
sudo apt-get install libboost-iostreams-dev
sudo apt-get install zlib1g-dev
sudo apt-get install libgsl-dev
sudo apt-get install libmysql++-dev
sudo apt-get install libboost-graph-dev
sudo apt-get install libsuitesparse-dev liblpsolve55-dev
sudo apt-get install bamtools libbamtools-dev
sudo apt-get install libboost-all-dev
From GitHub: git clone the package's GitHub URL, cd into the downloaded directory, and then run sudo make install.
sudo make install libbzip2
sudo apt-get install libbz2-dev
sudo apt-get install liblzma-dev
sudo apt-get install libncurses5-dev
export TOOLDIR=/home/dong/dlm_wd/asmb_assess/sof/TOOLDIR
Installing Augustus requires several other tools; study the manual carefully and follow its instructions step by step.
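Configuring the config.ini file
The BUSCO directory contains a config folder. Copy config.ini.default from it and rename the copy to config.ini, then open it and replace every default path, such as /home/osboxes/BUSCOVM/augustus/augustus-3.2.2/, with the actual path on your machine, for example mine: /home/larix/下载/augustus/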
export AUGUSTUS_CONFIG_PATH="/path/to/AUGUSTUS/augustus-3.2.3/config/"
Replace the path here with your actual one; otherwise BUSCO reports an error.
export AUGUSTUS_CONFIG_PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/config/"
export BUSCO_CONFIG_FILE="/home/dong/dlm_wd/asmb_assess/sof/busco/busco-master/config/config.ini"
export PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin:$PATH"
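Since a wrong export only shows up later as a BUSCO error, a quick check of the environment can save time. The following is a minimal sketch, not part of BUSCO: the function name check_busco_env is my own invention, and it only assumes the two variable names used in the export lines above. It also flags a PATH entry that is literally "PATH", which is what you get when the "$" in "$PATH" is left out.

```python
import os


def check_busco_env(env):
    """Return a list of problems found in an environment mapping (hypothetical helper)."""
    problems = []
    # Variable names taken from the export lines; the first should be a directory,
    # the second a file.
    for var, must_be_dir in (("AUGUSTUS_CONFIG_PATH", True), ("BUSCO_CONFIG_FILE", False)):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif must_be_dir and not os.path.isdir(value):
            problems.append(f"{var} is not a directory: {value}")
        elif not must_be_dir and not os.path.isfile(value):
            problems.append(f"{var} is not a file: {value}")
    # A PATH entry equal to the bare word PATH usually means the "$" was forgotten.
    entries = env.get("PATH", "").split(os.pathsep)
    if "PATH" in entries:
        problems.append('PATH contains the literal entry "PATH"; was "$PATH" intended?')
    return problems


if __name__ == "__main__":
    for problem in check_busco_env(os.environ):
        print("WARNING:", problem)
```

Run it in the same shell where you typed the export lines; no output means nothing suspicious was found.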
Here is my config.ini:
[tblastn]
# path to tblastn
path = /home/dong/dlm_wd/asmb_assess/sof/blast/ncbi-blast-2.7.1+/bin/
[makeblastdb]
# path to makeblastdb
path = /home/dong/dlm_wd/asmb_assess/sof/blast/ncbi-blast-2.7.1+/bin/
[augustus]
# path to augustus
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin/
[etraining]
# path to augustus etraining
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin/
# path to augustus perl scripts, redeclare it for each new script
[gff2gbSmallDNA.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/
[new_species.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/
[optimize_augustus.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/
[hmmsearch]
# path to HMMsearch executable
path = /home/dong/dlm_wd/asmb_assess/sof/hmmer/hmmer-3.2.1/src/
[Rscript]
# path to Rscript, if you wish to use the plot tool
path = /usr/bin/
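A wrong path in this file was one of the things that cost me time, so here is a small sketch for checking it before a run. It is only an illustration and rests on one assumption: that each section is named after the tool file it points to, as in my listing above (for example [tblastn] with tblastn inside its path). The function name missing_tools and the fallback location config/config.ini are made up for this example.

```python
import configparser
import os


def missing_tools(config_file):
    """Return (section, expected_file) pairs whose tool file does not exist."""
    # interpolation=None so that a "%" in a path is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_file)
    missing = []
    for section in parser.sections():
        if not parser.has_option(section, "path"):
            continue  # sections without a path entry are skipped
        # Assumption: the tool file carries the same name as its section.
        candidate = os.path.join(parser.get(section, "path"), section)
        if not os.path.isfile(candidate):
            missing.append((section, candidate))
    return missing


if __name__ == "__main__":
    config_file = os.environ.get("BUSCO_CONFIG_FILE", "config/config.ini")
    for section, candidate in missing_tools(config_file):
        print(f"[{section}] not found: {candidate}")
```

It only tests that the files exist, not that they are executable or the right version.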
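Preparing your own data and the matching lineage dataset
Your own data is the assembled FASTA file. Download the corresponding lineage dataset from http://busco.ezlab.org/; you can store it anywhere on your computer, as long as you give its location clearly when running BUSCO.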
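Before starting a long run, it can be worth confirming that the assembly file really is the FASTA you think it is. The sketch below is my own helper (fasta_summary is not a BUSCO function): it counts the header lines and sums the sequence lengths, assuming a plain, uncompressed FASTA file.

```python
import sys


def fasta_summary(path):
    """Return (number of sequences, total sequence length) for a plain FASTA file."""
    n_seqs = 0
    total_len = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                n_seqs += 1  # each header line starts one record
            else:
                total_len += len(line)
    return n_seqs, total_len


if __name__ == "__main__" and len(sys.argv) == 2:
    count, length = fasta_summary(sys.argv[1])
    print(f"{count} sequences, {length} residues in total")
```

The sequence count should match what makeblastdb reports when BUSCO builds its database.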
Run
Genome assembly assessment
First cd to the BUSCO folder, then enter:
python scripts/run_BUSCO.py -i SEQUENCE_FILE -o OUTPUT_NAME -l LINEAGE -m geno
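SEQUENCE_FILE: your assembled genome file
OUTPUT_NAME: a name for the output
LINEAGE: the lineage dataset to match against
geno: specifies a genome assessment
Mine: python scripts/run_BUSCO.py -i /home/larix/下载/myproject/zglk_fungi.genome_contigs.fasta -o ev_genome -l ascomycota_odb9 -m geno
My lineage dataset is stored in the BUSCO folder.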
python scripts/run_BUSCO.py \
-i /home/dong/dlm_wd/asmb_assess/seq/genome/Metarhizium_anisopliae.Metarhizium_anisopliae.dna.nonchromosomal.fa \
-o lvjiangjun_geno \
-l /home/dong/dlm_wd/asmb_assess/seq/reference/ascomycota_odb9 \
-m geno
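If you assess several assemblies, typing this long command by hand each time invites mistakes. Here is a minimal sketch that assembles the same argument list in Python, using only the -i, -o, -l and -m options shown above. The function name busco_command is hypothetical, and the default script location assumes you are in the BUSCO folder.

```python
import shlex


def busco_command(sequence_file, output_name, lineage, mode="geno",
                  script="scripts/run_BUSCO.py"):
    """Build the run_BUSCO.py argument list from the four options used in this post."""
    return ["python", script,
            "-i", sequence_file,
            "-o", output_name,
            "-l", lineage,
            "-m", mode]


if __name__ == "__main__":
    cmd = busco_command("SEQUENCE_FILE", "OUTPUT_NAME", "LINEAGE")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To launch the run from the BUSCO folder, one could pass the list to
    # subprocess.run(cmd, check=True); it is left out here on purpose.
```

Passing the arguments as a list avoids quoting trouble with paths that contain spaces or non-ASCII characters.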
After you press Enter, the program starts running.
Here is a sample run:
root@larix:/home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d# python scripts/run_BUSCO.py -i /home/larix/下载/myproject/sequence.fasta -o genome -l fungi_odb9 -m geno
INFO ****************** Start a BUSCO 3.0.2 analysis, current time: 07/23/2017 14:44:33 ******************
INFO Configuration loaded from /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/scripts/../config/config.ini
INFO Init tools...
INFO Check dependencies...
INFO Check input file...
INFO To reproduce this run: python scripts/run_BUSCO.py -i /home/larix/下载/myproject/sequence.fasta -o genome -l fungi_odb9/ -m genome -c 1 -sp aspergillus_nidulans
INFO Mode is: genome
INFO The lineage dataset is: fungi_odb9 (eukaryota)
INFO Temp directory is ./tmp/
INFO ****** Phase 1 of 2, initial predictions ******
INFO ****** Step 1/3, current time: 07/23/2017 14:44:34 ******
INFO Create blast database...
INFO [makeblastdb] Building a new DB, current time: 07/23/2017 14:44:35
INFO [makeblastdb] New DB name: /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/tmp/genome_2557368148
INFO [makeblastdb] New DB title: /home/larix/下载/myproject/sequence.fasta
INFO [makeblastdb] Sequence type: Nucleotide
INFO [makeblastdb] Keep Linkouts: T
INFO [makeblastdb] Keep MBits: T
INFO [makeblastdb] Maximum file size: 1000000000B
INFO [makeblastdb] Adding sequences from FASTA; added 1 sequences in 0.142315 seconds.
INFO [makeblastdb] 1 of 1 task(s) completed at 07/23/2017 14:44:35
INFO Running tblastn, writing output to /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/blast_output/tblastn_genome.tsv...
INFO [tblastn] 1 of 1 task(s) completed at 07/23/2017 14:44:40
INFO ****** Step 2/3, current time: 07/23/2017 14:44:40 ******
INFO Maximum number of candidate contig per BUSCO limited to: 3
INFO Getting coordinates for candidate regions...
INFO Pre-Augustus scaffold extraction...
INFO Running Augustus prediction using aspergillus_nidulans as species:
INFO [augustus] Please find all logs related to Augustus errors here: /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/augustus_output/augustus.log
INFO [augustus] 7 of 69 task(s) completed at 07/23/2017 14:45:02
INFO [augustus] 14 of 69 task(s) completed at 07/23/2017 14:45:27
INFO [augustus] 21 of 69 task(s) completed at 07/23/2017 14:45:49
INFO [augustus] 28 of 69 task(s) completed at 07/23/2017 14:46:04
INFO [augustus] 35 of 69 task(s) completed at 07/23/2017 14:46:25
INFO [augustus] 42 of 69 task(s) completed at 07/23/2017 14:46:42
INFO [augustus] 49 of 69 task(s) completed at 07/23/2017 14:46:56
INFO [augustus] 56 of 69 task(s) completed at 07/23/2017 14:47:50
INFO [augustus] 63 of 69 task(s) completed at 07/23/2017 14:48:07
INFO [augustus] 69 of 69 task(s) completed at 07/23/2017 14:48:18
INFO Extracting predicted proteins...
INFO ****** Step 3/3, current time: 07/23/2017 14:48:18 ******
INFO Running HMMER to confirm orthology of predicted proteins:
INFO [hmmsearch] 7 of 69 task(s) completed at 07/23/2017 14:48:19
INFO [hmmsearch] 14 of 69 task(s) completed at 07/23/2017 14:48:19
INFO [hmmsearch] 21 of 69 task(s) completed at 07/23/2017 14:48:19
INFO [hmmsearch] 28 of 69 task(s) completed at 07/23/2017 14:48:20
INFO [hmmsearch] 35 of 69 task(s) completed at 07/23/2017 14:48:20
INFO [hmmsearch] 42 of 69 task(s) completed at 07/23/2017 14:48:20
INFO [hmmsearch] 49 of 69 task(s) completed at 07/23/2017 14:48:20
INFO [hmmsearch] 56 of 69 task(s) completed at 07/23/2017 14:48:20
INFO [hmmsearch] 63 of 69 task(s) completed at 07/23/2017 14:48:21
INFO [hmmsearch] 69 of 69 task(s) completed at 07/23/2017 14:48:21
INFO Results:
INFO C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290
INFO 19 Complete BUSCOs (C)
INFO 19 Complete and single-copy BUSCOs (S)
INFO 0 Complete and duplicated BUSCOs (D)
INFO 10 Fragmented BUSCOs (F)
INFO 261 Missing BUSCOs (M)
INFO 290 Total BUSCO groups searched
INFO ****** Phase 2 of 2, predictions using species specific training ******
INFO ****** Step 1/3, current time: 07/23/2017 14:48:21 ******
INFO Extracting missing and fragmented buscos from the ancestral_variants file...
INFO Running tblastn, writing output to /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/blast_output/tblastn_genome_missing_and_frag_rerun.tsv...
INFO [tblastn] 1 of 1 task(s) completed at 07/23/2017 14:49:00
INFO Maximum number of candidate contig per BUSCO limited to: 3
INFO Getting coordinates for candidate regions...
INFO ****** Step 2/3, current time: 07/23/2017 14:49:00 ******
INFO Training Augustus using Single-Copy Complete BUSCOs:
INFO Converting predicted genes to short genbank files at 07/23/2017 14:49:00...
INFO All files converted to short genbank files, now running the training scripts at 07/23/2017 14:49:01...
INFO Pre-Augustus scaffold extraction...
INFO Re-running Augustus with the new metaparameters, number of target BUSCOs: 271
INFO [augustus] 7 of 61 task(s) completed at 07/23/2017 14:49:15
INFO [augustus] 13 of 61 task(s) completed at 07/23/2017 14:49:29
INFO [augustus] 19 of 61 task(s) completed at 07/23/2017 14:49:43
INFO [augustus] 25 of 61 task(s) completed at 07/23/2017 14:50:02
INFO [augustus] 31 of 61 task(s) completed at 07/23/2017 14:50:19
INFO [augustus] 37 of 61 task(s) completed at 07/23/2017 14:51:08
INFO [augustus] 43 of 61 task(s) completed at 07/23/2017 14:51:24
INFO [augustus] 49 of 61 task(s) completed at 07/23/2017 14:51:46
INFO [augustus] 55 of 61 task(s) completed at 07/23/2017 14:52:07
INFO [augustus] 61 of 61 task(s) completed at 07/23/2017 14:52:26
INFO Extracting predicted proteins...
INFO ****** Step 3/3, current time: 07/23/2017 14:52:27 ******
INFO Running HMMER to confirm orthology of predicted proteins:
INFO [hmmsearch] 6 of 59 task(s) completed at 07/23/2017 14:52:27
INFO [hmmsearch] 12 of 59 task(s) completed at 07/23/2017 14:52:27
INFO [hmmsearch] 18 of 59 task(s) completed at 07/23/2017 14:52:27
INFO [hmmsearch] 24 of 59 task(s) completed at 07/23/2017 14:52:27
INFO [hmmsearch] 30 of 59 task(s) completed at 07/23/2017 14:52:27
INFO [hmmsearch] 36 of 59 task(s) completed at 07/23/2017 14:52:27
INFO [hmmsearch] 42 of 59 task(s) completed at 07/23/2017 14:52:28
INFO [hmmsearch] 48 of 59 task(s) completed at 07/23/2017 14:52:28
INFO [hmmsearch] 54 of 59 task(s) completed at 07/23/2017 14:52:28
INFO [hmmsearch] 59 of 59 task(s) completed at 07/23/2017 14:52:28
INFO Results:
INFO C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290
INFO 19 Complete BUSCOs (C)
INFO 19 Complete and single-copy BUSCOs (S)
INFO 0 Complete and duplicated BUSCOs (D)
INFO 10 Fragmented BUSCOs (F)
INFO 261 Missing BUSCOs (M)
INFO 290 Total BUSCO groups searched
INFO BUSCO analysis done. Total running time: 474.648495913 seconds
INFO Results written in /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/
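The one-line summary in the log is just the counts divided by n: 19/290 is about 6.6% complete, 10/290 about 3.4% fragmented, and 261/290 is 90.0% missing, with C equal to S plus D. To compare several runs it helps to pull these numbers out automatically. The sketch below parses a summary line of the form shown in the log above; the names parse_busco_summary and percent are my own, and the regular expression was written against this log only, so treat it as an assumption for other BUSCO versions.

```python
import re

# Pattern written against the summary line in the log above.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Extract the percentages and n from a BUSCO one-line summary."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in: " + line)
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores


def percent(count, total):
    """Percentage of total, rounded to one decimal as in the log."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    scores = parse_busco_summary("INFO C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290")
    print(scores)
    # Recompute the percentages from the counts reported in the log.
    print(percent(19, 290), percent(10, 290), percent(261, 290))
```

A completeness this low on a fungal genome suggests checking the input file or the lineage dataset before trusting the assembly verdict.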
For further reading, see "BUSCO - 组装质量评估" (BUSCO: assembly quality assessment).