BUSCO usage notes

Today I'd like to use BUSCO again to assess the quality of a genome assembly for a fungus. Like most bioinformatics tools, it runs on Unix-like platforms such as Ubuntu. My computer dual-boots Windows and Ubuntu. On Ubuntu I have no way to input Chinese characters, so this post was finished in English, based on the earlier version I wrote when I first used this tool.
Unfortunately, I ran into a handful of tough problems this time and they cost me nearly 6 hours; I cannot imagine how things went so smoothly last time. Anyway, the job got done, and I wrote this post to save myself the next time, although I do not know when that will be.

Installing BUSCO

Download from https://gitlab.com/ezlab/busco
cd into the busco folder using its full path; a relative path is not allowed
Install: sudo python setup.py install

(How did all of this end up in Traditional Chinese characters?)

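Install the following three programs. The last two must actually be built and installed (see each one's README for instructions); yes, we have to install them rather than just download them.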

  • NCBI BLAST+ [NB: please see release note 2.0.1 below]
    https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST (this package contains the various tools needed for alignment)
    It seems this tool does not need to be installed; I nearly forgot.

  • HMMER (HMMER v3.1b2)
    Download from http://hmmer.org/ and build:
    tar xf hmmer-3.2.1.tar.gz
    cd hmmer-3.2.1
    ./configure
    make

  • Augustus (> 3.2.1) (only used for assessing genomes)
    http://bioinf.uni-greifswald.de/augustus/
    Many shared libraries are required. I do not know what each one is for; I just installed them, either from the Ubuntu repositories or from GitHub.
    From Ubuntu: sudo apt-get install xx. When you hit an error, the message usually includes a hint; just follow it.
    tar -xzf augustus-3.3.1.tar.gz
    Edit common.mk and uncomment the line ZIPINPUT = true
    sudo apt-get install libboost-iostreams-dev
    sudo apt-get install zlib1g-dev
    sudo apt-get install libgsl-dev
    sudo apt-get install libmysql++-dev
    sudo apt-get install libboost-graph-dev
    sudo apt-get install libsuitesparse-dev liblpsolve55-dev
    sudo apt-get install bamtools libbamtools-dev
    sudo apt-get install libboost-all-dev

From GitHub: run git clone with the package's GitHub URL, cd into the downloaded directory, and then run sudo make install
sudo make install libbzip2

sudo apt-get install libbz2-dev
sudo apt-get install liblzma-dev
sudo apt-get install libncurses5-dev
export TOOLDIR=/home/dong/dlm_wd/asmb_assess/sof/TOOLDIR

Installing Augustus requires several additional tools; read the manual carefully and follow its instructions step by step.

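Configuring the config.ini file

The BUSCO directory contains a config folder. Copy config.ini.default from it and rename the copy to config.ini, then open it and replace every default path, such as /home/osboxes/BUSCOVM/augustus/augustus-3.2.2/, with the actual path on your machine, for example my /home/larix/下载/augustus/

export AUGUSTUS_CONFIG_PATH="/path/to/AUGUSTUS/augustus-3.2.3/config/" Replace this path with the actual one, otherwise BUSCO reports an error.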

export AUGUSTUS_CONFIG_PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/config/"
export BUSCO_CONFIG_FILE="/home/dong/dlm_wd/asmb_assess/sof/busco/busco-master/config/config.ini"

export PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin:$PATH"
export PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts:$PATH"

Here is my config.ini:

[tblastn]
# path to tblastn
path = /home/dong/dlm_wd/asmb_assess/sof/blast/ncbi-blast-2.7.1+/bin/


[makeblastdb]
# path to makeblastdb
path = /home/dong/dlm_wd/asmb_assess/sof/blast/ncbi-blast-2.7.1+/bin/


[augustus]
# path to augustus
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin/

[etraining]
# path to augustus etraining
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin/

# path to augustus perl scripts, redeclare it for each new script
[gff2gbSmallDNA.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/
[new_species.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/
[optimize_augustus.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/

[hmmsearch]
# path to HMMsearch executable
path = /home/dong/dlm_wd/asmb_assess/sof/hmmer/hmmer-3.2.1/src/


[Rscript]
# path to Rscript, if you wish to use the plot tool
path = /usr/bin/
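
Since a wrong path in config.ini was one of the things that cost me time, here is a minimal sketch of a pre-flight check. It assumes, as in my file above, that each section name is also the name of the executable or script inside that section's path. The function name missing_tools is my own and is not part of BUSCO.

```python
import configparser
import os


def missing_tools(config_file):
    """Return (section, expected_file) pairs for tools not found on disk."""
    # interpolation=None so that a literal '%' in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_file)
    missing = []
    for section in parser.sections():
        # Only tool sections carry a "path" key; skip anything else.
        if not parser.has_option(section, "path"):
            continue
        # Assumption: the file to look for is named after the section.
        tool = os.path.join(parser.get(section, "path"), section)
        if not os.path.isfile(tool):
            missing.append((section, tool))
    return missing


if __name__ == "__main__":
    # Falls back to config/config.ini when BUSCO_CONFIG_FILE is not set.
    cfg = os.environ.get("BUSCO_CONFIG_FILE", "config/config.ini")
    for name, tool in missing_tools(cfg):
        print("missing:", name, "->", tool)
```

Run from the BUSCO folder, it prints one line per section whose file is not where config.ini says it should be, and prints nothing when every path resolves.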

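Preparing your own data and the lineage dataset

Your own data means the assembled FASTA file. Download the matching lineage dataset from http://busco.ezlab.org/. You can store it anywhere on your computer, as long as you give the correct path at run time.

Run

Genome assembly assessment

First cd into the BUSCO folder, then enter:
python scripts/run_BUSCO.py -i SEQUENCE_FILE -o OUTPUT_NAME -l LINEAGE -m geno
SEQUENCE_FILE: your assembled genome file
OUTPUT_NAME: a name for the output
LINEAGE: the lineage dataset to match against
geno: selects genome assessment mode

My earlier command: python scripts/run_BUSCO.py -i /home/larix/下载/myproject/zglk_fungi.genome_contigs.fasta -o ev_genome -l ascomycota_odb9 -m geno
In that case I had put the lineage dataset inside the BUSCO folder.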

python scripts/run_BUSCO.py \
-i /home/dong/dlm_wd/asmb_assess/seq/genome/Metarhizium_anisopliae.Metarhizium_anisopliae.dna.nonchromosomal.fa \
-o lvjiangjun_geno \
-l /home/dong/dlm_wd/asmb_assess/seq/reference/ascomycota_odb9 \
-m geno
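
With paths this long it is easy to mistype one, so here is a small sketch that assembles the same call from Python. It only uses the -i, -o, -l and -m options shown above; the function name busco_command and the example paths are hypothetical.

```python
import shlex


def busco_command(sequence_file, output_name, lineage, mode="geno"):
    """Build the argument list for scripts/run_BUSCO.py (run from the BUSCO folder)."""
    return [
        "python", "scripts/run_BUSCO.py",
        "-i", sequence_file,   # assembled genome in FASTA format
        "-o", output_name,     # name used for the output
        "-l", lineage,         # path to the lineage dataset
        "-m", mode,            # geno = genome assessment
    ]


# Hypothetical example paths, for illustration only.
cmd = busco_command("/data/genome.fa", "test_geno", "/data/ascomycota_odb9")
# Quote each argument so the line can be pasted into a shell safely.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can also be passed directly to subprocess.run(cmd), which avoids shell quoting problems with spaces or non-ASCII characters in the paths.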

After you press Enter, the program starts running.
Here is a sample run:

root@larix:/home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d# python scripts/run_BUSCO.py -i /home/larix/下载/myproject/sequence.fasta -o genome -l fungi_odb9 -m geno
INFO    ****************** Start a BUSCO 3.0.2 analysis, current time: 07/23/2017 14:44:33 ******************
INFO    Configuration loaded from /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/scripts/../config/config.ini
INFO    Init tools...
INFO    Check dependencies...
INFO    Check input file...
INFO    To reproduce this run: python scripts/run_BUSCO.py -i /home/larix/下载/myproject/sequence.fasta -o genome -l fungi_odb9/ -m genome -c 1 -sp aspergillus_nidulans
INFO    Mode is: genome
INFO    The lineage dataset is: fungi_odb9 (eukaryota)
INFO    Temp directory is ./tmp/
INFO    ****** Phase 1 of 2, initial predictions ******
INFO    ****** Step 1/3, current time: 07/23/2017 14:44:34 ******
INFO    Create blast database...
INFO    [makeblastdb]   Building a new DB, current time: 07/23/2017 14:44:35
INFO    [makeblastdb]   New DB name:   /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/tmp/genome_2557368148
INFO    [makeblastdb]   New DB title:  /home/larix/下载/myproject/sequence.fasta
INFO    [makeblastdb]   Sequence type: Nucleotide
INFO    [makeblastdb]   Keep Linkouts: T
INFO    [makeblastdb]   Keep MBits: T
INFO    [makeblastdb]   Maximum file size: 1000000000B
INFO    [makeblastdb]   Adding sequences from FASTA; added 1 sequences in 0.142315 seconds.
INFO    [makeblastdb]   1 of 1 task(s) completed at 07/23/2017 14:44:35
INFO    Running tblastn, writing output to /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/blast_output/tblastn_genome.tsv...
INFO    [tblastn]   1 of 1 task(s) completed at 07/23/2017 14:44:40
INFO    ****** Step 2/3, current time: 07/23/2017 14:44:40 ******
INFO    Maximum number of candidate contig per BUSCO limited to: 3
INFO    Getting coordinates for candidate regions...
INFO    Pre-Augustus scaffold extraction...
INFO    Running Augustus prediction using aspergillus_nidulans as species:
INFO    [augustus]  Please find all logs related to Augustus errors here: /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/augustus_output/augustus.log
INFO    [augustus]  7 of 69 task(s) completed at 07/23/2017 14:45:02
INFO    [augustus]  14 of 69 task(s) completed at 07/23/2017 14:45:27
INFO    [augustus]  21 of 69 task(s) completed at 07/23/2017 14:45:49
INFO    [augustus]  28 of 69 task(s) completed at 07/23/2017 14:46:04
INFO    [augustus]  35 of 69 task(s) completed at 07/23/2017 14:46:25
INFO    [augustus]  42 of 69 task(s) completed at 07/23/2017 14:46:42
INFO    [augustus]  49 of 69 task(s) completed at 07/23/2017 14:46:56
INFO    [augustus]  56 of 69 task(s) completed at 07/23/2017 14:47:50
INFO    [augustus]  63 of 69 task(s) completed at 07/23/2017 14:48:07
INFO    [augustus]  69 of 69 task(s) completed at 07/23/2017 14:48:18
INFO    Extracting predicted proteins...
INFO    ****** Step 3/3, current time: 07/23/2017 14:48:18 ******
INFO    Running HMMER to confirm orthology of predicted proteins:
INFO    [hmmsearch] 7 of 69 task(s) completed at 07/23/2017 14:48:19
INFO    [hmmsearch] 14 of 69 task(s) completed at 07/23/2017 14:48:19
INFO    [hmmsearch] 21 of 69 task(s) completed at 07/23/2017 14:48:19
INFO    [hmmsearch] 28 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 35 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 42 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 49 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 56 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 63 of 69 task(s) completed at 07/23/2017 14:48:21
INFO    [hmmsearch] 69 of 69 task(s) completed at 07/23/2017 14:48:21
INFO    Results:
INFO    C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290
INFO    19 Complete BUSCOs (C)
INFO    19 Complete and single-copy BUSCOs (S)
INFO    0 Complete and duplicated BUSCOs (D)
INFO    10 Fragmented BUSCOs (F)
INFO    261 Missing BUSCOs (M)
INFO    290 Total BUSCO groups searched
INFO    ****** Phase 2 of 2, predictions using species specific training ******
INFO    ****** Step 1/3, current time: 07/23/2017 14:48:21 ******
INFO    Extracting missing and fragmented buscos from the ancestral_variants file...
INFO    Running tblastn, writing output to /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/blast_output/tblastn_genome_missing_and_frag_rerun.tsv...
INFO    [tblastn]   1 of 1 task(s) completed at 07/23/2017 14:49:00
INFO    Maximum number of candidate contig per BUSCO limited to: 3
INFO    Getting coordinates for candidate regions...
INFO    ****** Step 2/3, current time: 07/23/2017 14:49:00 ******
INFO    Training Augustus using Single-Copy Complete BUSCOs:
INFO    Converting predicted genes to short genbank files at 07/23/2017 14:49:00...
INFO    All files converted to short genbank files, now running the training scripts at 07/23/2017 14:49:01...
INFO    Pre-Augustus scaffold extraction...
INFO    Re-running Augustus with the new metaparameters, number of target BUSCOs: 271
INFO    [augustus]  7 of 61 task(s) completed at 07/23/2017 14:49:15
INFO    [augustus]  13 of 61 task(s) completed at 07/23/2017 14:49:29
INFO    [augustus]  19 of 61 task(s) completed at 07/23/2017 14:49:43
INFO    [augustus]  25 of 61 task(s) completed at 07/23/2017 14:50:02
INFO    [augustus]  31 of 61 task(s) completed at 07/23/2017 14:50:19
INFO    [augustus]  37 of 61 task(s) completed at 07/23/2017 14:51:08
INFO    [augustus]  43 of 61 task(s) completed at 07/23/2017 14:51:24
INFO    [augustus]  49 of 61 task(s) completed at 07/23/2017 14:51:46
INFO    [augustus]  55 of 61 task(s) completed at 07/23/2017 14:52:07
INFO    [augustus]  61 of 61 task(s) completed at 07/23/2017 14:52:26
INFO    Extracting predicted proteins...
INFO    ****** Step 3/3, current time: 07/23/2017 14:52:27 ******
INFO    Running HMMER to confirm orthology of predicted proteins:
INFO    [hmmsearch] 6 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 12 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 18 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 24 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 30 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 36 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 42 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    [hmmsearch] 48 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    [hmmsearch] 54 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    [hmmsearch] 59 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    Results:
INFO    C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290
INFO    19 Complete BUSCOs (C)
INFO    19 Complete and single-copy BUSCOs (S)
INFO    0 Complete and duplicated BUSCOs (D)
INFO    10 Fragmented BUSCOs (F)
INFO    261 Missing BUSCOs (M)
INFO    290 Total BUSCO groups searched
INFO    BUSCO analysis done. Total running time: 474.648495913 seconds
INFO    Results written in /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/
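
The one-line summary in the log (C, S, D, F, M and n) is the part I usually want to keep. Below is a minimal sketch that pulls those numbers out of such a line with a regular expression. The pattern is written against the format shown in the log above, and parse_summary is my own helper name, not a BUSCO function.

```python
import re

# Matches e.g. "C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290"
SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary(line):
    """Return the BUSCO percentages as floats and n as an int."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in: %r" % line)
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(match.group("n"))
    return scores


line = "INFO    C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290"
scores = parse_summary(line)
print(scores["C"], scores["M"], scores["n"])  # prints: 6.6 90.0 290
```

The percentages agree with the counts in the log: 19 complete BUSCOs out of 290 is 6.6%, and 261 missing out of 290 is 90.0%.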

For further reading, see "BUSCO - 组装质量评估" (BUSCO: assembly quality assessment).
