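Reference: "宏基因组注释和可视化神器MEGAN入门" (Getting started with MEGAN, a tool for metagenomic annotation and visualization), from 刘永鑫's blog (宏基因组 public account) on CSDN.
Run blastx separately on the two FASTQ files of the paired-end (PE) data: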
/biostack/tools/alignment/diamond-2.0.4/diamond blastx -c 1 --db /biostack/database/nr/db/nr.dmnd -t tmp1 -p 24 -q all_other.R1.fq.gz --daa diamond-C1.1.daa >diamond.log1 &
/biostack/tools/alignment/diamond-2.0.4/diamond blastx -c 1 --db /biostack/database/nr/db/nr.dmnd -t tmp2 -p 24 -q all_other.R2.fq.gz --daa diamond-C1.2.daa >diamond.log2 &
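Note: the tmp1 and tmp2 directories must be created before running these commands. This step takes a long time.
Convert the daa files into the rma file that MEGAN requires: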
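The two blastx runs and the directory requirement above can be wrapped in one small script. This is only a sketch under the same paths and options as the commands above; the existence check on the diamond binary and the `2>&1` redirect (so that messages written to stderr also land in the log file) are my additions, not part of the original commands.

```shell
#!/usr/bin/env bash
set -eu

DIAMOND=/biostack/tools/alignment/diamond-2.0.4/diamond
DB=/biostack/database/nr/db/nr.dmnd

for i in 1 2; do
    mkdir -p "tmp$i"    # the -t directory has to exist before diamond starts
    if [ -x "$DIAMOND" ]; then
        # Same options as above; each mate file runs as its own background job.
        "$DIAMOND" blastx -c 1 --db "$DB" -t "tmp$i" -p 24 \
            -q "all_other.R$i.fq.gz" --daa "diamond-C1.$i.daa" \
            > "diamond.log$i" 2>&1 &
    else
        echo "diamond not found at $DIAMOND; skipping R$i" >&2
    fi
done
wait    # return only after both background jobs have finished
```

As in the original commands, both jobs request 24 threads and run at the same time, so the machine needs enough cores and memory for two concurrent searches.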
/biostack/tools/microbiome/MEGAN_Community-6.19.2/tools/daa2rma -i diamond-C1.1.daa diamond-C1.2.daa --paired -ms 50 -me 0.01 -top 50 -mdb /biostack/database/megan/megan-map-May2020.db -o diamond-C1.rma
The output is as follows:
Version MEGAN Community Edition (version 6.19.2, built 17 Jun 2020)
Author(s) Daniel H. Huson
Copyright (C) 2020 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Functional classifications to use: EGGNOG, GTDB, INTERPRO2GO, SEED
Loading ncbi.map: 2,249,459
Loading ncbi.tre: 2,249,463
Loading eggnog.map: 30,875
Loading eggnog.tre: 30,986
Loading gtdb.map: 182,187
Loading gtdb.tre: 182,191
Loading interpro2go.map: 12,738
Loading interpro2go.tre: 28,689
Loading seed.map: 978
Loading seed.tre: 979
In DAA files: diamond-C1.1.daa, diamond-C1.2.daa
Output file: diamond-C1.rma
Classifications: Taxonomy, SEED, EGGNOG, GTDB, INTERPRO2GO
Generating RMA6 file Parsing matches
Annotating RMA6 file using FAST mode (accession database and first accession per line)
Parsing file diamond-C1.1.daa
Parsing file: diamond-C1.1.daa
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (2397.7s)
Parsing file diamond-C1.2.daa
Parsing file: diamond-C1.2.daa
10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (2336.6s)
Total reads: 18,425,737
Alignments: 389,875,476
100% (0.0s)
100% (0.0s)
Linking paired reads
Number of pairs: 0
Binning reads: Initializing...
Initializing binning...
Using paired reads in taxonomic assignment...
Using 'Naive LCA' algorithm for binning: Taxonomy
Using Best-Hit algorithm for binning: SEED
Using Best-Hit algorithm for binning: EGGNOG
Using 'Naive LCA' algorithm for binning: GTDB
Using Best-Hit algorithm for binning: INTERPRO2GO
Binning reads...
Binning reads: Analyzing alignments
Total reads: 18,425,737
With hits: 18,425,737
Alignments: 389,875,476
Assig. Taxonomy: 18,366,746
Assig. SEED: 11,106,599
Assig. EGGNOG: 11,487,509
Assig. GTDB: 17,662,177
Assig. INTERPRO2GO: 10,358,118
MinSupport set to: 9212
Binning reads: Applying min-support & disabled filter to Taxonomy...
Min-supp. changes: 8,122
Binning reads: Applying min-support & disabled filter to GTDB...
Min-supp. changes: 20,690
Binning reads: Writing classification tables
Numb. Tax. classes: 133
Numb. SEED classes: 750
Numb. EGG. classes: 7,456
Numb. GTDB classes: 104
Numb. INT. classes: 8,954
Binning reads: Syncing
Class. Taxonomy: 133
Class. SEED: 750
Class. EGGNOG: 7,456
Class. GTDB: 104
Class. INTERPRO2GO: 8,954
100% (19792.7s)
Total time: 24,536s
Peak memory: 144.1 of 195.3 G
The conversion took almost 7 hours (24,536 s is about 6.8 hours).
Extract the taxonomic annotation data:
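One detail in the log above is worth checking: although `--paired` was passed, the log reports `Number of pairs: 0`, so no mates appear to have been linked. A minimal sketch for catching this automatically is shown below. It assumes the daa2rma screen output was saved to a file; the name `daa2rma.log` and the helper `pairs_linked` are hypothetical.

```shell
# Print the value on the "Number of pairs:" line of a daa2rma log,
# with thousands separators removed.
pairs_linked() {
    sed -n 's/^[[:space:]]*Number of pairs:[[:space:]]*//p' "$1" | tr -d ','
}

LOG=daa2rma.log    # hypothetical file holding the daa2rma screen output
if [ -f "$LOG" ] && [ "$(pairs_linked "$LOG")" = "0" ]; then
    echo "warning: --paired was set but no read pairs were linked" >&2
fi
```

If the warning fires, it is worth comparing the read names in the R1 and R2 files before relying on the paired-read assignment.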
/biostack/tools/microbiome/MEGAN_Community-6.19.2/tools/rma2info -i diamond-C1.rma -c2c Taxonomy -r2c Taxonomy -n true --paths true --ranks true --list true --listMore true --bacteriaOnly true -v > C1Taxonomy1.txt
Screen output:
RMA2Info - Analyses an RMA file
Options:
Input and Output
--in: diamond-C1.rma
--out: -
Commands
--list: true
--listMore: true
--class2count: Taxonomy
--read2class: Taxonomy
--names: true
--paths: true
--ranks: true
--majorRanksOnly: false
--bacteriaOnly: true
--virusOnly: false
--ignoreUnassigned: true
Other:
--verbose: true
Version MEGAN Community Edition (version 6.19.2, built 17 Jun 2020)
Author(s) Daniel H. Huson
Copyright (C) 2020 Daniel H. Huson. This program comes with ABSOLUTELY NO WARRANTY.
Loading MEGAN File: diamond-C1.rma
Loading ncbi.map: 2,249,459
Loading ncbi.tre: 2,249,463
Total time: 492s
Peak memory: 29.8 of 195.3 G
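The taxonomy command above requests both `-c2c` (class-to-count) and `-r2c` (read-to-class) and redirects everything to a single file, so the two tables appear to end up in the same output. If separate files are easier to work with downstream, the export can be split into two runs. The sketch below reuses only options that appear in the command above; the function name and the output file names are hypothetical, and it assumes those options behave the same when `-c2c` and `-r2c` are given on their own.

```shell
RMA2INFO=/biostack/tools/microbiome/MEGAN_Community-6.19.2/tools/rma2info
RMA=diamond-C1.rma

# Announce one export and, if rma2info is installed, run it.
# $1 = -c2c or -r2c, $2 = output file
export_taxonomy() {
    echo "rma2info $1 Taxonomy -> $2"
    if [ -x "$RMA2INFO" ]; then
        "$RMA2INFO" -i "$RMA" "$1" Taxonomy -n true --paths true --ranks true \
            --bacteriaOnly true > "$2"
    fi
}

export_taxonomy -c2c C1Taxonomy.counts.txt
export_taxonomy -r2c C1Taxonomy.reads.txt
```

Each run reloads the rma file (about 492 s in the log above), so splitting trades extra run time for simpler files.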
Extract the EGGNOG annotation:
/biostack/tools/microbiome/MEGAN_Community-6.19.2/tools/rma2info -i diamond-C1.rma -r2c EGGNOG -n true --paths true --ranks true --list true --listMore true -v > C1eggnog.txt
Extract the SEED annotation:
/biostack/tools/microbiome/MEGAN_Community-6.19.2/tools/rma2info -i diamond-C1.rma -r2c SEED -n true --paths true --ranks true --list true --listMore true -v > C1SEED.txt
Extract the INTERPRO2GO annotation:
/biostack/tools/microbiome/MEGAN_Community-6.19.2/tools/rma2info -i diamond-C1.rma -r2c INTERPRO2GO -n true --paths true --ranks true --list true --listMore true -v > C1INTERPRO2GO.txt
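The three functional exports above differ only in the classification name, so they can be written as a loop. This is a sketch with the same options as the commands above; the helper `out_name` is hypothetical, and it names the EGGNOG output `C1EGGNOG.txt` rather than the `C1eggnog.txt` used above.

```shell
RMA2INFO=/biostack/tools/microbiome/MEGAN_Community-6.19.2/tools/rma2info
RMA=diamond-C1.rma

# Output file name for one classification, e.g. SEED -> C1SEED.txt
out_name() {
    echo "C1$1.txt"
}

for cls in EGGNOG SEED INTERPRO2GO; do
    if [ -x "$RMA2INFO" ]; then
        "$RMA2INFO" -i "$RMA" -r2c "$cls" -n true --paths true --ranks true \
            --list true --listMore true -v > "$(out_name "$cls")"
    else
        echo "rma2info not found; would write $(out_name "$cls")" >&2
    fi
done
```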
Download the MEGAN6 Community Edition installers: https://software-ab.informatik.uni-tuebingen.de/download/megan6/welcome.html