MetaPhlAn3宏基因组物种注释

相比version2,version3数据库拓了很多,还有3支持自建数据库了。

Github: https://github.com/biobakery/MetaPhlAn
Manual: https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0
Wiki: https://github.com/biobakery/biobakery/wiki/metaphlan3

MetaPhlAn依赖~1.1M的clade-specific的marker genes,
mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
这些marker genes由~100,000 reference genomes (~99,500 bacterial and archaeal and ~500 eukaryotic)确定。该数据可用于物种分类,株水平分析,定量,快速计算。

安装 - manual 给的方法

conda create --name mpa -c bioconda python=3.7 metaphlan
conda activate mpa

不出意外的话,会报错

undefined symbol: _ZN3tbb10interface78internal15task_arena_base19internal_initializeEv

bowtie2-build-s: symbol lookup error, undefined symbol

安装 - 针对tbb版本问题

conda create -n metaphlan python=3.7
conda activate metaphlan
conda install tbb=2020.2
conda install bowtie2
conda install -c bioconda metaphlan
metaphlan --version
# MetaPhlAn version 3.0.13 (27 Jul 2021)
bowtie2 --version
# bowtie2-align-s version 2.4.4
# Compiler: gcc version 9.3.0

这里顺便安装了phylophlan3

数据库获取

下载地址:
Google Driver
Zenodo

Google Driver 文件列表:

bcftools    https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AACPUjNhnsIIJn16-ww6-MWOa/bcftools?dl=1
metaphlan2_homebrew_counter.txt https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAD6ImeA91we2nBWBhpfaEOqa/metaphlan2_homebrew_counter.txt?dl=1
mpa_latest  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAyoJpOgcjop41VIHAGWIVLa/mpa_latest?dl=1
mpa_v20_m200_marker_info.txt.bz2    https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AABKU_RAK5yOzyhV27NpOduDa/mpa_v20_m200_marker_info.txt.bz2?dl=1
mpa_v20_m200.md5    https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AADS8nukx3dSoiR82OHw6dOka/mpa_v20_m200.md5?dl=1
mpa_v20_m200.tar    https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAASBOj-2gAbA53cV1bXBULYa/mpa_v20_m200.tar?dl=1
mpa_v29_CHOCOPhlAn_201901_marker_info.txt.bz2   https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AADhE2Ur7JqirifOdgi4fGQEa/mpa_v29_CHOCOPhlAn_201901_marker_info.txt.bz2?dl=1
mpa_v29_CHOCOPhlAn_201901.md5   https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AADdxAWsjLPLjy10VICgSAEPa/mpa_v29_CHOCOPhlAn_201901.md5?dl=1
mpa_v29_CHOCOPhlAn_201901.tar   https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAACeczBU6P9lIBD4ZYtxwKva/mpa_v29_CHOCOPhlAn_201901.tar?dl=1
mpa_v292_CHOCOPhlAn_201901_marker_info.txt.bz2  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AADoxBNynVopWt2shYYKQ2Mba/mpa_v292_CHOCOPhlAn_201901_marker_info.txt.bz2?dl=1
mpa_v292_CHOCOPhlAn_201901.md5  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAApZLRLD0Bkb86bvH-Y3-tUa/mpa_v292_CHOCOPhlAn_201901.md5?dl=1
mpa_v292_CHOCOPhlAn_201901.tar  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AABK4Zns2PYUh_R-sLGEx_Bza/mpa_v292_CHOCOPhlAn_201901.tar?dl=1
mpa_v293_CHOCOPhlAn_201901.md5  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAsg2PN5Ng6uwHnEemlqo3-a/mpa_v293_CHOCOPhlAn_201901.md5?dl=1
mpa_v293_CHOCOPhlAn_201901.tar  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AABFg8C3gMyNpYNTAY5PcSONa/mpa_v293_CHOCOPhlAn_201901.tar?dl=1
mpa_v294_CHOCOPhlAn_201901.md5  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA6KBc-nd8_C4bJOPVjzWW7a/mpa_v294_CHOCOPhlAn_201901.md5?dl=1
mpa_v294_CHOCOPhlAn_201901.tar  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAJSiG2qkfNMcYzhvz4Rxsga/mpa_v294_CHOCOPhlAn_201901.tar?dl=1
mpa_v295_CHOCOPhlAn_201901.md5  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AACIWLuO_0ixZsxiSZZG-4M1a/mpa_v295_CHOCOPhlAn_201901.md5?dl=1
mpa_v295_CHOCOPhlAn_201901.tar  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAA0y5qvPjJdxVw86Ovjinipa/mpa_v295_CHOCOPhlAn_201901.tar?dl=1
mpa_v296_CHOCOPhlAn_201901.md5  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAbRMEnsewv_pHFsLv2Be0va/mpa_v296_CHOCOPhlAn_201901.md5?dl=1
mpa_v296_CHOCOPhlAn_201901.tar  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AABAD51gd3wr0___2MWO0xD-a/mpa_v296_CHOCOPhlAn_201901.tar?dl=1
mpa_v296_CHOCOPhlAn_201901_marker_info.txt.bz2  https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAv_ShZiz7pNTaT_YONJTF7a/mpa_v296_CHOCOPhlAn_201901_marker_info.txt.bz2?dl=1
mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2   https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAlyQITZuUCtBUJxpxhIroIa/mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2?dl=1
mpa_v30_CHOCOPhlAn_201901.tar   https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AADlxibskzbPHPoDl6S-FyKka/mpa_v30_CHOCOPhlAn_201901.tar?dl=1
mpa_v30_CHOCOPhlAn_201901.md5   https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AACTzoUYDqZps8u2JqWCNCODa/mpa_v30_CHOCOPhlAn_201901.md5?dl=1
SRS019033.fastq https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AACh1NQExDk39RXzZOyTPmQwa/SRS019033.fastq?dl=1
strainphlan_homebrew_counter.txt    https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAZO5uydQ2QQkOSkYu5DxRha/strainphlan_homebrew_counter.txt?dl=1

Win下载:marker genes数据

# 解压 15M -> 336M
bzip2 -d mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
less -S mpa_v30_CHOCOPhlAn_201marker_info.txt | wc -l
# 1132198 # 113万行

一个物种有多个marker genes,因此总基因大于物种数

# 剖开看看
39444__GeneID:11747987  {'ext': [], 'score': 2.0, 'clade': 's__Cotia_virus', 'le
39444__GeneID:11747985  {'ext': [], 'score': 2.0, 'clade': 's__Cotia_virus', 'le
39444__GeneID:11747982  {'ext': ['PRJNA15142', 'PRJNA14595', 'PRJNA20981'], 'sco
39444__GeneID:11747981  {'ext': [], 'score': 0.0, 'clade': 's__Cotia_virus', 'le

bzip2压缩了20倍,牛皮呀呀呀呀呀呀

Win下载:基因组数据

tar -xvf mpa_v30_CHOCOPhlAn_201901.tar
rm mpa_v30_CHOCOPhlAn_201901.tar
bzip2 -d mpa_v30_CHOCOPhlAn_201901.fna.bz2
# 敲开kankan>1000373__GeneID:11569613
less -S mpa_v30_CHOCOPhlAn_201901.fna | grep "^>" | head
# 是基因的序列信息
>1000373__GeneID:11604940
>1000373__GeneID:11604942
>1000373__GeneID:11604944
>100053__V6HSA9__LEP1GSC062_4244 UniRef90_V6HSA9;k__Bacteria|p__Spirochaetes|c__Spirochaetia|o__Spirochaetia_unclassified|f__Leptospiraceae|g__Leptospira|s__Leptospira_alexanderi;GCA_000243815
>100053__V6HSV5__LEP1GSC062_0144 UniRef90_V6HSV5;k__Bacteria|p__Spirochaetes|c__Spirochaetia|o__Spirochaetia_unclassified|f__Leptospiraceae|g__Leptospira|s__Leptospira_alexanderi;GCA_000243815
>100053__V6HSX2__LEP1GSC062_1624 UniRef90_V6HSX2;k__Bacteria|p__Spirochaetes|c__Spirochaetia|o__Spirochaetia_unclassified|f__Leptospiraceae|g__Leptospira|s__Leptospira_alexanderi;GCA_000243815
>100053__V6HT46__LEP1GSC062_1702 UniRef90_V6HT46;k__Bacteria|p__Spirochaetes|c__Spirochaetia|o__Spirochaetia_unclassified|f__Leptospiraceae|g__Leptospira|s__Leptospira_alexanderi;GCA_000243815
>100053__V6HT56__LEP1GSC062_1778 UniRef90_V6HT56;k__Bacteria|p__Spirochaetes|c__Spirochaetia|o__Spirochaetia_unclassified|f__Leptospiraceae|g__Leptospira|s__Leptospira_alexanderi;GCA_000243815
>100053__V6HTZ5__LEP1GSC062_0133 UniRef90_V6HTZ5;k__Bacteria|p__Spirochaetes|c__Spirochaetia|o__Spirochaetia_unclassified|f__Leptospiraceae|g__Leptospira|s__Leptospira_alexanderi;GCA_000243815
less -S mpa_v30_CHOCOPhlAn_201901.fna | grep "^>" | awk -F" " '{print $2}' | sed -e '/^$/d' | awk -F";" '{print $2}' | uniq  | wc -l
# 10164 行(物种)

bowtie2建数据库索引

bowtie2-build --threads 8 \
-f mpa_v30_CHOCOPhlAn_201901.fna \
mpa_v30_CHOCOPhlAn_201901

metaphlan3注释物种

shell脚本:

#!/usr/bin/env bash
# 帮助文档
helpdoc(){
    cat <<EOF
Description:

    This is a help document
    - Plot circos

Usage:

    $0 -i <input file> -o <output file>

Option:

    -i    this is a input file/fold
    -o    this is a output file/fold
EOF
}

# 参数传递
while getopts ":i:o:" opt
do
    case $opt in
        i)
        infile=`echo $OPTARG`
        ;;
        o)
        outfile=`echo $OPTARG`
        ;;
        ?)
        echo "未知参数"
        exit 1;;
    esac
done

# 若无指定任何参数则输出帮助文档
if [ $# = 0 ]
then
    helpdoc
    exit 1
fi

# 核心
source /hwfssz5/ST_META/P18Z10200N0423_ZYQ/MiceGutProject/hutongyuan/software/miniconda3/etc/profile.d/conda.sh
conda activate metaphlan

route="/hwfssz5/ST_META/TMP1T.2/soil"
metaphlan3db="/hwfssz5/ST_META/P18Z10200N0423_ZYQ/MiceGutProject/hutongyuan/database/humann3/metaphlan_v30/"

mkdir $route/Result/metaphlan3/$infile
cat $route/Result/kneaddata/$infile/${infile}_1_kneaddata.repeats.removed.*.fastq > $route/Result/kneaddata/$infile/${infile}_1_kneaddata.merge.fastq

metaphlan $route/Result/kneaddata/$infile/${infile}_1_kneaddata.merge.fastq \
--input_type fastq \
--bowtie2db $metaphlan3db \
-x mpa_v30_CHOCOPhlAn_201901 \
--nproc 8 \
-o $route/Result/metaphlan3/$infile/${infile}_metaphlan3.out

rm $route/Result/kneaddata/$infile/${infile}_1_kneaddata.merge.fastq

--input_type {fastq,fasta,bowtie2out,sam}
--bowtie2db METAPHLAN_BOWTIE2_DB
-x INDEX Specify the id of the database version to use
--bt2_ps BowTie2 presets [default very-sensitive]
--add_viruses Allow the profiling of viral organisms
--ignore_eukaryotes Do not profile eukaryotic organisms
--ignore_bacteria Do not profile bacterial organisms
--ignore_archaea Do not profile archeal organisms
--nproc N The number of CPUs to use for parallelizing the mapping [default 4]

qsub提交任务:

route="/hwfssz5/ST_META/TMP1T.2/soil"
qsub -cwd -l vf=80G,p=8 -q st_supermem.q -P st_supermem -binding linear:8 $route/script/q_metaphlan3.sh -i 49

结果:

样本合并

conda activate metaphlan
python /route/metaphlan/bin/merge_metaphlan_tables.py \
-o ./abundance/abundance_merged.txt \
./abundance/*.out

unifrac样本距离

下载metaphlan nwk文件:
https://github.com/biobakery/MetaPhlAn/blob/master/metaphlan/utils/mpa_v30_CHOCOPhlAn_201901_species_tree.nwk

R安装依赖:

conda activate r411
conda install r-vegan r-ape r-rbiom

获取计算R脚本:
https://github.com/biobakery/MetaPhlAn/blob/master/metaphlan/utils/calculate_unifrac.R

计算距离:

Rscript $unifrac_route/calculate_unifrac.R \
abundance_merged.txt \
$nwk_route/mpa_v30_CHOCOPhlAn_201901_species_tree.nwk \
abundance_merged_unifrac_profiles.txt

警告nwk信息不全

WARNING: 4 species not present in the tree were removed from the input profile.
Nitrolancea_hollandica

Singulisphaera_acidiphila

Leptothrix_ochracea

Sutterella_parvirubra

hclust2.py画热图

安装:

conda activate -c bioconda hclust2

运行:

hclust2.py \
  -i abundance_merged.txt \
  -o hclust.sqrt_scale.png

热图太惨了,不要了

更多:
MetaPhlAn3安装及使用

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,142评论 6 498
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,298评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 162,068评论 0 351
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,081评论 1 291
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,099评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,071评论 1 295
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,990评论 3 417
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,832评论 0 273
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,274评论 1 310
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,488评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,649评论 1 347
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,378评论 5 343
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,979评论 3 325
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,625评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,796评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,643评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,545评论 2 352

推荐阅读更多精彩内容