meta SNV gt-pro

Article
Title: Fast and accurate metagenotyping of the human gut microbiome with GT-Pro
Journal: Nature Biotechnology (NBT)
Year: 2022

Computational performance
Accuracy evaluation

Github: https://github.com/zjshi/gt-pro

Installation

wget -c https://github.com/zjshi/gt-pro/releases/download/v1.0.1/gt_pro_prebuilt_v1.0.1.tar.gz
tar -zxvf gt_pro_prebuilt_v1.0.1.tar.gz
cd gt-pro-static

# load conda into the current shell first, then create and activate the environment
source /hwfsxx1/ST_HN/P18Z10200N0423/huty/software/miniconda3_2/etc/profile.d/conda.sh
conda create -n gt-pro
conda activate gt-pro
# dependency c++ env
# https://blog.csdn.net/cyr20040123/article/details/129479126
conda install gcc=11 -c conda-forge
conda install libgcc -c conda-forge
conda install gxx_linux-64 -c conda-forge
route="/hwfsxx1/ST_HN/P18Z10200N0423/huty/software/miniconda3_2/envs/gt-pro/bin"
ln -s $route/x86_64-conda_cos7-linux-gnu-g++ g++
# python3.8 pigz (optional)
conda install python=3.8 
conda install pigz=2.6
# dependencies for building a custom database
conda install kmc -c bioconda
conda install biopython
conda install numpy
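
Before moving on, it can help to confirm that the tools installed above are actually visible in the activated environment. The following is a minimal sketch, not part of GT-Pro: `missing_tools` is a hypothetical helper, and the binary names used with it are assumed to match the conda package names.

```shell
# Print the name of every tool that is NOT found on PATH, one per line.
# Hypothetical helper; nothing here is specific to GT-Pro.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools g++ python3 pigz kmc` in the `gt-pro` environment should print nothing if every tool is reachable. Note that the `g++` symlink above is created in the current directory, so it is only found this way if that directory is on `PATH`.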

1 Build customized database

tar xzvf test/100035.tar.gz && tar xzvf test/101747.tar.gz && tar xzvf test/102779.tar.gz
./GT_Pro build \
--in test/build.list \
--out ./test/my_db \
--dbname tri_db --threads 4

Log output

read ./test/100035
read ./test/101747
read ./test/102779
[OK] a total of 3 species found.
check ./test/100035 for required files
        genome path found: ./test/100035/genomes/GUT_GENOME137233.fna
        genome path found: ./test/100035/genomes/GUT_GENOME073777.fna
        genome path found: ./test/100035/genomes/GUT_GENOME120930.fna
        genome path found: ./test/100035/genomes/GUT_GENOME131088.fna
        genome path found: ./test/100035/genomes/GUT_GENOME137819.fna
        genome path found: ./test/100035/genomes/GUT_GENOME000139.fna
        genome path found: ./test/100035/genomes/GUT_GENOME137257.fna
        genome path found: ./test/100035/genomes/GUT_GENOME120554.fna
        genome path found: ./test/100035/genomes/GUT_GENOME138280.fna
        genome path found: ./test/100035/genomes/GUT_GENOME198444.fna
        genome path found: ./test/100035/genomes/GUT_GENOME138765.fna
        genome path found: ./test/100035/genomes/GUT_GENOME096021.fna
12 genomes sequences will be used for database building
[OK] ./test/100035/reference.fna found.
[OK] ./test/100035/msa.fa found.
[OK] ./test/100035/core_snps.vcf found.
check ./test/101747 for required files
        genome path found: ./test/101747/genomes/GUT_GENOME190234.fna
        genome path found: ./test/101747/genomes/GUT_GENOME253845.fna
        genome path found: ./test/101747/genomes/GUT_GENOME107393.fna
        genome path found: ./test/101747/genomes/GUT_GENOME185690.fna
        genome path found: ./test/101747/genomes/GUT_GENOME188736.fna
        genome path found: ./test/101747/genomes/GUT_GENOME092036.fna
        genome path found: ./test/101747/genomes/GUT_GENOME093349.fna
        genome path found: ./test/101747/genomes/GUT_GENOME194019.fna
        genome path found: ./test/101747/genomes/GUT_GENOME105882.fna
        genome path found: ./test/101747/genomes/GUT_GENOME094667.fna
        genome path found: ./test/101747/genomes/GUT_GENOME185233.fna
        genome path found: ./test/101747/genomes/GUT_GENOME107011.fna
12 genomes sequences will be used for database building
[OK] ./test/101747/reference.fna found.
[OK] ./test/101747/msa.fa found.
[OK] ./test/101747/core_snps.vcf found.
check ./test/102779 for required files
        genome path found: ./test/102779/genomes/GUT_GENOME166581.fna
        genome path found: ./test/102779/genomes/GUT_GENOME164547.fna
        genome path found: ./test/102779/genomes/GUT_GENOME164615.fna
        genome path found: ./test/102779/genomes/GUT_GENOME096682.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166279.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166503.fna
        genome path found: ./test/102779/genomes/GUT_GENOME158726.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166574.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165490.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165777.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166618.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165550.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165783.fna
13 genomes sequences will be used for database building
[OK] ./test/102779/reference.fna found.
[OK] ./test/102779/msa.fa found.
[OK] ./test/102779/core_snps.vcf found.
[load] loading core-genome consensus sequence from ./test/100035/reference.fna
[load] loading core-genome consensus sequence from ./test/101747/reference.fna
[load] loading core-genome consensus sequence from ./test/102779/reference.fna
        the loaded core-genome has a consensus sequence of 1919699 bases
[load] loading core snps from ./test/101747/core_snps.vcf
        the loaded core-genome has a consensus sequence of 3160876 bases
[load] loading core snps from ./test/102779/core_snps.vcf
        the loaded core-genome has a consensus sequence of 3626943 bases
[load] loading core snps from ./test/100035/core_snps.vcf
        a total of 17386 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/100035/coords.tsv
        a total of 705 divisions was found
        a total of 34488 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/101747/coords.tsv
        a total of 511 divisions was found
        a total of 41034 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/102779/coords.tsv
        a total of 297 divisions was found
[searching] start to search 31-mers
[searching] start to search 31-mers
[searching] start to search 31-mers
        a total of 518260 kmer records was found
        a total of 1038192 kmer records was found
        a total of 1239348 kmer records was found
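
The log shows which files the build step looks for in each species directory: the genomes under genomes/, plus reference.fna, msa.fa, core_snps.vcf and coords.tsv. A pre-flight check along these lines can catch an incomplete directory before a long build starts. This is only a sketch: `check_species_dir` is a hypothetical helper, and the file list is inferred from the log output rather than taken from the GT-Pro documentation.

```shell
# Return 0 if a species directory contains the files named in the build log,
# otherwise print what is missing and return 1.
# Assumption: the required-file list is inferred from the log output.
check_species_dir() {
    sp_dir=$1
    sp_rc=0
    for f in reference.fna msa.fa core_snps.vcf coords.tsv; do
        if [ ! -s "$sp_dir/$f" ]; then
            echo "missing or empty: $sp_dir/$f"
            sp_rc=1
        fi
    done
    # At least one genome FASTA is expected under genomes/
    if ! ls "$sp_dir"/genomes/*.fna >/dev/null 2>&1; then
        echo "no genomes found: $sp_dir/genomes/*.fna"
        sp_rc=1
    fi
    return $sp_rc
}
```

For the test data this could be called as `check_species_dir ./test/100035`, and likewise for the other two species directories.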

2 Optimize GT-Pro database before metagenotyping

./GT_Pro optimize \
-d test/my_db/tri_db \
-i test/SRR413665_2.fastq.gz

Log output

[OK] start initial optimization
[OK] database found
[OK] optimize from test/my_db/tri_db.bin
[OK] initial optimization done
[OK] finalize optimization with a break-in test
[OK] optimization done
Four new files are added.
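
To see exactly which files the optimize step produced, one option is to snapshot the database directory beforehand and compare afterwards. The sketch below assumes the new files land in the database directory (test/my_db here), which these notes do not state. `new_files_since` is a hypothetical helper.

```shell
# Print entries that exist in directory $1 now but are absent from the
# snapshot file $2 (a sorted listing taken earlier, stored outside $1).
new_files_since() {
    ls -1 "$1" | sort | comm -13 "$2" -
}

# Intended usage (paths are assumptions):
#   ls -1 test/my_db | sort > /tmp/my_db.before
#   ... run the optimize command ...
#   new_files_since test/my_db /tmp/my_db.before
```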

Quick start

# 1 Download the prebuilt database
# 1.1 20190723_881species_optimized_db_kmer_index.bin
# 1.2 20190723_881species_optimized_db_snps.bin
wget --content-disposition https://fileshare.czbiohub.org/s/daK4Wj3N7EwSSrd/download
wget --content-disposition https://fileshare.czbiohub.org/s/XNCzWziB4JydWFH/download
# 2 SNP dictionary
# variants_main.covered.hq.snp_dict.tsv
wget --content-disposition https://fileshare.czbiohub.org/s/cYDwCAB539kE5yt/download
# 3 Species taxonomy information
# species_taxonomy_ext.tsv
wget --content-disposition https://fileshare.czbiohub.org/s/XBzwFpfJpmJpBSQ/download
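
Because the download URLs do not contain the file names, it is worth checking that all four files arrived under the names listed in the comments above. The following sketch assumes `wget --content-disposition` saved them under exactly those names. `check_downloads` is a hypothetical helper.

```shell
# Print each expected file that is missing or empty in directory $1;
# return 1 if any is missing, 0 otherwise.
check_downloads() {
    dl_dir=$1
    dl_rc=0
    for f in \
        20190723_881species_optimized_db_kmer_index.bin \
        20190723_881species_optimized_db_snps.bin \
        variants_main.covered.hq.snp_dict.tsv \
        species_taxonomy_ext.tsv
    do
        if [ ! -s "$dl_dir/$f" ]; then
            echo "missing or empty: $f"
            dl_rc=1
        fi
    done
    return $dl_rc
}
```

Calling `check_downloads .` from the download directory prints nothing when all four files are present and non-empty.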

