Article
Title: Fast and accurate metagenotyping of the human gut microbiome with GT-Pro
Translated title: Fast human gut metagenotyping
Journal: NBT (Nature Biotechnology)
Year: 2022

Computational performance

Accuracy evaluation

Installation

wget -c https://github.com/zjshi/gt-pro/releases/download/v1.0.1/gt_pro_prebuilt_v1.0.1.tar.gz
tar -zxvf gt_pro_prebuilt_v1.0.1.tar.gz
cd gt-pro-static

# make the conda command available in this shell before creating the environment
source /hwfsxx1/ST_HN/P18Z10200N0423/huty/software/miniconda3_2/etc/profile.d/conda.sh
conda create -n gt-pro
conda activate gt-pro
# dependency c++ env
# https://blog.csdn.net/cyr20040123/article/details/129479126
conda install gcc=11 -c conda-forge
conda install libgcc -c conda-forge
conda install gxx_linux-64 -c conda-forge
route="/hwfsxx1/ST_HN/P18Z10200N0423/huty/software/miniconda3_2/envs/gt-pro/bin"
ln -s $route/x86_64-conda_cos7-linux-gnu-g++ g++
# Python 3.8; pigz (optional)
conda install python=3.8
conda install pigz=2.6
# dependencies for building a custom database
conda install kmc -c bioconda
conda install biopython
conda install numpy

1 Build customized database

tar xzvf test/100035.tar.gz && tar xzvf test/101747.tar.gz && tar xzvf test/102779.tar.gz
./GT_Pro build \
--in test/build.list \
--out ./test/my_db \
--dbname tri_db --threads 4

Log output

read ./test/100035
read ./test/101747
read ./test/102779
[OK] a total of 3 species found.
check ./test/100035 for required files
        genome path found: ./test/100035/genomes/GUT_GENOME137233.fna
        genome path found: ./test/100035/genomes/GUT_GENOME073777.fna
        genome path found: ./test/100035/genomes/GUT_GENOME120930.fna
        genome path found: ./test/100035/genomes/GUT_GENOME131088.fna
        genome path found: ./test/100035/genomes/GUT_GENOME137819.fna
        genome path found: ./test/100035/genomes/GUT_GENOME000139.fna
        genome path found: ./test/100035/genomes/GUT_GENOME137257.fna
        genome path found: ./test/100035/genomes/GUT_GENOME120554.fna
        genome path found: ./test/100035/genomes/GUT_GENOME138280.fna
        genome path found: ./test/100035/genomes/GUT_GENOME198444.fna
        genome path found: ./test/100035/genomes/GUT_GENOME138765.fna
        genome path found: ./test/100035/genomes/GUT_GENOME096021.fna
12 genomes sequences will be used for database building
[OK] ./test/100035/reference.fna found.
[OK] ./test/100035/msa.fa found.
[OK] ./test/100035/core_snps.vcf found.
check ./test/101747 for required files
        genome path found: ./test/101747/genomes/GUT_GENOME190234.fna
        genome path found: ./test/101747/genomes/GUT_GENOME253845.fna
        genome path found: ./test/101747/genomes/GUT_GENOME107393.fna
        genome path found: ./test/101747/genomes/GUT_GENOME185690.fna
        genome path found: ./test/101747/genomes/GUT_GENOME188736.fna
        genome path found: ./test/101747/genomes/GUT_GENOME092036.fna
        genome path found: ./test/101747/genomes/GUT_GENOME093349.fna
        genome path found: ./test/101747/genomes/GUT_GENOME194019.fna
        genome path found: ./test/101747/genomes/GUT_GENOME105882.fna
        genome path found: ./test/101747/genomes/GUT_GENOME094667.fna
        genome path found: ./test/101747/genomes/GUT_GENOME185233.fna
        genome path found: ./test/101747/genomes/GUT_GENOME107011.fna
12 genomes sequences will be used for database building
[OK] ./test/101747/reference.fna found.
[OK] ./test/101747/msa.fa found.
[OK] ./test/101747/core_snps.vcf found.
check ./test/102779 for required files
        genome path found: ./test/102779/genomes/GUT_GENOME166581.fna
        genome path found: ./test/102779/genomes/GUT_GENOME164547.fna
        genome path found: ./test/102779/genomes/GUT_GENOME164615.fna
        genome path found: ./test/102779/genomes/GUT_GENOME096682.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166279.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166503.fna
        genome path found: ./test/102779/genomes/GUT_GENOME158726.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166574.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165490.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165777.fna
        genome path found: ./test/102779/genomes/GUT_GENOME166618.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165550.fna
        genome path found: ./test/102779/genomes/GUT_GENOME165783.fna
13 genomes sequences will be used for database building
[OK] ./test/102779/reference.fna found.
[OK] ./test/102779/msa.fa found.
[OK] ./test/102779/core_snps.vcf found.
[load] loading core-genome consensus sequence from ./test/100035/reference.fna
[load] loading core-genome consensus sequence from ./test/101747/reference.fna
[load] loading core-genome consensus sequence from ./test/102779/reference.fna
        the loaded core-genome has a consensus sequence of 1919699 bases
[load] loading core snps from ./test/101747/core_snps.vcf
        the loaded core-genome has a consensus sequence of 3160876 bases
[load] loading core snps from ./test/102779/core_snps.vcf
        the loaded core-genome has a consensus sequence of 3626943 bases
[load] loading core snps from ./test/100035/core_snps.vcf
        a total of 17386 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/100035/coords.tsv
        a total of 705 divisions was found
        a total of 34488 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/101747/coords.tsv
        a total of 511 divisions was found
        a total of 41034 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/102779/coords.tsv
        a total of 297 divisions was found
[searching] start to search 31-mers
[searching] start to search 31-mers
[searching] start to search 31-mers
        a total of 518260 kmer records was found
        a total of 1038192 kmer records was found
        a total of 1239348 kmer records was found
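
The log above shows which inputs the build step looks for in each species directory: genomes/*.fna, reference.fna, msa.fa and core_snps.vcf, with coords.tsv loaded later. Below is a minimal pre-flight sketch that checks for them before starting a long build. It is not part of GT-Pro: the function names check_species_dir and check_build_list are made up for illustration, and it assumes that build.list holds one species directory per line, which is inferred from the "read ./test/100035" lines rather than confirmed.

```shell
#!/usr/bin/env bash
# Pre-flight check for GT_Pro build inputs (a sketch, not part of GT-Pro).
# File names are taken from the build log; the build.list layout
# (one species directory per line) is an assumption.

check_species_dir() {
    local dir="$1"
    local status=0
    local f
    local n=0
    # Files the log reports as found or loaded for each species.
    for f in reference.fna msa.fa core_snps.vcf coords.tsv; do
        if [ ! -s "$dir/$f" ]; then
            echo "[MISSING] $dir/$f"
            status=1
        fi
    done
    # Count genome FASTA files with a plain glob.
    for f in "$dir"/genomes/*.fna; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    if [ "$n" -eq 0 ]; then
        echo "[MISSING] no .fna files under $dir/genomes"
        status=1
    else
        echo "[OK] $dir: $n genomes"
    fi
    return "$status"
}

check_build_list() {
    local list="$1"
    local rc=0
    local dir
    # Read one directory per line; also handle a last line without a newline.
    while IFS= read -r dir || [ -n "$dir" ]; do
        if [ -z "$dir" ]; then
            continue
        fi
        check_species_dir "$dir" || rc=1
    done < "$list"
    return "$rc"
}

# Only runs when the test list from this post is present.
if [ -f test/build.list ]; then
    check_build_list test/build.list || echo "fix the inputs above before running GT_Pro build"
fi
```

Run it from the directory that contains test/. A non-zero return from check_build_list means at least one species directory is incomplete.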

2 Optimize GT-Pro database before metagenotyping

./GT_Pro optimize \
-d test/my_db/tri_db \
-i test/SRR413665_2.fastq.gz

Log output

[OK] start initial optimization
[OK] database found
[OK] optimize from test/my_db/tri_db.bin
[OK] initial optimization done
[OK] finalize optimization with a break-in test
[OK] optimization done

This step adds four new files.
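
To see which files a step wrote, one option is to drop a timestamp marker, run the command, and list everything newer than the marker. The sketch below does that with the standard find -newer test. The helper name run_and_list_new is made up for illustration, and the commented example assumes the optimize step writes its output under test/my_db, which this post does not confirm.

```shell
#!/usr/bin/env bash
# run_and_list_new DIR CMD [ARGS...]
# Runs CMD, then prints regular files under DIR that were created or
# modified after the run started. A sketch, not part of GT-Pro.
run_and_list_new() {
    local dir="$1"
    shift
    local marker
    local rc=0
    marker=$(mktemp)
    # Wait one second so files written by CMD get a strictly newer mtime
    # even on filesystems with coarse timestamps.
    sleep 1
    "$@" || rc=$?
    find "$dir" -type f -newer "$marker" | sort
    rm -f "$marker"
    # Pass the exit status of CMD back to the caller.
    return "$rc"
}

# Example (commented out; it needs the database built in step 1, and the
# output directory test/my_db is an assumption):
# run_and_list_new test/my_db ./GT_Pro optimize -d test/my_db/tri_db -i test/SRR413665_2.fastq.gz
```

Files that already existed but were rewritten by the command are listed as well, so the output is an upper bound on what is new.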

Quick start

# 1 Download the prebuilt database
# 1.1 20190723_881species_optimized_db_kmer_index.bin
# 1.2 20190723_881species_optimized_db_snps.bin
wget --content-disposition https://fileshare.czbiohub.org/s/daK4Wj3N7EwSSrd/download
wget --content-disposition https://fileshare.czbiohub.org/s/XNCzWziB4JydWFH/download
# 2 SNP dictionary
# variants_main.covered.hq.snp_dict.tsv
wget --content-disposition https://fileshare.czbiohub.org/s/cYDwCAB539kE5yt/download
# 3 Species information
# species_taxonomy_ext.tsv
wget --content-disposition https://fileshare.czbiohub.org/s/XBzwFpfJpmJpBSQ/download
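
The database files are large, so it is worth confirming that all four downloads arrived before moving on. The sketch below checks that each expected file exists and is non-empty. The helper check_downloads is made up for illustration, and it assumes that wget --content-disposition saved the files under the names given in the comments above.

```shell
#!/usr/bin/env bash
# check_downloads DIR NAME...
# Reports each expected file as ok or MISSING; succeeds only if all are
# present and non-empty. A sketch; file names come from the comments above.
check_downloads() {
    local dir="$1"
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ -s "$dir/$name" ]; then
            printf 'ok       %s\n' "$name"
        else
            printf 'MISSING  %s\n' "$name"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

check_downloads . \
    20190723_881species_optimized_db_kmer_index.bin \
    20190723_881species_optimized_db_snps.bin \
    variants_main.covered.hq.snp_dict.tsv \
    species_taxonomy_ext.tsv \
    || echo "re-run the wget commands for the missing files"
```

This only checks presence and size. A truncated download would still pass, so re-running wget -c on a suspect file is a reasonable extra step.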

Further reading
GT-Pro: fast and accurate metagenotyping of the human gut microbiome
[Nature Medicine] Bacterial SNPs in the human gut microbiome associate with host BMI
