Article
Title: Fast and accurate metagenotyping of the human gut microbiome with GT-Pro
Translated title: Fast metagenotyping of the human gut
Journal: Nature Biotechnology (NBT)
Year: 2022
Github: https://github.com/zjshi/gt-pro
Installation
wget -c https://github.com/zjshi/gt-pro/releases/download/v1.0.1/gt_pro_prebuilt_v1.0.1.tar.gz
tar -zxvf gt_pro_prebuilt_v1.0.1.tar.gz
cd gt-pro-static
source /hwfsxx1/ST_HN/P18Z10200N0423/huty/software/miniconda3_2/etc/profile.d/conda.sh
conda create -n gt-pro
conda activate gt-pro
# Dependency: C++ build environment (reference link below)
# https://blog.csdn.net/cyr20040123/article/details/129479126
conda install gcc=11 -c conda-forge
conda install libgcc -c conda-forge
conda install gxx_linux-64 -c conda-forge
route="/hwfsxx1/ST_HN/P18Z10200N0423/huty/software/miniconda3_2/envs/gt-pro/bin"
# Link the conda compiler into the current directory under the name g++
ln -s "$route/x86_64-conda_cos7-linux-gnu-g++" g++
# python3.8 pigz (optional)
conda install python=3.8
conda install pigz=2.6
# Dependencies for building a custom database
conda install kmc -c bioconda
conda install biopython
conda install numpy
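After the installs above, it can help to confirm that the tools are actually visible in the activated environment. The following is a minimal sketch, not part of GT-Pro: check_tools is a hypothetical helper name, and the tool list in the usage note is an assumption based on the packages installed above.

```shell
# Hypothetical helper (not shipped with GT-Pro).
# Prints one "missing: NAME" line for every command that is not on PATH
# and returns non-zero if at least one command is missing.
check_tools() {
    ct_rc=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "missing: $ct_tool"
            ct_rc=1
        fi
    done
    return "$ct_rc"
}
```

Typical use would be `check_tools python pigz kmc`, followed by `python -c "import numpy, Bio"` to confirm that numpy and biopython import cleanly (biopython is imported under the name Bio).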
1 Build a customized database
tar xzvf test/100035.tar.gz && tar xzvf test/101747.tar.gz && tar xzvf test/102779.tar.gz
./GT_Pro build \
--in test/build.list \
--out ./test/my_db \
--dbname tri_db --threads 4
Log output
read ./test/100035
read ./test/101747
read ./test/102779
[OK] a total of 3 species found.
check ./test/100035 for required files
genome path found: ./test/100035/genomes/GUT_GENOME137233.fna
genome path found: ./test/100035/genomes/GUT_GENOME073777.fna
genome path found: ./test/100035/genomes/GUT_GENOME120930.fna
genome path found: ./test/100035/genomes/GUT_GENOME131088.fna
genome path found: ./test/100035/genomes/GUT_GENOME137819.fna
genome path found: ./test/100035/genomes/GUT_GENOME000139.fna
genome path found: ./test/100035/genomes/GUT_GENOME137257.fna
genome path found: ./test/100035/genomes/GUT_GENOME120554.fna
genome path found: ./test/100035/genomes/GUT_GENOME138280.fna
genome path found: ./test/100035/genomes/GUT_GENOME198444.fna
genome path found: ./test/100035/genomes/GUT_GENOME138765.fna
genome path found: ./test/100035/genomes/GUT_GENOME096021.fna
12 genomes sequences will be used for database building
[OK] ./test/100035/reference.fna found.
[OK] ./test/100035/msa.fa found.
[OK] ./test/100035/core_snps.vcf found.
check ./test/101747 for required files
genome path found: ./test/101747/genomes/GUT_GENOME190234.fna
genome path found: ./test/101747/genomes/GUT_GENOME253845.fna
genome path found: ./test/101747/genomes/GUT_GENOME107393.fna
genome path found: ./test/101747/genomes/GUT_GENOME185690.fna
genome path found: ./test/101747/genomes/GUT_GENOME188736.fna
genome path found: ./test/101747/genomes/GUT_GENOME092036.fna
genome path found: ./test/101747/genomes/GUT_GENOME093349.fna
genome path found: ./test/101747/genomes/GUT_GENOME194019.fna
genome path found: ./test/101747/genomes/GUT_GENOME105882.fna
genome path found: ./test/101747/genomes/GUT_GENOME094667.fna
genome path found: ./test/101747/genomes/GUT_GENOME185233.fna
genome path found: ./test/101747/genomes/GUT_GENOME107011.fna
12 genomes sequences will be used for database building
[OK] ./test/101747/reference.fna found.
[OK] ./test/101747/msa.fa found.
[OK] ./test/101747/core_snps.vcf found.
check ./test/102779 for required files
genome path found: ./test/102779/genomes/GUT_GENOME166581.fna
genome path found: ./test/102779/genomes/GUT_GENOME164547.fna
genome path found: ./test/102779/genomes/GUT_GENOME164615.fna
genome path found: ./test/102779/genomes/GUT_GENOME096682.fna
genome path found: ./test/102779/genomes/GUT_GENOME166279.fna
genome path found: ./test/102779/genomes/GUT_GENOME166503.fna
genome path found: ./test/102779/genomes/GUT_GENOME158726.fna
genome path found: ./test/102779/genomes/GUT_GENOME166574.fna
genome path found: ./test/102779/genomes/GUT_GENOME165490.fna
genome path found: ./test/102779/genomes/GUT_GENOME165777.fna
genome path found: ./test/102779/genomes/GUT_GENOME166618.fna
genome path found: ./test/102779/genomes/GUT_GENOME165550.fna
genome path found: ./test/102779/genomes/GUT_GENOME165783.fna
13 genomes sequences will be used for database building
[OK] ./test/102779/reference.fna found.
[OK] ./test/102779/msa.fa found.
[OK] ./test/102779/core_snps.vcf found.
[load] loading core-genome consensus sequence from ./test/100035/reference.fna
[load] loading core-genome consensus sequence from ./test/101747/reference.fna
[load] loading core-genome consensus sequence from ./test/102779/reference.fna
the loaded core-genome has a consensus sequence of 1919699 bases
[load] loading core snps from ./test/101747/core_snps.vcf
the loaded core-genome has a consensus sequence of 3160876 bases
[load] loading core snps from ./test/102779/core_snps.vcf
the loaded core-genome has a consensus sequence of 3626943 bases
[load] loading core snps from ./test/100035/core_snps.vcf
a total of 17386 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/100035/coords.tsv
a total of 705 divisions was found
a total of 34488 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/101747/coords.tsv
a total of 511 divisions was found
a total of 41034 core bi-allelic snps was found
[load] loading key coordinates on core-genome from ./test/102779/coords.tsv
a total of 297 divisions was found
[searching] start to search 31-mers
[searching] start to search 31-mers
[searching] start to search 31-mers
a total of 518260 kmer records was found
a total of 1038192 kmer records was found
a total of 1239348 kmer records was found
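The build log above lists every genome it picks up, so a saved copy of the log can be cross-checked against the "genomes sequences will be used" lines. A minimal sketch, assuming the log was saved to a file and that genome paths follow the SPECIES/genomes/NAME.fna layout seen above; count_genomes is a hypothetical helper name.

```shell
# Hypothetical helper (not shipped with GT-Pro).
# Reads a saved build log and prints "SPECIES_DIR COUNT" per species,
# counting the "genome path found:" lines.
count_genomes() {
    awk '/^genome path found: / {
        n = split($4, parts, "/")
        # The species directory is the path component right before "genomes".
        for (i = 2; i <= n; i++)
            if (parts[i] == "genomes") counts[parts[i - 1]]++
    }
    END { for (s in counts) print s, counts[s] }' "$1" | sort
}
```

Calling `count_genomes build.log` (build.log being whatever file the log was redirected to) prints one line per species directory, which can be compared with the totals the build step reports.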
2 Optimize GT-Pro database before metagenotyping
./GT_Pro optimize \
-d test/my_db/tri_db \
-i test/SRR413665_2.fastq.gz
Log output
[OK] start initial optimization
[OK] database found
[OK] optimize from test/my_db/tri_db.bin
[OK] initial optimization done
[OK] finalize optimization with a break-in test
[OK] optimization done
Quick start
# 1 Download the prebuilt database
# 1.1 20190723_881species_optimized_db_kmer_index.bin
# 1.2 20190723_881species_optimized_db_snps.bin
wget --content-disposition https://fileshare.czbiohub.org/s/daK4Wj3N7EwSSrd/download
wget --content-disposition https://fileshare.czbiohub.org/s/XNCzWziB4JydWFH/download
# 2 SNP dictionary
# variants_main.covered.hq.snp_dict.tsv
wget --content-disposition https://fileshare.czbiohub.org/s/cYDwCAB539kE5yt/download
# 3 Species taxonomy information
# species_taxonomy_ext.tsv
wget --content-disposition https://fileshare.czbiohub.org/s/XBzwFpfJpmJpBSQ/download
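Because --content-disposition lets the server choose the file names, it is worth confirming that the four files named in the comments above actually arrived. A minimal sketch, assuming the downloads keep exactly those names; check_downloads is a hypothetical helper name.

```shell
# Hypothetical helper (not shipped with GT-Pro).
# Checks that each expected download exists and is non-empty in the
# given directory (default: current directory).
check_downloads() {
    cd_dir=${1:-.}
    cd_rc=0
    for cd_name in \
        20190723_881species_optimized_db_kmer_index.bin \
        20190723_881species_optimized_db_snps.bin \
        variants_main.covered.hq.snp_dict.tsv \
        species_taxonomy_ext.tsv
    do
        if [ -s "$cd_dir/$cd_name" ]; then
            echo "ok: $cd_name"
        else
            echo "missing or empty: $cd_name"
            cd_rc=1
        fi
    done
    return "$cd_rc"
}
```

Running `check_downloads .` in the download directory returns non-zero if any file is absent or has zero size, which usually points to an interrupted transfer.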
More
GT-Pro: fast and accurate metagenotyping of the human gut microbiome
[Nature Medicine] Bacterial SNPs in the human gut microbiome are associated with host BMI