Bacterial genome analysis software: Bactopia

I. Software overview:

1. Article:

Petit III RA, Read TD, Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems. 5 (2020), https://doi.org/10.1128/mSystems.00190-20.

2. Project page and documentation:

https://github.com/bactopia/bactopia

3. Workflow:

[Figure: the software's analysis workflow]

4. Main features

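To me its biggest selling point is that it is foolproof: one step does everything. Previously an analysis took several steps with several tools, run one after another, for example FastQC, Trimmomatic, Unicycler (SPAdes), Prokka, and then BLAST against a custom database. Worse, you constantly had to write small scripts to convert between formats. In short, it was frustrating, and it was still hard to get a good paper out of it (a lesson learned the hard way).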

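Once configured, this software does everything in a single run. Exciting, isn't it? Summary statistics, 16S sequence extraction for building phylogenetic trees, taxonomic classification, finer-grained classification based on ANI (species/subspecies?), pan-genome analysis and more are all handled in one go. I wonder whether companies that are about to build their own pipelines will be excited to see this.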

Software list for version 1.4, as given in the article:

------------------------------------------------------------------------------------------------
Tool    Version    Description
------------------------------------------------------------------------------------------------
AMRFinder    3.6.7    Finds acquired antimicrobial resistance genes and some point mutations in protein or assembled nucleotide sequences
Aragorn    1.2.38    Finds transfer RNA (tRNA) features
Ariba    2.14.4    Antimicrobial resistance identification by assembly
ART    2016.06.05    A set of simulation tools to generate synthetic next-generation sequencing reads
assembly-scan    0.3.0    Generates basic stats for an assembly
Barrnap    0.9    Bacterial ribosomal RNA predictor
BBMap    38.76    A suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data
BCFtools    1.9    Utilities for variant calling and manipulating VCFs and BCFs
Bedtools    2.29.2    A powerful tool set for genome arithmetic
BioPython    1.76    Tools for biological computation written in Python
BLAST    2.9.0    Basic local alignment search tool
Bowtie2    2.4.1    A fast and sensitive gapped-read aligner
BWA    0.7.17    Burrows-Wheeler Aligner for short-read alignment
CD-HIT    4.8.1    Accelerated for clustering the next-generation sequencing data
CheckM    1.1.2    Assesses the quality of microbial genomes recovered from isolates, single cells, and metagenomes
ClonalFrameML    1.12    Efficient inference of recombination in whole bacterial genomes
DiagrammeR    1.0.0    Graph and network visualization using tabular data in R (https://github.com/rich-iannone/DiagrammeR)
DIAMOND    0.9.35    Accelerated BLAST-compatible local sequence aligner (https://github.com/bbuchfink/diamond)
eggNOG-Mapper    2.0.1    Fast genome-wide functional annotation through orthology assignment
EMIRGE    0.61.1    Reconstructs full-length ribosomal genes from short-read sequencing data
FastANI    1.3    Fast whole-genome similarity (ANI) estimation
FastTree2    2.1.10    Approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences
fastq-dl    1.0.3    Downloads FASTQ files from SRA or ENA repositories
FastQC    0.11.9    A quality control analysis tool for high throughput sequencing data
fastq-scan    0.4.3    Outputs FASTQ summary statistics in JSON format
FLASH    1.2.11    A fast and accurate tool to merge paired-end reads
freebayes    1.3.2    Bayesian haplotype-based genetic polymorphism discovery and genotyping
GNU Parallel    20200122    A shell tool for executing jobs in parallel
GTDB-tk    1.0.2    A tool kit for assigning objective taxonomic classifications to bacterial and archaeal genomes
HMMER    3.3    Biosequence analysis using profile hidden Markov models
Infernal    1.1.2    Searches DNA sequence databases for RNA structure and sequence similarities
IQ-TREE    1.6.12    Efficient phylogenomic software by maximum likelihood
ISMapper    2.0    Insertion sequence mapping software
Lighter    1.1.2    Fast and memory-efficient sequencing error corrector
MAFFT    7.455    Multiple alignment program for amino acid or nucleotide sequences
Mash    2.2.2    Fast genome and metagenome distance estimation using MinHash
Mashtree    1.1.2    Creates a tree using Mash distances
maskrc-svg    0.5    Masks recombination as detected by ClonalFrameML or Gubbins and draws an SVG
McCortex    1.0    De novo genome assembly and multisample variant calling
MEGAHIT    1.2.9    Ultra-fast and memory-efficient (meta-)genome assembler
MinCED    0.4.2    Mining CRISPRs in environmental data sets
Minimap2    2.17    A versatile pairwise aligner for genomic and spliced nucleotide sequences
ncbi-genome-download    0.2.12    Scripts to download genomes from the NCBI FTP servers
Nextflow    19.10.0    A DSL for data-driven computational pipelines
phyloFlash    3.3b3    Rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic data set
Pigz    2.3.4    A parallel implementation of gzip for modern multiprocessor, multicore machines
Pilon    1.23    An automated genome assembly improvement and variant detection tool
PIRATE    1.0.3    A toolbox for pan-genome analysis and threshold evaluation
pplacer    1.1.alpha19    Phylogenetic placement and downstream analysis
Prodigal    2.6.3    Fast, reliable protein-coding gene prediction for prokaryotic genomes
Prokka    1.4.5    Rapid prokaryotic genome annotation
QUAST    5.0.2    Quality assessment tool for genome assemblies
Racon    1.4.13    Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads
Roary    3.13.0    Rapid large-scale prokaryote pan genome analysis
samclip    0.2    Filter SAM file for soft and hard clipped alignments
SAMtools    1.9    Tools for manipulating next-generation sequencing data
Seqtk    1.3    A fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Shovill    1.0.9se    Faster assembly of Illumina reads
SKESA    2.3.0    Strategic k-mer extension for scrupulous assemblies
Snippy    4.4.5    Rapid haploid variant calling and core genome alignment
SnpEff    4.3.1    Genomic variant annotations and functional effect prediction toolbox
snp-dists    0.6.3    Pairwise SNP distance matrix from a FASTA sequence alignment
SNP-sites    2.5.1    Rapidly extracts SNPs from a multi-FASTA alignment
Sourmash    3.2.0    Compute and compare MinHash signatures for DNA data sets
SPAdes    3.13.0    An assembly toolkit containing various assembly pipelines
Trimmomatic    0.39    A flexible read trimming tool for Illumina NGS data
Unicycler    0.4.8    Hybrid assembly pipeline for bacterial genomes
vcf-annotator    0.5    Add biological annotations to variants in a VCF file
Vcflib    1.0.0rc3    A simple C library for parsing and manipulating VCF files
Velvet    1.2.10    Short read de novo assembler using de Bruijn graphs
VSEARCH    2.14.1    Versatile open-source tool for metagenomics
vt    2015.11.10    A tool set for short-variant discovery in genetic sequence data
------------------------------------------------------------------------------------------------

5. Usage

    5.1 Installation

    conda create -y -n bactopia -c conda-forge -c bioconda bactopia

    conda activate bactopia

    bactopia datasets datasets/  # downloads into the given directory 'datasets/': CARD, VFDB (core), RefSeq Mash sketch, GenBank Sourmash signatures, PLSDB Mash sketch & BLAST

    5.2 Running the pipeline

    Paired-end data

bactopia --R1 ${SAMPLE}_R1.fastq.gz --R2 ${SAMPLE}_R2.fastq.gz --sample ${SAMPLE} \
        --datasets datasets/ --outdir ${OUTDIR}
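
Before launching a run it can help to check that both read files exist and to preview the exact command. The following is only a minimal sketch and not part of Bactopia: the function name `bactopia_pe_cmd` is hypothetical, it assumes the `${SAMPLE}_R1.fastq.gz` / `${SAMPLE}_R2.fastq.gz` naming used in the command above, and it prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Bactopia): print the paired-end command
# for one sample after checking that both FASTQ files are present.
# Usage: bactopia_pe_cmd SAMPLE READS_DIR OUTDIR
bactopia_pe_cmd() {
    local sample="$1" reads_dir="$2" outdir="$3"
    local r1="${reads_dir}/${sample}_R1.fastq.gz"
    local r2="${reads_dir}/${sample}_R2.fastq.gz"
    local f
    for f in "$r1" "$r2"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
    # Only flags that already appear in the command above are used here.
    echo "bactopia --R1 $r1 --R2 $r2 --sample $sample --datasets datasets/ --outdir $outdir"
}
```

For example, `bactopia_pe_cmd sampleA reads results` prints the command for `sampleA`; once it looks right, the printed line can be pasted into the shell or piped to `bash`.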

    Single-end data

bactopia --SE ${SAMPLE}.fastq.gz --sample ${SAMPLE} --datasets datasets/ --outdir ${OUTDIR}

    Multiple samples

bactopia prepare directory-of-fastqs/ > fastqs.txt

bactopia --fastqs fastqs.txt --datasets datasets --outdir ${OUTDIR}
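
With many samples, a quick check that every sample has both mates can save a failed run. This is a rough sketch under the assumption that files follow the `_R1.fastq.gz` / `_R2.fastq.gz` naming from the paired-end example; `list_paired_samples` is a hypothetical helper, and it does not reproduce the file that `bactopia prepare` writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every sample in a directory that
# has both an _R1 and an _R2 file; warn on stderr about R1 files whose
# R2 mate is missing.
# Usage: list_paired_samples DIRECTORY
list_paired_samples() {
    local dir="$1" r1 sample
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        sample=$(basename "$r1" _R1.fastq.gz)
        if [ -f "$dir/${sample}_R2.fastq.gz" ]; then
            echo "$sample"
        else
            echo "warning: no R2 file for $sample" >&2
        fi
    done
}
```

Running `list_paired_samples directory-of-fastqs` lists the complete pairs on standard output, so any sample that only shows up in the warnings can be fixed before the pipeline starts.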

    ENA data (this one is really handy)

bactopia --accessions ena-accessions.txt \
        --datasets datasets/ \
        --species "Staphylococcus aureus" \
        --coverage 100 \
        --genome_size median \
        --cpus 2 \
        --outdir ena-multiple-samples
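
The accessions file is easy to get wrong when it is assembled by hand from a spreadsheet. The sketch below tidies such a list. It rests on two assumptions that should be checked against the Bactopia documentation for your version: that the file holds one accession per line, and that the accessions are Experiment accessions (SRX/ERX/DRX followed by digits). `clean_accessions` is a hypothetical helper, and the accessions in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read candidate accessions on stdin, strip Windows
# line endings, keep only lines that look like Experiment accessions
# (SRX/ERX/DRX followed by digits), and print them sorted without duplicates.
clean_accessions() {
    tr -d '\r' | grep -E '^[SED]RX[0-9]+$' | sort -u
}
```

For example, `printf 'SRX123\nERX456\nSRX123\n' | clean_accessions > ena-accessions.txt` writes each placeholder accession exactly once.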
