I. Software overview:
1. Publication:
Petit III RA, Read TD. Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020). https://doi.org/10.1128/mSystems.00190-20
2. Project page:
https://github.com/bactopia/bactopia
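3. Workflow:
4. Main features
To me the biggest selling point is that it is foolproof: everything happens in one step. Previously, an analysis meant chaining several tools and running them one after another, for example FastQC -> Trimmomatic -> Unicycler (SPAdes) -> Prokka -> BLAST against a custom database. Worse, you constantly had to write small scripts to convert between formats. In short, it was frustrating, and it was still hard to get a good paper out of it (a lesson learned through blood and tears).
Once configured, this software handles everything in a single run. Exciting, isn't it? Summary statistics, 16S sequence extraction for phylogenetic trees, taxonomic classification, finer ANI-based classification (species/subspecies?), pan-genome analysis and so on are all done in one go. I wonder whether companies that are currently building their own pipelines will be just as excited to see this.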
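To make the contrast concrete, here is a rough sketch of what that manual chain looks like for one paired-end sample. It is a dry run that only prints the commands. The file names, output directories, trimming thresholds and the `custom_db` BLAST database are illustrative assumptions, not settings taken from the paper or from Bactopia.

```shell
#!/bin/sh
# Dry run: print (do not execute) one possible manual chain for a paired-end sample.
# All paths, thresholds and the database name "custom_db" are hypothetical.
manual_chain() {
    s=$1
    echo "fastqc ${s}_R1.fastq.gz ${s}_R2.fastq.gz -o qc/"
    echo "trimmomatic PE ${s}_R1.fastq.gz ${s}_R2.fastq.gz ${s}_R1.trim.fastq.gz ${s}_R1.unpaired.fastq.gz ${s}_R2.trim.fastq.gz ${s}_R2.unpaired.fastq.gz SLIDINGWINDOW:4:20 MINLEN:50"
    echo "unicycler -1 ${s}_R1.trim.fastq.gz -2 ${s}_R2.trim.fastq.gz -o asm_${s}"
    echo "prokka --outdir annot_${s} --prefix ${s} asm_${s}/assembly.fasta"
    echo "blastn -query annot_${s}/${s}.ffn -db custom_db -outfmt 6 -out ${s}.hits.tsv"
}

manual_chain sampleA
```

Five tools, five sets of intermediate files, and every hand-off between them is a place where a format or a file name can go wrong.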
Software included in version 1.4, as listed in the paper
------------------------------------------------------------------------------------------------------------------------
Tool                  Version      Description
------------------------------------------------------------------------------------------------------------------------
AMRFinder             3.6.7        Finds acquired antimicrobial resistance genes and some point mutations in protein or assembled nucleotide sequences
Aragorn               1.2.38       Finds transfer RNA (tRNA) features
Ariba                 2.14.4       Antimicrobial resistance identification by assembly
ART                   2016.06.05   A set of simulation tools to generate synthetic next-generation sequencing reads
assembly-scan         0.3.0        Generates basic stats for an assembly
Barrnap               0.9          Bacterial ribosomal RNA predictor
BBMap                 38.76        A suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data
BCFtools              1.9          Utilities for variant calling and manipulating VCFs and BCFs
Bedtools              2.29.2       A powerful tool set for genome arithmetic
BioPython             1.76         Tools for biological computation written in Python
BLAST                 2.9.0        Basic local alignment search tool
Bowtie2               2.4.1        A fast and sensitive gapped-read aligner
BWA                   0.7.17       Burrows-Wheeler Aligner for short-read alignment
CD-HIT                4.8.1        Accelerated for clustering the next-generation sequencing data
CheckM                1.1.2        Assesses the quality of microbial genomes recovered from isolates, single cells, and metagenomes
ClonalFrameML         1.12         Efficient inference of recombination in whole bacterial genomes
DiagrammeR            1.0.0        Graph and network visualization using tabular data in R (https://github.com/rich-iannone/DiagrammeR)
DIAMOND               0.9.35       Accelerated BLAST-compatible local sequence aligner (https://github.com/bbuchfink/diamond)
eggNOG-Mapper         2.0.1        Fast genome-wide functional annotation through orthology assignment
EMIRGE                0.61.1       Reconstructs full-length ribosomal genes from short-read sequencing data
FastANI               1.3          Fast whole-genome similarity (ANI) estimation
FastTree2             2.1.10       Approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences
fastq-dl              1.0.3        Downloads FASTQ files from SRA or ENA repositories
FastQC                0.11.9       A quality control analysis tool for high throughput sequencing data
fastq-scan            0.4.3        Outputs FASTQ summary statistics in JSON format
FLASH                 1.2.11       A fast and accurate tool to merge paired-end reads
freebayes             1.3.2        Bayesian haplotype-based genetic polymorphism discovery and genotyping
GNU Parallel          20200122     A shell tool for executing jobs in parallel
GTDB-tk               1.0.2        A tool kit for assigning objective taxonomic classifications to bacterial and archaeal genomes
HMMER                 3.3          Biosequence analysis using profile hidden Markov models
Infernal              1.1.2        Searches DNA sequence databases for RNA structure and sequence similarities
IQ-TREE               1.6.12       Efficient phylogenomic software by maximum likelihood
ISMapper              2.0          Insertion sequence mapping software
Lighter               1.1.2        Fast and memory-efficient sequencing error corrector
MAFFT                 7.455        Multiple alignment program for amino acid or nucleotide sequences
Mash                  2.2.2        Fast genome and metagenome distance estimation using MinHash
Mashtree              1.1.2        Creates a tree using Mash distances
maskrc-svg            0.5          Masks recombination as detected by ClonalFrameML or Gubbins and draws an SVG
McCortex              1.0          De novo genome assembly and multisample variant calling
MEGAHIT               1.2.9        Ultra-fast and memory-efficient (meta-)genome assembler
MinCED                0.4.2        Mining CRISPRs in environmental data sets
Minimap2              2.17         A versatile pairwise aligner for genomic and spliced nucleotide sequences
ncbi-genome-download  0.2.12       Scripts to download genomes from the NCBI FTP servers
Nextflow              19.10.0      A DSL for data-driven computational pipelines
phyloFlash            3.3b3        Rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic data set
Pigz                  2.3.4        A parallel implementation of gzip for modern multiprocessor, multicore machines
Pilon                 1.23         An automated genome assembly improvement and variant detection tool
PIRATE                1.0.3        A toolbox for pan-genome analysis and threshold evaluation
pplacer               1.1.alpha19  Phylogenetic placement and downstream analysis
Prodigal              2.6.3        Fast, reliable protein-coding gene prediction for prokaryotic genomes
Prokka                1.4.5        Rapid prokaryotic genome annotation
QUAST                 5.0.2        Quality assessment tool for genome assemblies
Racon                 1.4.13       Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads
Roary                 3.13.0       Rapid large-scale prokaryote pan genome analysis
samclip               0.2          Filter SAM file for soft and hard clipped alignments
SAMtools              1.9          Tools for manipulating next-generation sequencing data
Seqtk                 1.3          A fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Shovill               1.0.9se      Faster assembly of Illumina reads
SKESA                 2.3.0        Strategic k-mer extension for scrupulous assemblies
Snippy                4.4.5        Rapid haploid variant calling and core genome alignment
SnpEff                4.3.1        Genomic variant annotations and functional effect prediction toolbox
snp-dists             0.6.3        Pairwise SNP distance matrix from a FASTA sequence alignment
SNP-sites             2.5.1        Rapidly extracts SNPs from a multi-FASTA alignment
Sourmash              3.2.0        Compute and compare MinHash signatures for DNA data sets
SPAdes                3.13.0       An assembly toolkit containing various assembly pipelines
Trimmomatic           0.39         A flexible read trimming tool for Illumina NGS data
Unicycler             0.4.8        Hybrid assembly pipeline for bacterial genomes
vcf-annotator         0.5          Add biological annotations to variants in a VCF file
Vcflib                1.0.0rc3     A simple C library for parsing and manipulating VCF files
Velvet                1.2.10       Short read de novo assembler using de Bruijn graphs
VSEARCH               2.14.1       Versatile open-source tool for metagenomics
vt                    2015.11.10   A tool set for short-variant discovery in genetic sequence data
------------------------------------------------------------------------------------------------------------------------
5. Usage
5.1 Installation
conda create -y -n bactopia -c conda-forge -c bioconda bactopia
conda activate bactopia
bactopia datasets datasets/ # Downloads into the specified directory 'datasets/': CARD, VFDB (core), the RefSeq Mash sketch, GenBank Sourmash signatures, and the PLSDB Mash sketch and BLAST database.
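Before launching a run, it can save time to confirm that the environment is active and that the dataset download actually produced something. The helper below is a minimal sketch: `preflight` is a hypothetical function of my own, not part of Bactopia, and it only checks that a command is on the PATH and that a directory is non-empty.

```shell
#!/bin/sh
# preflight DIR [TOOL]: succeed only if TOOL (default: bactopia) is on PATH
# and DIR is a non-empty directory. Hypothetical helper, not a Bactopia command.
preflight() {
    tool=${2:-bactopia}
    command -v "$tool" >/dev/null 2>&1 || { echo "missing command: $tool" >&2; return 1; }
    [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
    [ -n "$(ls -A "$1")" ] || { echo "empty directory: $1" >&2; return 1; }
}

if preflight datasets/; then
    echo "ready to run"
else
    echo "not ready: activate the conda environment or rerun the datasets step"
fi
```

It does not verify the contents of the directory, only that the download step left files behind.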
5.2 Running the pipeline
Paired-end reads
bactopia --R1 ${SAMPLE}_R1.fastq.gz --R2 ${SAMPLE}_R2.fastq.gz --sample ${SAMPLE} \
--datasets datasets/ --outdir ${OUTDIR}
Single-end reads
bactopia --SE ${SAMPLE}.fastq.gz --sample ${SAMPLE} --datasets datasets/ --outdir ${OUTDIR}
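If a project mixes paired-end and single-end samples, a small wrapper can pick the right form of the command. This is only a sketch: `bactopia_cmd` is a hypothetical helper that prints the command instead of running it, it uses only the options shown above, and it assumes the `_R1`/`_R2` file naming from these examples.

```shell
#!/bin/sh
# bactopia_cmd SAMPLE READS_DIR DATASETS OUTDIR
# Print the bactopia command for one sample: paired-end if an R2 file exists,
# single-end otherwise. Hypothetical helper; assumes <sample>_R1/_R2.fastq.gz naming.
bactopia_cmd() {
    sample=$1; reads=$2; datasets=$3; outdir=$4
    if [ -f "$reads/${sample}_R2.fastq.gz" ]; then
        echo "bactopia --R1 $reads/${sample}_R1.fastq.gz --R2 $reads/${sample}_R2.fastq.gz --sample $sample --datasets $datasets --outdir $outdir"
    else
        echo "bactopia --SE $reads/${sample}.fastq.gz --sample $sample --datasets $datasets --outdir $outdir"
    fi
}

# Demo on empty placeholder files in a temporary directory.
demo=$(mktemp -d)
touch "$demo/A_R1.fastq.gz" "$demo/A_R2.fastq.gz" "$demo/B.fastq.gz"
bactopia_cmd A "$demo" datasets/ results
bactopia_cmd B "$demo" datasets/ results
rm -rf "$demo"
```

Printing first and executing later makes it easy to review the commands before committing compute time.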
Multiple samples
bactopia prepare directory-of-fastqs/ > fastqs.txt
bactopia --fastqs fastqs.txt --datasets datasets --outdir ${OUTDIR}
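With many samples, a missing mate file is easy to overlook. The sketch below lists every R1 file that has no matching R2, so the directory can be checked before running `bactopia prepare`. The function name `unpaired_r1` is hypothetical, and the `_R1.fastq.gz`/`_R2.fastq.gz` suffixes are an assumption carried over from the paired-end example.

```shell
#!/bin/sh
# unpaired_r1 DIR: print each *_R1.fastq.gz in DIR that has no matching *_R2.fastq.gz.
# Hypothetical helper; the suffix convention is an assumption.
unpaired_r1() {
    for r1 in "$1"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        [ -e "$r2" ] || echo "$r1"
    done
}

# Demo on empty placeholder files in a temporary directory.
demo=$(mktemp -d)
touch "$demo/A_R1.fastq.gz" "$demo/A_R2.fastq.gz" "$demo/B_R1.fastq.gz"
unpaired_r1 "$demo"
rm -rf "$demo"
```

An empty result means every R1 file has a mate. Any path printed is either a single-end sample with an unfortunate name or an incomplete transfer.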
ENA data (a real treat)
bactopia --accessions ena-accessions.txt \
--datasets datasets/ \
--species "Staphylococcus aureus" \
--coverage 100 \
--genome_size median \
--cpus 2 \
--outdir ena-multiple-samples
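A typo in the accessions file only shows up once the download fails, so a quick format check up front is cheap insurance. The sketch below assumes the file holds one accession per line and that accessions look like ENA/SRA experiment or run identifiers (`SRX…`, `ERX…`, `DRX…`, or the `…RR…` equivalents). Both the pattern and the helper name `bad_accessions` are my assumptions, so check the Bactopia documentation for the accession types your version accepts. The accessions in the demo are made-up placeholders.

```shell
#!/bin/sh
# bad_accessions FILE: print non-blank lines that do not match the assumed
# accession pattern (S/E/D + RX or RR + digits). Hypothetical helper.
bad_accessions() {
    awk 'NF && $0 !~ /^[SED]R[XR][0-9]+$/' "$1"
}

# Demo with placeholder (not real) accessions.
demo=$(mktemp)
printf '%s\n' 'SRX0000001' '' 'ERR0000002' 'not-an-accession' > "$demo"
bad_accessions "$demo"
rm -f "$demo"
```

No output means every non-blank line passed the check.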