List of RNA-Seq bioinformatics tools (sourced from Wikipedia)

Quality control and data filtering

Quality assessment is essential for understanding an RNA-Seq dataset as a whole, and for ensuring that the data are in the right format and suitable for downstream analyses. It is often necessary to filter the data, removing low-quality sequences, linkers, overrepresented sequences or noise, to ensure a coherent final result.

cutadapt: cutadapt removes adapter sequences from next-generation sequencing data (Illumina, SOLiD and 454). It is especially useful when the read length of the sequencing machine is longer than the sequenced molecule, as in the case of microRNAs.
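The core idea of 3' adapter trimming can be sketched in Python as follows. This is a simplified illustration, not cutadapt's algorithm: it assumes exact matching only (cutadapt also tolerates errors in the adapter), and the function name `trim_3prime_adapter` and the `min_overlap` default of 3 are hypothetical choices.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read (simplified sketch, exact matches only).

    The adapter may occur in full inside the read, or only its first bases may
    be present at the very end of the read. Overlaps shorter than `min_overlap`
    are ignored so that bases matching the adapter by chance are kept.
    """
    for start in range(len(read)):
        # Number of read bases that can be compared with the adapter from `start`.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # overlaps only get shorter from here on
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start]  # keep only the insert before the adapter
    return read


# A short insert followed by a partial adapter at the 3' end.
print(trim_3prime_adapter("ACGTACGTAGATCGGAAG", "AGATCGGAAGAGC"))  # ACGTACGT
```

Requiring a minimum overlap trades sensitivity for specificity: a one- or two-base match at the read end is likely to occur by chance, so such bases are left in place.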

FastQC: FastQC is a quality control tool for high-throughput sequence data, developed in Java at the Babraham Institute. It can import data from FASTQ, BAM or SAM files. The tool gives an overview that flags problematic areas, together with summary graphs and tables for rapid assessment of the data. Results are presented as permanent HTML reports. FastQC can be run as a stand-alone application or integrated into a larger pipeline. See also seqanswers/FastQC.

FASTX: The FASTX-Toolkit is a set of command-line tools for manipulating reads in FASTA or FASTQ files. These commands make it possible to preprocess files before mapping with tools like Bowtie. Supported tasks include conversion from FASTQ to FASTA, quality statistics, removal of sequencing adapters, filtering and trimming of sequences based on quality, and DNA/RNA conversion.
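As an illustration of the FASTQ-to-FASTA conversion mentioned above, here is a minimal Python sketch. It assumes the common four-line FASTQ layout (no wrapped sequence or quality lines) and is not the FASTX-Toolkit code; the function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert FASTQ text to FASTA text (sketch; assumes 4-line records)."""
    lines = [line for line in fastq_text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    fasta_lines = []
    # Records are read in fixed blocks of four lines instead of searching for
    # '@', because a quality line may itself start with '@'.
    for i in range(0, len(lines), 4):
        header, sequence, separator, _quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        fasta_lines.append(">" + header[1:])  # quality scores are dropped
        fasta_lines.append(sequence)
    return "".join(line + "\n" for line in fasta_lines)


print(fastq_to_fasta("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n@@@@\n"), end="")
```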

HTSeq.

htSeqTools: htSeqTools is a Bioconductor package for quality control, data processing and visualization. It can visualize sample correlations, remove over-amplification artifacts, assess enrichment efficiency, correct strand bias and visualize hits.

RNA-SeQC: RNA-SeQC is a tool for experiment design, process optimization and quality control before computational analysis. It provides three types of quality control: read counts (such as duplicate reads, mapped reads and uniquely mapped reads, rRNA reads, transcript-annotated reads and strand specificity), coverage (such as mean coverage, mean coefficient of variation, 5’/3’ coverage, gaps in coverage and GC bias) and expression correlation (the tool provides RPKM-based estimates of expression levels). RNA-SeQC is implemented in Java and requires no installation; it can also be run through the GenePattern web interface. The input is one or more BAM files, and the output is an HTML report.
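RPKM (reads per kilobase of feature per million mapped reads) normalizes a read count by feature length and library size. The sketch below shows only the standard definition; how RNA-SeQC selects the counted reads and feature lengths is not reproduced, and `rpkm` is a hypothetical helper name.

```python
def rpkm(reads_on_feature: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of feature per Million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return reads_on_feature / (kilobases * millions_of_reads)


# 500 reads on a 2 kb transcript in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```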

RSeQC: RSeQC analyzes diverse aspects of RNA-Seq experiments: sequence quality, sequencing depth, strand specificity, GC bias, read distribution over the genome structure and coverage uniformity. The input can be SAM, BAM, FASTA or BED files, or a chromosome size file (a two-column plain-text file). Results can be visualized in genome browsers such as UCSC, IGB and IGV; R scripts can also be used for visualization.

SAMStat: SAMStat identifies problems and reports a range of statistics at different stages of the process. It evaluates unmapped, poorly mapped and accurately mapped sequences separately in order to infer possible causes of poor mapping.

ShortRead: ShortRead is an R/Bioconductor package that supports input, manipulation, quality assessment and output of next-generation sequencing data. It can manipulate data, for example by filtering out reads based on predefined criteria. ShortRead can be complemented with other Bioconductor packages for further analysis and visualization (Biostrings, BSgenome, IRanges and others). See also seqanswers/ShortRead.

Trimmomatic: Trimmomatic performs trimming for Illumina platforms and works with FASTQ reads (single-end or paired-end). Its tasks include cutting adapters, cutting bases at chosen positions based on quality thresholds, cropping reads to a specific length, and converting quality scores to Phred-33 or Phred-64.
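Two of the operations listed above, Phred re-encoding and quality-based trimming, can be sketched in Python. Assumptions: the sliding-window rule below cuts at the start of the first window whose mean quality drops below the threshold, which is a simplification (Trimmomatic's exact cut point may differ), and the function names and defaults (a window of 4 bases, mean quality 20) are hypothetical.

```python
from __future__ import annotations


def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (same Q values)."""
    # Phred+64 stores Q as chr(Q + 64) and Phred+33 as chr(Q + 33): shift by 31.
    return "".join(chr(ord(c) - 31) for c in quality)


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_avg: float = 20.0, offset: int = 33) -> tuple[str, str]:
    """Cut the read at the first window whose mean quality is below `min_avg`."""
    scores = [ord(c) - offset for c in quality]
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], quality[:start]
    return seq, quality  # no window fell below the threshold


print(phred64_to_phred33("h@"))                     # I!  (Q40 and Q0)
print(sliding_window_trim("ACGTACGT", "IIIII###"))  # ('ACGT', 'IIII')
```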

Alignment Tools

After quality control, the first step of RNA-Seq analysis is the alignment of the sequenced reads (see RNA-Seq alignment) to a reference genome (if one is available) or to a transcriptome database. See also List of sequence alignment software and HTS Mappers.

Short (Unspliced) aligners

Short aligners align continuous reads (reads that do not contain gaps resulting from splicing) to a reference genome. There are two main types:

1) aligners based on the Burrows-Wheeler transform, such as Bowtie and BWA;

2) aligners based on seed-and-extend methods or on the Needleman-Wunsch or Smith-Waterman algorithms.

The first group (Bowtie and BWA) is many times faster, but some tools in the second group, although slower, tend to be more sensitive and align more reads correctly.
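The Burrows-Wheeler transform and the FM-index backward search that underlie the first group can be illustrated with a toy Python sketch. It counts exact matches only; tools such as Bowtie and BWA add mismatch handling, compressed occurrence tables and suffix-array sampling, none of which is shown. The naive `occ` helper rescans the string on every call and suits only tiny examples.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' marks the end)."""
    text += "$"  # '$' sorts before letters in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_count(bwt_text: str, pattern: str) -> int:
    """Count exact occurrences of `pattern` using FM-index backward search."""
    # C[c]: number of characters in the text that sort before c.
    C, total = {}, 0
    for c in sorted(set(bwt_text)):
        C[c] = total
        total += bwt_text.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_text[:i]; real indexes use sampled tables.
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)  # range of sorted rotations matching so far
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("banana")
print(index)                   # annb$aa
print(fm_count(index, "ana"))  # 2
```

Backward search processes the pattern from its last character to its first, narrowing the range of sorted rotations that begin with the part matched so far; the size of the final range is the number of occurrences.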

BFAST: BFAST aligns short reads to reference sequences and is particularly sensitive to errors, SNPs, insertions and deletions. BFAST uses the Smith-Waterman algorithm. See also seqanswers/BFAST.
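The Smith-Waterman recurrence used by BFAST and several other aligners in this list can be sketched as below. This toy version returns only the best local alignment score with a linear gap penalty; the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions, and production aligners typically use affine gap penalties and vectorized implementations.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `a` and `b` (linear gap penalty)."""
    previous = [0] * (len(b) + 1)  # DP row for the previous character of `a`
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment start at any cell.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8: the shared ACGT core
```

Clamping each cell at 0 is what makes the alignment local: a poorly matching prefix never lowers the score of a good match further along.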

Bowtie: Bowtie is a fast short aligner using an algorithm based on the Burrows-Wheeler transform and the FM-index. Bowtie tolerates a small number of mismatches. See also seqanswers/Bowtie.

Burrows-Wheeler Aligner (BWA): BWA implements two algorithms based on the Burrows-Wheeler transform. The first is intended for reads with a low error rate (<3%). The second was designed to handle more errors and implements a Smith-Waterman strategy. BWA allows mismatches and small gaps (insertions and deletions). The output is in SAM format. See also seqanswers/BWA.

Short Oligonucleotide Analysis Package (SOAP).

GNUMAP: GNUMAP performs alignment using a probabilistic Needleman-Wunsch algorithm. It can handle alignment in repetitive regions of a genome without losing information. Its output is designed to be easy to visualize with existing software.
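A minimal sketch of the standard (deterministic) Needleman-Wunsch global alignment score follows. GNUMAP's probabilistic variant is not reproduced here, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of `a` and `b` (linear gap penalty)."""
    # Row 0: the empty prefix of `a` aligned against b[:j] costs j gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap] + [0] * len(b)  # column 0: i gaps
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(diagonal, previous[j] + gap, current[j - 1] + gap)
        previous = current
    return previous[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 2: three matches, one gap
```

Unlike a local alignment, the first row and column accumulate gap penalties and no cell is clamped at zero, so every base of both sequences must be accounted for.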

Maq: Maq first aligns reads to reference sequences and then performs a consensus stage. In the first stage it performs only ungapped alignment, tolerating up to 3 mismatches. See also seqanswers/Maq.
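The acceptance rule of Maq's first stage (ungapped placement with at most 3 mismatches) can be illustrated with a brute-force scan. This sketch only demonstrates the criterion; real aligners find candidate positions through an index rather than testing every offset, and the function name `ungapped_hits` is hypothetical.

```python
def ungapped_hits(read: str, reference: str, max_mismatches: int = 3) -> list:
    """(position, mismatches) for every ungapped placement within the limit."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for r, g in zip(read, reference[pos:pos + len(read)]):
            if r != g:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # give up on this offset early
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


print(ungapped_hits("ACGT", "TTACGTTTACCT", max_mismatches=1))  # [(2, 0), (8, 1)]
```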

Mosaik: Mosaik can align reads containing short gaps using the Smith-Waterman algorithm, which makes it well suited to handling SNPs, insertions and deletions. See also seqanswers/Mosaik.

NovoAlign (commercial): NovoAlign is a short aligner for the Illumina platform based on the Needleman-Wunsch algorithm. It tolerates up to 8 mismatches per read and indels of up to 7 bp. It can handle bisulphite data. Output is in SAM format. See also seqanswers/NovoAlign.

RazerS. See also seqanswers/RazerS.

SEAL: SEAL uses a MapReduce model to distribute computation across clusters of computers. It uses BWA for alignment and Picard MarkDuplicates for the detection and removal of duplicate reads. See also seqanswers/SEAL.

SeqMap. See also seqanswers/SeqMap.

SHRiMP: SHRiMP employs two techniques to align short reads. First, a q-gram filtering technique based on multiple seeds identifies candidate regions. Second, these regions are examined in detail with the Smith-Waterman algorithm. See also seqanswers/SHRiMP.
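The filtering stage can be sketched in a simplified form: index the reference's q-grams, look up each q-gram of the read, and count hits per diagonal (reference position minus read offset). Diagonals with enough hits become candidate regions. Assumptions: contiguous q-grams stand in for SHRiMP's multiple spaced seeds, and `qgram_candidates` with its defaults (q = 4, at least 2 hits) is hypothetical.

```python
from collections import Counter, defaultdict


def qgram_candidates(read: str, reference: str, q: int = 4, min_hits: int = 2) -> list:
    """Diagonals (implied read start positions) supported by >= min_hits q-grams."""
    index = defaultdict(list)  # q-gram -> positions in the reference
    for p in range(len(reference) - q + 1):
        index[reference[p:p + q]].append(p)
    diagonals = Counter()
    for k in range(len(read) - q + 1):
        for p in index.get(read[k:k + q], []):
            diagonals[p - k] += 1  # same diagonal means consistent placement
    return sorted(d for d, hits in diagonals.items() if hits >= min_hits)


print(qgram_candidates("ACGTACGA", "TTTTACGTACGATTTT"))  # [4]
```

Each surviving diagonal would then be examined with a full Smith-Waterman alignment, as described above; the filter only decides where that more expensive step is worth running.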

Stampy: Stampy combines the sensitivity of hash tables with the speed of BWA. It is designed to align reads containing sequence variation such as insertions and deletions. It can handle reads of up to 4500 bases and writes its output in SAM format. See also seqanswers/Stampy.

ZOOM (commercial): ZOOM is a short aligner for the Illumina/Solexa 1G platform. It uses an extended spaced-seed methodology, building hash tables for the reads, and tolerates mismatches, insertions and deletions. See also seqanswers/ZOOM.
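A spaced seed ignores selected positions, so two sequences that differ only at those positions still produce the same hash key. The sketch below builds such keys and a hash table over a set of reads. It uses a single hypothetical mask, "1101"; ZOOM's actual seed design relies on carefully chosen sets of masks that are not reproduced here, and the function names are hypothetical.

```python
from collections import defaultdict


def spaced_seed_keys(seq: str, mask: str = "1101") -> list:
    """(offset, key) pairs; only positions marked '1' in `mask` enter the key."""
    care = [i for i, m in enumerate(mask) if m == "1"]
    return [
        (pos, "".join(seq[pos + i] for i in care))
        for pos in range(len(seq) - len(mask) + 1)
    ]


def build_seed_table(reads: dict, mask: str = "1101") -> dict:
    """Hash table from spaced-seed key to the (read name, offset) pairs using it."""
    table = defaultdict(list)
    for name, seq in reads.items():
        for pos, key in spaced_seed_keys(seq, mask):
            table[key].append((name, pos))
    return table


# The reads differ only at the masked-out third position, so they share a key.
table = build_seed_table({"r1": "ACGT", "r2": "ACAT"})
print(table["ACT"])  # [('r1', 0), ('r2', 0)]
```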

Spliced aligners

Many reads span exon-exon junctions and cannot be aligned directly by short aligners, so different approaches are needed. Some spliced aligners first use a short aligner to align the unspliced (continuous) reads (the exon-first approach) and then apply a different strategy to align the remaining reads that contain spliced regions; typically these reads are split into smaller segments that are mapped independently.

Aligners based on known splice junctions

These tools detect splice junctions using databases of known junctions, so they cannot identify novel splice junctions. Some of this junction data comes from other expression methods such as expressed sequence tags (ESTs).

Erange: Erange is a tool for alignment and quantification of mammalian transcriptome data. See also seqanswers/Erange.

RNA-MATE: RNA-MATE is a computational pipeline for aligning data from the Applied Biosystems SOLiD system. It offers quality control and trimming of reads. Genome alignments are performed with mapreads, and splice junctions are identified using a library of known exon-junction sequences. The tool supports visualization of alignments and tag counting. See also seqanswers/RNA-MATE.

RUM: RUM is an alignment pipeline, built on Bowtie and BLAT, that can handle reads with splice junctions. It starts by aligning reads against a genome and a transcriptome database with Bowtie. Next, unmapped sequences are aligned to the reference genome with BLAT. In the final step, all alignments are merged into the final alignment. Input files can be in FASTA or FASTQ format; the output is in RUM and SAM format.

RNASEQR. See also seqanswers/RNASEQR.

De novo Splice Aligners

De novo splice aligners can detect new splice junctions without prior annotation. See also De novo Splice Aligners.

SpliceMap. See also seqanswers/SpliceMap.

SplitSeek.

SuperSplat: SuperSplat was developed to find all types of splice junctions. The algorithm iteratively splits each read into every possible two-chunk combination and attempts to align each chunk. Output is in “Supersplat” format. See also seqanswers/SuperSplat.
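The two-chunk splitting described for SuperSplat can be sketched in Python together with a naive exact-match placement of each chunk. This is an illustration under assumptions, not SuperSplat's implementation: chunks are located with plain substring search, only same-strand junctions with the right chunk downstream of the left one are reported, and all function names and defaults are hypothetical.

```python
def two_chunk_splits(read: str, min_chunk: int = 1) -> list:
    """Every way of cutting `read` into a left and a right chunk."""
    return [(read[:i], read[i:]) for i in range(min_chunk, len(read) - min_chunk + 1)]


def find_all(text: str, sub: str) -> list:
    """All (possibly overlapping) start positions of `sub` in `text`."""
    positions, i = [], text.find(sub)
    while i != -1:
        positions.append(i)
        i = text.find(sub, i + 1)
    return positions


def junction_candidates(read: str, genome: str, min_chunk: int = 3,
                        min_intron: int = 1) -> list:
    """(left_start, intron_start, right_start) triples where both chunks align."""
    candidates = []
    for left, right in two_chunk_splits(read, min_chunk):
        for left_start in find_all(genome, left):
            intron_start = left_start + len(left)
            for right_start in find_all(genome, right):
                if right_start - intron_start >= min_intron:
                    candidates.append((left_start, intron_start, right_start))
    return candidates


print(two_chunk_splits("ACGT"))  # [('A', 'CGT'), ('AC', 'GT'), ('ACG', 'T')]
```

Repeated bases at an exon boundary can make several split points equally valid, so a single read may yield more than one candidate junction.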

MapSplice. See also seqanswers/MapSplice.

TopHat: TopHat[2] is designed to find de novo junctions. TopHat aligns reads in two steps. First, unspliced reads are aligned with Bowtie, and the aligned reads are then assembled with Maq into islands of sequence. Second, splice junctions are determined from the initially unmapped reads and the possible canonical donor and acceptor sites within the island sequences. See also seqanswers/TopHat.

QPALMA: QPALMA predicts splice junctions using machine learning algorithms. Its training set consists of spliced reads with quality information and already known alignments. See also seqanswers/QPALMA.

PALMapper. See also seqanswers/PALMapper.

Pass: Pass aligns gapped and ungapped reads as well as bisulfite sequencing data. It can filter data before alignment (removal of adapters). Pass uses the Needleman-Wunsch and Smith-Waterman algorithms and performs alignment in three stages: scanning the positions of seed sequences in the genome, testing the contiguous regions, and finally refining the alignment. See also seqanswers/Pass.

ContextMap: ContextMap was developed to overcome some limitations of TopHat and MapSplice, such as the resolution of ambiguous alignments. Its central idea is to consider reads in their gene expression context, thereby improving alignment accuracy. ContextMap can be used stand-alone or on top of TopHat or MapSplice. In stand-alone mode it aligns reads to a genome, to a transcriptome database, or to both.

HMMSplicer: HMMSplicer can identify canonical and non-canonical splice junctions in short reads. First, unspliced reads are removed with Bowtie. The remaining reads are then split in half one at a time, each half is seeded against a genome, and exon boundaries are determined with a hidden Markov model. A quality score is assigned to each junction, which helps estimate false positive rates. See also seqanswers/HMMSplicer.

G.Mo.R-Se.

STAR: STAR is an ultrafast tool that employs a “sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure”. It detects canonical and non-canonical splice junctions as well as chimeric (fusion) sequences. It has also been adapted to align long reads from third-generation sequencing technologies. See also seqanswers/STAR.
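The first half of the quoted procedure, a sequential maximum mappable prefix (MMP) search in an uncompressed suffix array, can be illustrated with a toy Python sketch. Assumptions: the suffix array is built by plain sorting and the suffixes are materialized as strings (feasible only for tiny genomes), mismatches are not tolerated, the seed clustering and stitching steps are omitted, and the function name is hypothetical.

```python
from __future__ import annotations

import bisect


def sequential_mmp_seeds(read: str, genome: str) -> list[tuple[int, int, list[int]]]:
    """Split `read` into maximal mappable prefixes, left to right.

    Returns (read_offset, seed_length, genome_positions) for each seed.
    """
    sa = sorted(range(len(genome)), key=lambda i: genome[i:])  # suffix array
    suffixes = [genome[i:] for i in sa]  # materialized only for this toy

    def matching_rows(prefix: str) -> range:
        # Suffixes starting with `prefix` form one contiguous block in sorted order.
        first = bisect.bisect_left(suffixes, prefix)
        last = first
        while last < len(suffixes) and suffixes[last].startswith(prefix):
            last += 1
        return range(first, last)

    seeds, start = [], 0
    while start < len(read):
        length = 0
        # Grow the seed while the longer prefix still occurs in the genome.
        while start + length < len(read) and matching_rows(read[start:start + length + 1]):
            length += 1
        if length == 0:
            start += 1  # base absent from the genome: skip it (toy behaviour)
            continue
        rows = matching_rows(read[start:start + length])
        seeds.append((start, length, sorted(sa[r] for r in rows)))
        start += length  # the next search starts where this MMP ended
    return seeds


# A spliced read: exon1 + exon2, with a 10-base intron between them in the genome.
genome = "TT" + "ACGTAC" + "GTTTTTTTAG" + "GATCCA" + "TT"
print(sequential_mmp_seeds("ACGTAC" + "GATCCA", genome))  # [(0, 7, [2]), (7, 5, [19])]
```

For a spliced read, the first MMP stops near the donor site (here one base late, because the intron and the downstream exon both begin with G), and the search restarts from the next unmapped base, so the remainder of the read maps to the downstream exon.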

Quantitative analysis

These tools estimate the abundance of each gene expressed in an RNA-Seq sample. See also Quantification models.

Alexa-Seq: Alexa-Seq is a pipeline for gene expression analysis, transcript-specific expression analysis, exon junction expression analysis and quantitative alternative expression analysis. It provides extensive visualization of alternative expression, along with statistics and graphs. See also seqanswers/Alexa-Seq.

MMSEQ: MMSEQ is a pipeline for estimating isoform expression and allelic imbalance in diploid organisms from RNA-Seq data. It uses tools such as Bowtie, TopHat, ArrayExpressHTS and SAMtools, and edgeR or DESeq for differential expression analysis. See also seqanswers/MMSEQ.

rQuant: rQuant is a web service (a Galaxy installation) that determines the abundance of transcripts at each gene locus using quadratic programming. It can evaluate biases introduced by experimental conditions. It combines several tools: PALMapper (read alignment), and mTiM and mGene (inference of new transcripts).

NSMAP: NSMAP infers isoforms and estimates their expression levels without relying on annotation. Exons are identified and splice junctions are detected with TopHat. All possible isoforms are then computed as combinations of the detected exons.
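Computing "all possible isoforms by combination of the detected exons" can be sketched as enumerating every non-empty, order-preserving subset of exons, which gives 2^n - 1 candidates for n exons. Assumptions: exons are non-overlapping (start, end) pairs, no junction-support filtering is applied, and `candidate_isoforms` is a hypothetical name; this is not NSMAP's code.

```python
from itertools import combinations


def candidate_isoforms(exons: list) -> list:
    """All non-empty, order-preserving exon combinations (2**n - 1 for n exons)."""
    ordered = sorted(exons)  # isoforms keep exons in genomic order
    return [
        list(combo)
        for k in range(1, len(ordered) + 1)
        for combo in combinations(ordered, k)
    ]


isoforms = candidate_isoforms([(300, 400), (100, 200), (500, 600)])
print(len(isoforms))  # 7
print(isoforms[-1])   # [(100, 200), (300, 400), (500, 600)]
```

Overlapping exons (for example, alternative splice sites) would additionally need a compatibility check, since two overlapping exons cannot both appear in one transcript.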

eXpress.

ERANGE. See also seqanswers/ERANGE.

NEUMA.

MISO. See also seqanswers/MISO.

RSEM. See also seqanswers/RSEM.

Cufflinks. See also seqanswers/Cufflinks.[3]

Differential expression

These tools are designed to study variation in gene expression between samples. See also a comparative study of differential expression.

BaySeq. See also seqanswers/BaySeq.

Cuffdiff.

DESeq. See also seqanswers/DESeq.

DEGSeq. See also seqanswers/DEGSeq.

edgeR: edgeR is an R package for differential expression analysis of data from DNA sequencing methods such as RNA-Seq, SAGE or ChIP-Seq. It employs statistical methods based on the negative binomial distribution as a model for count variability. See also seqanswers/EdgeR.
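The negative binomial model mentioned above is parameterized in edgeR by a mean mu and a dispersion phi, with variance mu + phi * mu^2, so counts are more variable than a Poisson model (phi = 0) allows. The sketch below evaluates that distribution and checks the mean-variance relation numerically; it is an illustration of the model only, not edgeR's estimation code, and the function names are hypothetical. The conversion r = 1/phi, p = r / (r + mu) links it to the textbook form.

```python
import math


def nb_pmf(k: int, mu: float, phi: float) -> float:
    """P(Y = k) for a negative binomial with mean `mu` > 0 and dispersion `phi` > 0."""
    size = 1.0 / phi        # the textbook "r" parameter
    p = size / (size + mu)  # the textbook success probability
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_variance(mu: float, phi: float) -> float:
    """Variance under the mean/dispersion parameterization: mu + phi * mu**2."""
    return mu + phi * mu ** 2


mu, phi = 10.0, 0.2
mean = sum(k * nb_pmf(k, mu, phi) for k in range(1000))
second_moment = sum(k * k * nb_pmf(k, mu, phi) for k in range(1000))
print(round(mean, 6), round(second_moment - mean ** 2, 6), nb_variance(mu, phi))
```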

Limma

Myrna: Myrna is a pipeline for estimating differential gene expression in RNA-Seq datasets that runs either in a cloud environment (Elastic MapReduce) or on a single computer. It uses Bowtie for short-read alignment and R algorithms for interval calculations, normalization and statistical processing. See also seqanswers/Myrna.

NOISeq. See also seqanswers/NOISeq.

Scotty: Scotty performs power analysis to estimate the number of replicates and the sequencing depth required to call differential expression.

TSPM

Statistical analysis

MultiExperiment Viewer (MeV). See also seqanswers/MeV.

Fusion genes/chimeras/translocation finders

Genome rearrangements resulting from cancer can produce aberrant genetic modifications such as fusions or translocations. Identifying these modifications plays an important role in carcinogenesis studies.

ChimeraScan.

FusionCatcher.

FusionHunter: FusionHunter identifies fusion transcripts without relying on existing annotations. It uses Bowtie as a first-pass aligner and works with paired-end reads. See also seqanswers/FusionHunter.

FusionSeq. See also seqanswers/FusionSeq.

SOAPFuse.

TopHat-Fusion: TopHat-Fusion is based on TopHat and was developed to handle reads arising from fusion genes. It does not require prior information about known genes and uses Bowtie to align continuous reads. See also seqanswers/TopHat-Fusion.

FusionMap.

Copy number variation identification

CNVseq: CNVseq detects copy number variations using a statistical model derived from array comparative genomic hybridization. Sequence alignment is performed with BLAT, calculations are carried out by R modules, and the whole process is automated with Perl. See also seqanswers/CNVseq.

CnvHMM

RNA-Seq simulators

Flux Simulator. See also seqanswers/Flux.

RNASeqReadSimulator.

RSEM Read Simulator (rsem-simulate-reads).

BEERS Simulator: BEERS is designed for mouse or human data and for paired-end reads sequenced on the Illumina platform. It generates reads from a pool of gene models drawn from several published annotation sources. Some genes are chosen at random, errors (such as indels, base changes and low-quality tails) are deliberately introduced, and novel splice junctions are then constructed.

Transcriptome assemblers

Genome-Guided assemblers

Scripture. See also seqanswers/Scripture.

IsoInfer.

IsoLasso.

Genome-Independent assemblers

KISSPLICE.

Oases. See also seqanswers/Oases.

SOAPdenovo. See also seqanswers/SOAPdenovo.

Trans-ABySS. See also seqanswers/Trans-ABySS.

Trinity. See also seqanswers/Trinity.

Velvet: Velvet (algorithm)[4]; Velvet (EMBL-EBI). See also seqanswers/Velvet.

Visualization tools

Artemis.

Apollo.

EagleView.

Integrated Genome Browser (IGB).

Integrative Genomics Viewer (IGV).

MapView.

Tablet.

Savant.

Samscope.

SeqMonk. See also seqanswers/SeqMonk.

Vespa.

Functional, Network & Pathway Analysis Tools

Ingenuity Systems (commercial) iReport and IPA: Ingenuity's IPA and iReport applications let users upload, analyze and visualize RNA-Seq datasets. Both support the identification, analysis and interpretation of differentially expressed isoforms between condition and control samples, and the interpretation of expression changes in the context of biological processes, diseases, cellular phenotypes and molecular interactions. iReport accepts native Cuffdiff output files as well as gene expression lists; IPA accepts gene expression lists.

Workbench (analysis pipeline / integrated solutions)

ArrayExpressHTS: ArrayExpressHTS (and ebi_ArrayExpressHTS) is a Bioconductor package for preprocessing, quality assessment and expression estimation of RNA-Seq datasets. It can be run remotely on the European Bioinformatics Institute cloud or locally. The package uses several tools: ShortRead (quality control), Bowtie, TopHat or BWA (alignment to a reference genome), SAMtools (SAM format handling), and Cufflinks or MMSEQ (expression estimation). See also seqanswers/ArrayExpressHTS.

CLC bio (commercial).

DNASTAR (commercial).

easyRNASeq.

GENE-Counter.

Galaxy: Galaxy is a general-purpose workbench platform for computational biology. Several publicly accessible Galaxy servers support RNA-Seq tools and workflows, including NBIC's Andromeda, the CBIIT-Giga server, the Galaxy Project's public server, the Gene Network Galaxy server, the University of Oslo's Genomic HyperBrowser, URGI's server (which supports S-MART), and many others.

GenePattern: GenePattern (Broad Institute) offers integrated solutions for RNA-Seq analysis.

Partek (commercial).

NextGENe (commercial).

RobiNA.

S-MART: S-MART handles mapped RNA-Seq data and mainly performs data manipulation (selection or exclusion of reads, clustering and differential expression analysis) and visualization (read information, read distribution, comparison with epigenomic ChIP-Seq data). It can be run on any laptop by a user without a computing background, and a friendly graphical user interface makes the tools easy to operate. See also seqanswers/S-MART.

wapRNA.

BiNGS!SL-seq.

Further annotation tools for RNA-Seq data

seq2HLA: seq2HLA is an annotation tool that determines an individual's HLA class I and II types and expression from standard NGS RNA-Seq data in FASTQ format. It maps RNA-Seq reads against a reference database of HLA alleles using Bowtie, then determines and reports the HLA type, a confidence score and locus-specific expression levels. The tool is written in Python and R and is available as a console tool or a Galaxy module. See also seqanswers/seq2HLA.

HLAminer: HLAminer is a computational method for identifying HLA alleles directly from whole-genome, exome and transcriptome shotgun sequence datasets. HLA allele predictions are derived by targeted assembly of the shotgun sequence data and comparison to a database of reference allele sequences. The tool is written in Perl and is available as a console tool.

Webinars and Presentations

RNASeq-Blog Presentations

RNA-Seq Workshop Documentation (UC Davis)

VIDEO: Strategies for Identifying Biologically Compelling Genes from Breast Cancer Subtype RNA-Seq Profiles with Accompanying Analysis

Princeton Workshop
