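Bioinformatic analysis of circRNA falls into two categories, depending on how the circRNAs are identified:
split-alignment-based approaches: prediction tools designed around the back-spliced junction (BSJ) sequences that arise under the intron-driven model, such as find_circ, CIRCexplorer, CIRI and MapSplice;
pseudoreference-based approaches: tools that infer back-spliced junction sequences from genome annotation and then match them against the annotated exon sequences to predict new circRNAs, such as KNIFE and NCLscan.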
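CIRI2 works from BWA-MEM alignments and supports de novo, BSJ-based detection. Building on CIRI, it improves the MLE (maximum likelihood estimation) model used to judge whether a candidate BSJ read could have come from more than one region, which effectively controls false positives caused by mapping errors or by repetitive sequences in the genome. CIRI2 also had the highest average F1 score on the test data and used less memory and run time than the other detection tools. CIRI2 requires data from RNase R-treated samples.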
===Download====
URL: https://sourceforge.net/projects/ciri/files/CIRI2/
=====Alignment====
bwa mem -t 40 -T 19 Nitab-v4.5_genome_Scf_Edwards2017.fasta CK_0_1_1.fq.gz CK_0_1_2.fq.gz >CK_0_1.sam
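Note: -T is the minimum alignment score for a hit to be output; the bwa mem default is 30. Many posts report that, after testing on a wide range of data, 19 works best and improves CIRI's sensitivity, so the same value is used here.
====Detection====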
perl CIRI2.pl -F Nitab-v4.5_genome_Scf_Edwards2017.fasta -I CK_0_1.sam -O CK_0_1 -T 30 -A Nitab-v4.5_gene_models_Scf_Edwards2017.gtf
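Where (the -T 30 in this command sets CIRI2's number of threads; it is not the score threshold that -T means for bwa mem):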
-I, --in input SAM file name (required; generated by BWA-MEM)
-O, --out output circRNA list name (required)
-F, --ref_file FASTA file of all reference sequences
-A, --anno input GTF/GFF3 formatted annotation file name (optional)
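When there are several samples, the two commands above can be generated instead of typed by hand. The following is a minimal sketch that only builds and prints the command lines; it does not run them. The helper names `bwa_mem_cmd` and `ciri2_cmd`, the `samples` list, and the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming pattern are assumptions for illustration, taken from the single example in this post.

```python
import shlex

def bwa_mem_cmd(genome, fq1, fq2, threads=40, min_score=19):
    """Argument list for the alignment step (hypothetical helper)."""
    return ["bwa", "mem", "-t", str(threads), "-T", str(min_score),
            genome, fq1, fq2]

def ciri2_cmd(genome, sam, out, threads=30, gtf=None, script="CIRI2.pl"):
    """Argument list for the detection step (hypothetical helper)."""
    cmd = ["perl", script, "-F", genome, "-I", sam, "-O", out,
           "-T", str(threads)]
    if gtf is not None:
        # The annotation file is optional for CIRI2.
        cmd += ["-A", gtf]
    return cmd

genome = "Nitab-v4.5_genome_Scf_Edwards2017.fasta"
gtf = "Nitab-v4.5_gene_models_Scf_Edwards2017.gtf"
samples = ["CK_0_1"]  # assumed sample list; extend as needed

for s in samples:
    align = bwa_mem_cmd(genome, f"{s}_1.fq.gz", f"{s}_2.fq.gz")
    detect = ciri2_cmd(genome, f"{s}.sam", s, gtf=gtf)
    # bwa writes SAM to stdout, so the redirect is printed separately.
    print(shlex.join(align), ">", f"{s}.sam")
    print(shlex.join(detect))
```

The printed lines can be pasted into a shell script or passed to a job scheduler; keeping the arguments as lists avoids quoting problems if a path ever contains spaces.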
The output looks like the figure below:
Output format:
Column 1: circRNA_ID
Column 2: chromosome of a predicted circRNA
Column 3: circRNA_start
Column 4: circRNA_end
Column 5: circular junction read count of a predicted circRNA
Column 6: unique CIGAR types of a predicted circRNA. For example, if a circRNA has three junction reads: read A (80M20S, 80S20M), read B (80M20S, 80S20M), read C (40M60S, 40S30M30S, 70S30M), then it has two SM types (80S20M, 70S30M), two MS types (80M20S, 40M60S) and one SMS type (40S30M30S). Thus its SM_MS_SMS should be 2_2_1.
Column 7: non-junction read count of a predicted circRNA that mapped across the circular junction but consistent with linear RNA instead of being back-spliced
Column 8: ratio of circular junction reads calculated by 2*#junction_reads/(2*#junction_reads+#non_junction_reads). #junction_reads is multiplied by two because a junction read is generated from two ends of circular junction but only counted once while a non-junction read is from one end. It has to be mentioned that the non-junction reads are still possibly from another larger circRNA, so the junction_reads_ratio based on it may be an inaccurate estimation of relative expression of the circRNA.
Column 9: type of a circRNA according to positions of its two ends on chromosome (exon, intron or intergenic_region; only available when annotation file is provided)
Column 10: ID of the gene(s) where an exonic or intronic circRNA locates
Column 11: strand info of a predicted circRNA (new in CIRI2)
Column 12: all of the circular junction read IDs (split by ",")
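To make the Column 6 example concrete, here is a minimal sketch that counts the unique SM, MS and SMS CIGAR strings for reads A, B and C. It is only an illustration of the counting rule described above, not CIRI2's own code, and it assumes CIGARs that contain only the usual operation letters; `cigar_type` and `sm_ms_sms` are hypothetical names.

```python
import re

def cigar_type(cigar):
    """Reduce a CIGAR such as '40S30M30S' to its operation pattern 'SMS'."""
    return "".join(re.findall(r"\d+([MIDNSHP=X])", cigar))

def sm_ms_sms(reads):
    """Count unique CIGAR strings per pattern, formatted like Column 6."""
    seen = {"SM": set(), "MS": set(), "SMS": set()}
    for segments in reads:
        for cigar in segments:
            kind = cigar_type(cigar)
            if kind in seen:
                seen[kind].add(cigar)
    return "_".join(str(len(seen[k])) for k in ("SM", "MS", "SMS"))

reads = [
    ("80M20S", "80S20M"),               # read A
    ("80M20S", "80S20M"),               # read B
    ("40M60S", "40S30M30S", "70S30M"),  # read C
]
print(sm_ms_sms(reads))  # prints 2_2_1
```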
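The Column 8 formula can be checked by hand with a few lines of code. This is a small sketch of the formula as stated above; the function name and the read counts in the example are made up for illustration.

```python
def junction_reads_ratio(junction_reads, non_junction_reads):
    """2*#junction_reads / (2*#junction_reads + #non_junction_reads)."""
    denominator = 2 * junction_reads + non_junction_reads
    if denominator == 0:
        raise ValueError("no reads cover the junction")
    return 2 * junction_reads / denominator

# Made-up counts: 10 junction reads and 20 non-junction reads.
print(junction_reads_ratio(10, 20))  # prints 0.5
```

With no non-junction reads the ratio is 1.0, and it falls toward 0 as linear reads across the junction accumulate.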
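For downstream filtering it can help to load the result file into typed records. The sketch below reads the 12 columns by position. It assumes the file is tab-separated and starts with one header line; the field names in `CircRNA` are my own labels rather than CIRI2's header names, and the sample row is invented purely to show the layout.

```python
import csv
import io
from typing import NamedTuple

class CircRNA(NamedTuple):
    circ_id: str
    chrom: str
    start: int
    end: int
    junction_reads: int
    sm_ms_sms: str
    non_junction_reads: int
    ratio: float
    circ_type: str
    gene_id: str
    strand: str
    read_ids: list

def parse_ciri2(handle):
    """Parse a CIRI2 result table (assumed tab-separated, one header line)."""
    rows = csv.reader(handle, delimiter="\t")
    next(rows, None)  # skip the header line (assumption)
    records = []
    for row in rows:
        if len(row) < 12:
            continue  # ignore blank or truncated lines
        records.append(CircRNA(
            row[0], row[1], int(row[2]), int(row[3]), int(row[4]), row[5],
            int(row[6]), float(row[7]), row[8], row[9], row[10],
            [r for r in row[11].split(",") if r],  # drop empty IDs
        ))
    return records

# Invented example content, for illustration only.
header = "\t".join(f"col{i}" for i in range(1, 13))
row = "chr1:100|500\tchr1\t100\t500\t3\t2_2_1\t6\t0.5\texon\tgeneA\t+\tr1,r2,r3,"
records = parse_ciri2(io.StringIO(header + "\n" + row + "\n"))
print(records[0].chrom, records[0].junction_reads, records[0].read_ids)
```

For a real run, open the file passed to `-O` (for example `open("CK_0_1")`) in place of the `io.StringIO` object.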