[circRNA] circRNA Identification with CIRI2

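Bioinformatic analysis of circRNA falls into two groups, depending on how circRNAs are identified:

split-alignment-based approaches: prediction tools designed around the back-spliced junction (BSJ) sequences formed under the intron-driven circularization model. They look for reads whose segments align to the genome in back-spliced order. Examples are find_circ, CIRCexplorer, CIRI and MapSplice;

pseudoreference-based approaches: tools that infer back-spliced junction sequences from the genome annotation and match them against annotated exon sequences to predict new circRNAs. Examples are KNIFE and NCLscan.

CIRI2 takes BWA-MEM alignments as input and detects circRNAs de novo from BSJ reads. Building on CIRI, it uses an improved maximum likelihood estimation (MLE) model to judge whether a candidate BSJ read could have come from more than one possible region, which effectively controls false positives caused by mis-mapping or by repetitive sequences in the genome. On its test data CIRI2 also had the highest average F1 score while using less memory and run time than the other detection tools. CIRI2 is intended for data from RNase R-treated samples.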

====Download====

URL: https://sourceforge.net/projects/ciri/files/CIRI2/

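====Alignment====

bwa mem -t 40 -T 19 Nitab-v4.5_genome_Scf_Edwards2017.fasta CK_0_1_1.fq.gz CK_0_1_2.fq.gz > CK_0_1.sam

Note: -T is the minimum alignment score for an alignment to be written to the output; the default is 30. Many posts report that, in tests on a wide range of data, 19 works best and improves the sensitivity of CIRI, because the lower threshold keeps the short alignment segments that junction reads produce. I use the same value as everyone else. The reference FASTA must already be indexed with bwa index before this step.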

====Identification====

perl CIRI2.pl -F Nitab-v4.5_genome_Scf_Edwards2017.fasta -I CK_0_1.sam -O CK_0_1 -T 30 -A Nitab-v4.5_gene_models_Scf_Edwards2017.gtf

Where:

-I, --in input SAM file name (required; generated by BWA-MEM)

-O, --out output circRNA list name (required)

-F, --ref_file FASTA file of all reference sequences

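-A, --anno input GTF/GFF3 formatted annotation file name (optional)

-T, --thread_num number of threads for parallel running (optional; note that this is unrelated to the -T score threshold of bwa mem)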
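With several samples, the alignment and identification steps are usually repeated once per sample. The sketch below is a dry run: it only prints the two commands for each sample so they can be checked before anything is executed. The function name print_ciri2_commands, the thread count and the file naming pattern SAMPLE_1.fq.gz / SAMPLE_2.fq.gz are assumptions based on the single example in this post; only the flags already shown in the commands of this post are used.

```shell
# Assumed paths, copied from the commands in this post; adjust to your own layout.
GENOME=Nitab-v4.5_genome_Scf_Edwards2017.fasta
GTF=Nitab-v4.5_gene_models_Scf_Edwards2017.gtf

# print_ciri2_commands SAMPLE [THREADS]
# Hypothetical helper: prints (does not run) the bwa mem and CIRI2 commands
# for one paired-end sample named SAMPLE_1.fq.gz / SAMPLE_2.fq.gz.
print_ciri2_commands() {
    sample=$1
    threads=${2:-8}
    printf 'bwa mem -t %s -T 19 %s %s_1.fq.gz %s_2.fq.gz > %s.sam\n' \
        "$threads" "$GENOME" "$sample" "$sample" "$sample"
    printf 'perl CIRI2.pl -F %s -I %s.sam -O %s -T %s -A %s\n' \
        "$GENOME" "$sample" "$sample" "$threads" "$GTF"
}

# Sample names are listed by hand here; only CK_0_1 appears in this post.
for s in CK_0_1; do
    print_ciri2_commands "$s" 40
done
```

Once the printed lines look right, they can be pasted into a terminal or saved to a script file and run from there.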

The output is shown in the figure below:

Output format:

Column 1: circRNA_ID

Column 2: chromosome of a predicted circRNA

Column 3: circRNA_start

Column 4: circRNA_end

Column 5: circular junction read count of a predicted circRNA

Column 6: unique CIGAR types of a predicted circRNA. For example, suppose a circRNA has three junction reads: read A (80M20S, 80S20M), read B (80M20S, 80S20M) and read C (40M60S, 40S30M30S, 70S30M). It then has two SM types (80S20M, 70S30M), two MS types (80M20S, 40M60S) and one SMS type (40S30M30S), so its SM_MS_SMS value is 2_2_1.

Column 7: non-junction read count of a predicted circRNA that mapped across the circular junction but consistent with linear RNA instead of being back-spliced

Column 8: ratio of circular junction reads calculated by 2*#junction_reads/(2*#junction_reads+#non_junction_reads). #junction_reads is multiplied by two because a junction read is generated from two ends of circular junction but only counted once while a non-junction read is from one end. It has to be mentioned that the non-junction reads are still possibly from another larger circRNA, so the junction_reads_ratio based on it may be an inaccurate estimation of relative expression of the circRNA.

Column 9: type of a circRNA according to positions of its two ends on chromosome (exon, intron or intergenic_region; only available when annotation file is provided)

Column 10: ID of the gene(s) in which an exonic or intronic circRNA is located

Column 11: strand info of a predicted circRNA (new in CIRI2)

Column 12: all of the circular junction read IDs (split by ",")
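The column descriptions above can be checked with a few small shell helpers. This is a minimal sketch, not part of CIRI2: the function names are made up for illustration, the CIGAR patterns only cover the simple soft-clip forms used in the Column 6 example, and the filter assumes the result file is tab-separated with a single header line.

```shell
# Hypothetical helpers for reading a CIRI2 result table; none of them ship with CIRI2.

# sm_ms_sms CIGAR...
# Count the distinct S-M, M-S and S-M-S soft-clip patterns among the given
# CIGAR strings (simplified: CIGARs with other operations are ignored).
sm_ms_sms() {
    printf '%s\n' "$@" | sort -u | awk '
        /^[0-9]+S[0-9]+M$/        { sm++ }
        /^[0-9]+M[0-9]+S$/        { ms++ }
        /^[0-9]+S[0-9]+M[0-9]+S$/ { sms++ }
        END { printf "%d_%d_%d\n", sm, ms, sms }'
}

# junction_ratio JUNCTION_READS NON_JUNCTION_READS
# Column 8 formula: 2*J / (2*J + N), printed with three decimals.
junction_ratio() {
    awk -v j="$1" -v n="$2" 'BEGIN {
        denom = 2 * j + n
        if (denom == 0) { print "NA"; exit }
        printf "%.3f\n", 2 * j / denom
    }'
}

# filter_by_junction_reads MIN_READS FILE
# Keep the header line plus every circRNA whose Column 5 value is >= MIN_READS.
filter_by_junction_reads() {
    awk -F '\t' -v min="$1" 'NR == 1 || $5 >= min' "$2"
}

# CIGARs of reads A, B and C from the Column 6 example
sm_ms_sms 80M20S 80S20M 80M20S 80S20M 40M60S 40S30M30S 70S30M
# 10 junction reads, 5 non-junction reads: 2*10 / (2*10 + 5)
junction_ratio 10 5
```

The two calls at the end print 2_2_1 and 0.800. For the result file produced above, a call such as filter_by_junction_reads 2 CK_0_1 would keep only circRNAs supported by at least two junction reads; the cutoff of 2 is an arbitrary example, not a recommendation from the CIRI2 authors.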
