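Building a reference for the single-cell Cell Ranger software requires a GTF file, but some genomes are only distributed with a GFF annotation, so the format has to be converted first.
1. GTF format versions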
GTF3 (9 feature types accepted): gene, transcript, exon, CDS, Selenocysteine, start_codon, stop_codon, three_prime_utr and five_prime_utr
GTF2.5 (8 feature types accepted): gene, transcript, exon, CDS, UTR, start_codon, stop_codon, Selenocysteine
GTF2.2 (9 feature types accepted): CDS, start_codon, stop_codon, 5UTR, 3UTR, inter, inter_CNS, intron_CNS and exon
GTF2.1 (6 feature types accepted): CDS, start_codon, stop_codon, exon, 5UTR, 3UTR
GTF2 (4 feature types accepted): CDS, start_codon, stop_codon, exon
GTF1 (5 feature types accepted): CDS, start_codon, stop_codon, exon, intron
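To judge which of these versions a given file most resembles, it helps to list the feature types that actually occur in column 3. The following is a minimal sketch using only standard awk and sort; the function name `list_feature_types` and the three-line sample annotation are made up for illustration and are not output from any tool discussed here.

```shell
#!/bin/sh
set -eu

# Hypothetical three-line GTF used only for illustration (tab-separated).
sample_gtf="$(mktemp)"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  chr1 demo gene 1 100 . + . 'gene_id "g1";' \
  chr1 demo transcript 1 100 . + . 'gene_id "g1"; transcript_id "t1";' \
  chr1 demo exon 1 100 . + . 'gene_id "g1"; transcript_id "t1";' \
  > "$sample_gtf"

# Print each distinct feature type (column 3) and its count, tab-separated,
# skipping comment lines that start with '#'.
list_feature_types() {
  awk -F'\t' '
    !/^#/ && NF >= 3 { n[$3]++ }
    END { for (f in n) print f "\t" n[f] }
  ' "$1" | sort
}

list_feature_types "$sample_gtf"
```

A file that contains gene and transcript lines alongside exon and CDS points towards GTF2.5 or GTF3, while a file with only CDS, start_codon, stop_codon and exon matches the older GTF2 layout.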
2. Format conversion
1) gffread
The version used here is gffread v0.12.7.
gffread test.gff3 -T -o test.gtf
The GTF produced by this method is version 2.2.
2) agat_convert_sp_gff2gtf.pl from AGAT
# Download: https://github.com/NBISweden/AGAT
# Here I pulled the container image and used it directly
singularity pull docker://quay.io/biocontainers/agat:1.0.0--pl5321hdfd78af_0
singularity exec /path/to/Software/agat_1.0.0--pl5321hdfd78af_0.sif agat_convert_sp_gff2gtf.pl --gff test.gff3 -o test.gtf
By default the output is in GTF3 format.
3) Other tools (not tried)
GenomeTools
http://genometools.org/tools/gt_gff3_to_gtf.html
ea-utils
https://github.com/ExpressionAnalysis/ea-utils/blob/master/clipper/gff2gtf
PASA
https://github.com/PASApipeline/PASApipeline/blob/master/misc_utilities/gff3_to_gtf_format.pl
Kent utils:
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
gff3ToGenePred followed by genePredToGtf
GFFtools-GX
https://github.com/vipints/GFFtools-GX/blob/master/gff_to_gtf.py
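
Whichever converter is used, a quick sanity check of the resulting GTF can catch problems before the file is handed to a downstream tool. The sketch below assumes the usual GTF layout of nine tab-separated columns, and that exon lines should carry both `gene_id` and `transcript_id` attributes (as far as I understand, this is what Cell Ranger relies on, but it is worth confirming against its documentation). The function name `check_gtf` and the two one-line test files are hypothetical.

```shell
#!/bin/sh
set -eu

# check_gtf FILE: print one message per problem line;
# return non-zero if any problem was found.
check_gtf() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }   # skip comments and blank lines
    NF != 9 {
      printf "line %d: expected 9 columns, got %d\n", NR, NF
      bad++
      next
    }
    $3 == "exon" && ($9 !~ /gene_id "[^"]+"/ || $9 !~ /transcript_id "[^"]+"/) {
      printf "line %d: exon without gene_id/transcript_id\n", NR
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}

# Two made-up one-line files: one well-formed, one missing transcript_id.
good_gtf="$(mktemp)"
bad_gtf="$(mktemp)"
printf 'chr1\tdemo\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > "$good_gtf"
printf 'chr1\tdemo\texon\t1\t100\t.\t+\t.\tgene_id "g1";\n' > "$bad_gtf"

check_gtf "$good_gtf" && echo "good file: OK"
check_gtf "$bad_gtf" || echo "bad file: problems found"
```

In practice the same function could be pointed at the converted file, for example `check_gtf test.gtf`.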