提取5 primer 和3 primerUTR位置

可以用的着,在biostar https://www.biostars.org/p/77188/ 找到的解答。记录留存一下,不一定要用,参考解决思路

It's not that easy. The GTF only has a feature category called UTR. The end user has to figure out if it is a 5' or 3' UTR. Here is a reproducible example in R of categorising them. Note that some protein-coding transcripts have a 5' UTR and no 3' UTR, a 3' UTR and no 5' UTR, or have neither UTR. It is important to remember the GENCODE annotation is a collection of transcript fragments, not full-length models.

library(GenomicRanges) # From Bioconductor.

genes <- read.table("gencode.v17.annotation.gtf", sep = '\t', skip = 5, stringsAsFactors = FALSE)
whichCodingTranscripts <- genes[, 3] == "transcript" & grepl("transcript_type protein_coding", genes[, 9], fixed = TRUE)
proteinTranscripts <- genes[whichCodingTranscripts, ]
strands <- proteinTranscripts[, 7]
allFeaturesTranscripts <- gsub("transcript_id ", '', sapply(strsplit(genes[, 9], "; "), '[', 2))
proteinTranscriptsNames <- allFeaturesTranscripts[whichCodingTranscripts]
whichCDS <- genes[, 3] == "CDS" & allFeaturesTranscripts %in% proteinTranscriptsNames
transcriptsCDS <- genes[whichCDS, ]
transcriptsCDS <- split(GRanges(transcriptsCDS[, 1], IRanges(transcriptsCDS[, 4], transcriptsCDS[, 5]), transcriptsCDS[, 7]),
                    factor(allFeaturesTranscripts[whichCDS], levels = proteinTranscriptsNames))
firstCDS <- mapply(function(CDS, strand) {if(strand == '+') {CDS[1]} else {CDS[length(CDS)]}}, transcriptsCDS, strands)
lastCDS <-  mapply(function(CDS, strand) {if(strand == '+') {CDS[length(CDS)]} else {CDS[1]}}, transcriptsCDS, strands)
whichUTR <- genes[, 3] == "UTR" & allFeaturesTranscripts %in% proteinTranscriptsNames
transcriptsUTR <- genes[whichUTR, ]
transcriptsUTR <- split(GRanges(transcriptsUTR[, 1], IRanges(transcriptsUTR[, 4], transcriptsUTR[, 5]), transcriptsUTR[, 7]),
                    factor(allFeaturesTranscripts[whichUTR], levels = names(firstCDS)))

transcriptsUTR5 <- mapply(function(UTR, CDS, strand)
               {        
                 if(strand == '+') UTR[UTR < CDS[1]] else UTR[UTR > CDS[length(CDS)]]
               }, transcriptsUTR, firstCDS, as.list(strands), SIMPLIFY = FALSE)

transcriptsUTR3 <- mapply(function(UTR, CDS, strand)
               {        
                 if(strand == '+') UTR[UTR > CDS[length(CDS)]] else UTR[UTR < CDS[1]]
               }, transcriptsUTR, firstCDS, as.list(strands), SIMPLIFY = FALSE)

That shouldn't be too difficult to figure, given you have UTRs annotated.

5'-UTRs are those UTRs at the 5' end of a transcript
3'-UTRS are those at the 3' end of a transcript
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

友情链接更多精彩内容