转载|VCF变异统计及绘制染色体分布图

<meta charset="utf-8">

VCF文件的变异分布图,除了使用Circos图之外,统计特定材料间的变异分布情况时可能会用到染色体分布图,为了节省时间,实现统计及绘图的自动化,现提供统计数量用的Python脚本和绘图使用的R脚本,仅供参考。如有其它更快捷的方法,或觉得脚本太拙劣,欢迎留言批评交流。

image

PYTHON脚本,使用脚本整理100Kb滑窗中变异位点数量:

variation_number_in_bin.py

import sys

input_file = sys.argv[1]
step_len = 100000
result = {}

chrom_stats = {
"Chr1":43270923,
"Chr2":35937250,
"Chr3":36413819,
"Chr4":35502694,
"Chr5":29958434,
"Chr6":31248787,
"Chr7":29697621,
"Chr8":28443022,
"Chr9":23012720,
"Chr10":23207287,
"Chr11":29021106,
"Chr12":27531856
}

for chrom in chrom_stats.keys():
    step= 0
    step_start = 0
    result[chrom]={}
    result[chrom][step]=0
    for line in open(input_file):
        if line.startswith(chrom):
            pos = line.split()[1]
            if step_start < int(pos) < (step_start + step_len):
                result[chrom][step] += 1
            if int(pos) >= (step_start + step_len):
                step += 1
                step_start += step_len
                result[chrom][step] = 0
                result[chrom][step] += 1

    for steps,nums in result[chrom].items():
        print(chrom, steps, nums, sep="\t")
image

下面画图,使用R脚本如下:

 library(ggplot2)
 library(Cairo)

 var_density <- read.table("snp.distribution.stats", sep = "\t", header = T)

 names(var_density) <- c("chrom", "region", "variation_count")
 var_density$chrom <- factor(var_density$chrom, levels=c(
   "Chr1", "Chr2", "Chr3", "Chr4", "Chr5", "Chr6",
   "Chr7", "Chr8", "Chr9", "Chr10", "Chr11", "Chr12"))

 CairoPNG(filename = "variation_distribution.png", width = 12000, height = 7500, res = 600)

 distribution <- ggplot(var_density, aes(x=region, y=variation_count))

 distribution + geom_line(colour = "darkorange4", size = 0.35) +
   guides(fill=FALSE) +
   theme(axis.line = element_line(size=0.5),
         panel.background = element_rect(fill="white",size=rel(20)),
         panel.grid.minor = element_line(colour = "gray"),
         axis.text = element_text(family="Times New Roman",size=16,face="bold"),
         axis.title = element_text(family="Times New Roman",size=18,face="bold"),
         strip.text = element_text(family="Times New Roman",size=12,face="bold")) +
   facet_wrap(~chrom, ncol=2)

 dev.off()

作者:生物数据分析笔记
链接:https://www.jianshu.com/p/9b5541a3f67c
来源:简书
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容