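SyRI (Synteny and Rearrangement Identifier) is a tool for detecting structural variants between assembled genomes; visualization is done with its companion tool plotsr.
SyRI first identifies rearranged regions, then searches for sequence differences and determines whether each one lies in a syntenic or a rearranged region.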
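To make the second step concrete, here is a drastically simplified toy in Python. It is not SyRI's actual algorithm. It assumes the aligned blocks have already been labeled as collinear ("+") or inverted ("-"), with inversions standing in for all rearrangement types, and looks up which kind of region a variant's reference position falls in. The block list and the `classify` helper are hypothetical.

```python
from bisect import bisect_right

# Hypothetical toy input: (ref_start, ref_end, orientation) blocks sorted by ref_start.
# "+" marks a collinear (syntenic) alignment, "-" an inverted one.
BLOCKS = [(1, 1000, "+"), (1001, 2500, "-"), (2501, 4000, "+")]

def classify(pos, blocks=BLOCKS):
    """Return 'syntenic', 'inverted' or 'unaligned' for a 1-based reference position."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i >= 0 and pos <= blocks[i][1]:
        return "syntenic" if blocks[i][2] == "+" else "inverted"
    return "unaligned"

for snp in (500, 1800, 5000):
    print(snp, classify(snp))
```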
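Required software and environment:
minimap2
nucmer (part of MUMmer, which also provides the delta-filter and show-coords commands)
Python 3.8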
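Before running anything, a small Python check can confirm that the executables are on PATH and that the interpreter meets the 3.8 requirement. This is a sketch; the tool list is my assumption, based on the commands used in this post.

```python
import shutil
import sys

# Executables used in this post (assumed list); delta-filter and show-coords ship with MUMmer.
REQUIRED_TOOLS = ["minimap2", "nucmer", "delta-filter", "show-coords"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

def python_ok(minimum=(3, 8)):
    """True if the running interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    missing = missing_tools()
    print("Python >= 3.8:", python_ok())
    print("Missing tools:", ", ".join(missing) if missing else "none")
```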
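After some trial and error, I found that simply running pipline.sh in the /syri/syri/example/ folder is a quick way to set up the environment.
I ran everything in the conda base environment.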
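Run script:
syri=/public/home/lianglunping/liangtmp/syri/syri/syri/bin/syri
# Link the reference and query assemblies into the working directory
ln -sf /public/home/fengting/task/5.12ragtag/dataR498/H7L1.fa refgenome
ln -sf /public/home/fengting/task/5.12ragtag/dataR498/H7L26.fa qrygenome
# Option 1: minimap2 alignment (--eqx writes the =/X CIGAR operations SyRI needs), passed to SyRI as SAM (-F S); -k keeps intermediate files
minimap2 -ax asm5 --eqx refgenome qrygenome > out.sam
python3 $syri -c out.sam -r refgenome -q qrygenome -k -F S
# Option 2: nucmer alignment, passed to SyRI as coords + delta.
# Both options write syri.out to the current directory, so run only one of them (or use separate directories).
nucmer --maxmatch -c 100 -b 500 -l 50 refgenome qrygenome # Whole genome alignment; writes out.delta. Any other alignment can also be used.
delta-filter -m -i 90 -l 100 out.delta > out.filtered.delta # Remove small and lower quality alignments
show-coords -THrd out.filtered.delta > out.filtered.coords # Convert alignment information to a .TSV format as required by SyRI
python3 $syri -c out.filtered.coords -d out.filtered.delta -r refgenome -q qrygenome
# Plot syri.out with the plotsr script bundled with this SyRI version
/public/home/lianglunping/liangtmp/syri/syri/syri/bin/plotsr syri.out refgenome qrygenome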
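For a quick overview of the result, the annotation types in syri.out can be tallied. This sketch assumes the tab-separated layout described in SyRI's output-format documentation, where the 11th column holds the annotation type (SYN, INV, TRANS, DUP, SNP, INS, DEL, ...). `count_annotations` is a hypothetical helper, so check the column layout against your SyRI version.

```python
from collections import Counter

def count_annotations(path="syri.out"):
    """Count rows per annotation type (assumed 11th tab-separated column) in a SyRI output file."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 11:
                counts[fields[10]] += 1
    return counts
```

For example, run `print(count_annotations("syri.out").most_common())` in the SyRI output directory.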
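SyRI runs fairly slowly. To split the genomes into individual chromosomes so they can be processed separately, you can use the following script, where chrlist lists the IDs to extract (first column of each line):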
perl seq.perl chrlist Input > Output
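#!/usr/bin/perl
### seq.perl: extract the sequences listed in <lst> from a FASTA file (plain or gzipped)
### usage: perl seq.perl chrlist Input > Output
use strict;
use warnings;

die "usage: perl $0 <lst> <fa>\n" unless @ARGV == 2;
my ($lst, $fa) = @ARGV;

# Wanted IDs: first whitespace-separated column of each line of <lst>
open my $lst_fh, '<', $lst or die "Cannot open $lst: $!\n";
my %ha;
while (<$lst_fh>) {
    my ($id) = split;
    $ha{$id} = 1 if defined $id;
}
close $lst_fh;

# Open the FASTA, decompressing on the fly when the name ends in gz
my $in;
if ($fa =~ /gz$/) {
    open $in, '-|', 'gzip', '-cd', $fa or die "Cannot run gzip on $fa: $!\n";
} else {
    open $in, '<', $fa or die "Cannot open $fa: $!\n";
}

{ local $/ = ">"; <$in>; }                  # skip everything before the first '>'
my %out;
while (my $header = <$in>) {                # header line, '>' already consumed
    my ($info) = $header =~ /^(\S+)/;
    my $seq = do { local $/ = ">"; <$in> }; # sequence up to (and including) the next '>'
    $seq = '' unless defined $seq;
    $seq =~ s/>|\r|\*//g;                   # drop the trailing '>', carriage returns and '*'
    next unless defined $info;
    # print each listed ID only once (first occurrence wins)
    print ">$info\n$seq" if exists $ha{$info} && !exists $out{$info};
    $out{$info} = 1;
}
close $in;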
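If Perl is not at hand, the same extraction can be sketched in Python. This is a hypothetical port (`read_ids`, `extract` and the script name are illustrative), assuming the same inputs: a list file whose first column holds the IDs, and a plain or gzipped FASTA.

```python
import gzip
import sys

def read_ids(list_path):
    """Wanted IDs: the first whitespace-separated column of each non-empty line."""
    with open(list_path) as fh:
        return {line.split()[0] for line in fh if line.strip()}

def extract(fasta_path, wanted, out=sys.stdout):
    """Write each wanted FASTA record once (first occurrence wins), like seq.perl."""
    opener = gzip.open if fasta_path.endswith("gz") else open
    seen = set()
    keep = False
    with opener(fasta_path, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                keep = name in wanted and name not in seen
                seen.add(name)
                if keep:
                    out.write(f">{name}\n")
            elif keep:
                # strip carriage returns and '*' as the Perl version does
                out.write(line.replace("\r", "").replace("*", ""))

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python extract_seqs.py chrlist Input > Output   (script name is hypothetical)
    extract(sys.argv[2], read_ids(sys.argv[1]))
```

Reading line by line keeps memory use flat, which matters for chromosome-scale FASTA files.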
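###plot.R
### Draw a collinearity dot plot from the nucmer coords file
### Run directly: Rscript plot.R
### Coordinates are per sequence, so the plot is easiest to read for a single chromosome pair
df <- read.table('out.filtered.coords', sep = '\t')
# show-coords -THrd writes 11 columns (-d adds the two direction columns); without -d there are 9
cols <- c("ref_start", "ref_end", "qry_start", "qry_end", "ref_len", "qry_len",
          "identity", "ref_dir", "qry_dir", "ref_tag", "qry_tag")
colnames(df) <- if (ncol(df) == 11) cols else cols[-(8:9)]
x_range <- range(c(df$ref_start, df$ref_end))
y_range <- range(c(df$qry_start, df$qry_end))
pdf('1.pdf')
plot.new()
plot.window(xlim = x_range, ylim = y_range)
# One segment per alignment: red = same orientation, blue = inverted (qry_start > qry_end)
segments(df$ref_start, df$qry_start, df$ref_end, df$qry_end,
         col = ifelse(df$qry_start < df$qry_end, "red", "blue"))
box()
# Tick labels in Mb
x_at <- pretty(x_range)
y_at <- pretty(y_range)
axis(1, at = x_at, labels = x_at / 1e6)
axis(2, at = y_at, labels = y_at / 1e6)
title(xlab = "Reference (Mb)", ylab = "Query (Mb)")
dev.off()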
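The red/blue split in plot.R can also be checked numerically, for example to decide which chromosome pairs are worth plotting. This is a sketch assuming the 11-column `show-coords -THrd` layout, with the reference and query sequence names in the last two columns; `orientation_summary` is a hypothetical helper.

```python
from collections import defaultdict

def orientation_summary(path="out.filtered.coords"):
    """Sum aligned reference length per (ref, qry) pair, split by orientation.

    Assumed columns: S1 E1 S2 E2 LEN1 LEN2 %IDY FRM1 FRM2 REF_TAG QRY_TAG.
    """
    totals = defaultdict(lambda: {"forward": 0, "reverse": 0})
    with open(path) as fh:
        for line in fh:
            f = line.rstrip("\n").split("\t")
            if len(f) < 9:
                continue
            s2, e2, len1 = int(f[2]), int(f[3]), int(f[4])
            ref, qry = f[-2], f[-1]
            # same test as plot.R: query start < query end means same orientation
            key = "forward" if s2 < e2 else "reverse"
            totals[(ref, qry)][key] += len1
    return dict(totals)
```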