Downloading the files
Download misa and put misa.ini in the same folder as misa. Then download the three Perl scripts get_est_trimmer.pl, p3_in.pl and p3_out.pl, preferably into that same folder.
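Overview
misa.ini: the configuration file.
p3_in.pl: takes the output of misa.pl (file.fasta.misa) and writes the primer design parameters (template, product size, target region, etc.) to a file with the suffix "p3in".
get_est_trimmer.pl: for EST sequences; removes short sequences and ambiguous bases at both ends of each EST.
p3_out.pl: extracts and merges the output of primer3 into the final result file, filename.result.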
- The sequence of maize chromosome 1 is used as the example here. The maize genome FASTA file can be downloaded from the Ensembl website.
perl misa.pl Zea_mays.AGPv4.dna.chromosome.1.fa
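Output files
Zea_mays.AGPv4.dna.chromosome.1.fa.misa: a table listing the type and position of each microsatellite;
Zea_mays.AGPv4.dna.chromosome.1.fa.statistics: summary statistics of microsatellite types and their frequencies.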
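Running p3_in.pl directly on this result would produce a very large file, because it writes the whole template sequence (here an entire chromosome) once for every SSR. That is why a few extra steps follow.
# Extract the chromosome ID and the start and end positions from the .misa file, extend each side by 150 bp, and write a BED file.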
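# The start is clamped at 0 so that SSRs within 150 bp of the chromosome start do not produce negative coordinates
awk 'NR>1 {s=$6-150; if (s<0) s=0; print $1"\t"s"\t"($7+150)}' Zea_mays.AGPv4.dna.chromosome.1.fa.misa >Zea_mays.AGPv4.dna.chromosome.1_ssr.bed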
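The conversion from .misa to BED can also be written as a small function and tried on made-up data before running it on a whole chromosome. This is only a sketch: the function name misa_to_bed and the two sample rows are invented for illustration, and it assumes the standard tab-separated MISA layout with the SSR start in column 6 and the end in column 7. It keeps the arithmetic used in this post, so the end coordinate is not checked against the chromosome length, and because BED starts are 0-based the left flank comes out one base shorter than the right.

```shell
# Hypothetical helper: convert MISA output to BED with flanking regions.
# $1 = .misa file, $2 = flank size in bp (defaults to 150)
misa_to_bed() {
  awk -F'\t' -v flank="${2:-150}" '
    NR > 1 {                        # skip the header line
      start = $6 - flank            # column 6 = SSR start
      if (start < 0) start = 0      # BED coordinates cannot be negative
      print $1 "\t" start "\t" ($7 + flank)   # column 7 = SSR end
    }' "$1"
}

# Made-up two-row example (not real maize data)
demo=$(mktemp)
printf 'ID\tSSR nr.\tSSR type\tSSR\tsize\tstart\tend\n' > "$demo"
printf '1\t1\tp2\t(AT)6\t12\t100\t111\n' >> "$demo"
printf '1\t2\tp1\t(A)10\t10\t500\t509\n' >> "$demo"
misa_to_bed "$demo"
rm -f "$demo"
```

The first sample SSR starts at position 100, so its start is clamped to 0 instead of becoming -50.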
# Extract the SSR regions with bedtools
bedtools getfasta -fi Zea_mays.AGPv4.dna.chromosome.1.fa -bed Zea_mays.AGPv4.dna.chromosome.1_ssr.bed -fo Zea_mays.AGPv4.dna.chromosome.1_ssr.fa
Then run misa again on the extracted sequences
perl misa.pl Zea_mays.AGPv4.dna.chromosome.1_ssr.fa
Comparing the two results shows what these steps did. Note that there is more than one way to reach the same result.
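One visible difference between the two runs is that positions in the second .misa file are relative to each extracted fragment, not to the chromosome. bedtools getfasta names each fragment chrom:start-end by default, so the chromosome position can be recovered by adding the fragment start. The sketch below assumes those default IDs; the function name misa_to_genome is made up for illustration.

```shell
# Hypothetical helper: map fragment-relative SSR positions back to the chromosome.
# Assumes IDs of the form "chrom:start-end", as written by bedtools getfasta.
misa_to_genome() {
  awk 'BEGIN { FS = OFS = "\t" }
    NR == 1 { print; next }                   # keep the header unchanged
    {
      if (match($1, /:[0-9]+-[0-9]+$/)) {     # trailing ":start-end"
        chrom  = substr($1, 1, RSTART - 1)
        range  = substr($1, RSTART + 1)       # "start-end"
        offset = substr(range, 1, index(range, "-") - 1) + 0
        $1 = chrom
        $6 += offset                          # SSR start on the chromosome
        $7 += offset                          # SSR end on the chromosome
      }
      print
    }' "$1"
}

# Usage: misa_to_genome Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.misa
```

Rows whose ID does not end in :start-end are printed unchanged, so the function is harmless on a .misa file from the first run.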
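- Next, edit p3_in.pl so that the file it generates can be run directly by primer3. Use the example file in the primer3 directory as a reference and make the output of p3_in.pl match its tag names. For the version I am using, the change is: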
print OUT "PRIMER_SEQUENCE_ID=$id"."_$ssr_nr\nSEQUENCE=$seq\n";
to:
print OUT "SEQUENCE_ID=$id"."_$ssr_nr\nSEQUENCE_TEMPLATE=$seq\n";
Run p3_in.pl
perl p3_in.pl Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.misa
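If you would rather not edit the script, the same renaming can be applied to the generated .p3in file instead. The following is a sketch of that alternative, not the method used in this post. The function name p3in_to_v2 is made up, and the TARGET rule is an assumption: it only matters if your copy of p3_in.pl writes a TARGET= line, which primer3 2.x expects as SEQUENCE_TARGET.

```shell
# Hypothetical helper: rename old-style primer3 input tags to their 2.x names.
# Prints the converted file to stdout; the input file is left untouched.
p3in_to_v2() {
  sed -e 's/^PRIMER_SEQUENCE_ID=/SEQUENCE_ID=/' \
      -e 's/^SEQUENCE=/SEQUENCE_TEMPLATE=/' \
      -e 's/^TARGET=/SEQUENCE_TARGET=/' "$1"   # assumption: TARGET= is present
}

# Usage: p3in_to_v2 old.p3in > new.p3in
```

Every pattern is anchored at the start of the line, so tags such as PRIMER_PRODUCT_SIZE_RANGE are left unchanged.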
# Then design primers with primer3
~/software/primer3-2.4.0/src/primer3_core --default_version=1 --output=Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.p3out Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.p3in
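Before moving on to p3_out.pl, it can help to check how many records primer3 wrote and how many of them contain at least one primer pair. A minimal sketch, assuming the Boulder-IO output of primer3 2.x, where each record ends with a line holding only "=" and the first left primer is reported as PRIMER_LEFT_0_SEQUENCE. The function name p3out_summary is made up.

```shell
# Hypothetical helper: count records and records with at least one primer pair.
p3out_summary() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  total=$(grep -c '^=$' "$1" || true)
  with_primers=$(grep -c '^PRIMER_LEFT_0_SEQUENCE=' "$1" || true)
  echo "records=$total with_primers=$with_primers"
}

# Usage: p3out_summary Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.p3out
```

If with_primers is 0 although records were written, the tag names in the .p3in file are the first thing to check.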
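- Use p3_out.pl to convert the designed primers into a human-readable format. This script has to be edited first, in much the same way as above: adapt it to the file that primer3 actually produces.
# First change the numbering in the output header. primer3 numbers primer pairs from 0 and may return several pairs, so more output columns are needed. Change the following lines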
print OUT "ID\tSSR nr.\tSSR type\tSSR\tsize\tstart\tend\t";
print OUT "FORWARD PRIMER1 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER1 (5'-3')\tTm(°C)\tsize\tPRODUCT1 size (bp)\tstart (bp)\tend (bp)\t";
print OUT "FORWARD PRIMER2 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER2 (5'-3')\tTm(°C)\tsize\tPRODUCT2 size (bp)\tstart (bp)\tend (bp)\t";
print OUT "FORWARD PRIMER3 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER3 (5'-3')\tTm(°C)\tsize\tPRODUCT3 size (bp)\tstart (bp)\tend (bp)\n";
to:
print OUT "ID\tSSR nr.\tSSR type\tSSR\tsize\tstart\tend\t";
print OUT "FORWARD PRIMER0 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER0 (5'-3')\tTm(°C)\tsize\tPRODUCT0 size (bp)\tstart (bp)\tend (bp)\t";
print OUT "FORWARD PRIMER1 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER1 (5'-3')\tTm(°C)\tsize\tPRODUCT1 size (bp)\tstart (bp)\tend (bp)\t";
print OUT "FORWARD PRIMER2 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER2 (5'-3')\tTm(°C)\tsize\tPRODUCT2 size (bp)\tstart (bp)\tend (bp)\t";
print OUT "FORWARD PRIMER3 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER3 (5'-3')\tTm(°C)\tsize\tPRODUCT3 size (bp)\tstart (bp)\tend (bp)\t";
print OUT "FORWARD PRIMER4 (5'-3')\tTm(°C)\tsize\tREVERSE PRIMER4 (5'-3')\tTm(°C)\tsize\tPRODUCT4 size (bp)\tstart (bp)\tend (bp)\n";
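- Then edit the code that extracts the values: add the index 0 to the first few lines, which have none, and append further repeats at the end for the extra primer pairs. Check every tag name against your own .p3out file. In particular, primer3 2.x writes the product size as PRIMER_PAIR_0_PRODUCT_SIZE, not PRIMER_PRODUCT_SIZE_0.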
/PRIMER_LEFT_SEQUENCE=(.*)/ || do {$count_failed++;print OUT "$misa\n"; next}; my $info = "$1\t";
/PRIMER_LEFT_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PRODUCT_SIZE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT=(\d+),\d+/; $info .= "$1\t";
/PRIMER_LEFT_1_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_1_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_1=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_1_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_1_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_1=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PRODUCT_SIZE_1=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_1=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT_1=(\d+),\d+/; $info .= "$1\t";
/PRIMER_LEFT_2_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_2_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_2=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_2_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_2_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_2=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PRODUCT_SIZE_2=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_2=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT_2=(\d+),\d+/; $info .= "$1";
to:
/PRIMER_LEFT_0_SEQUENCE=(.*)/ || do {$count_failed++;print OUT "$misa\n"; next}; my $info = "$1\t";
/PRIMER_LEFT_0_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_0=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_0_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_0_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_0=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PAIR_0_PRODUCT_SIZE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_0=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT_0=(\d+),\d+/; $info .= "$1\t";
/PRIMER_LEFT_1_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_1_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_1=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_1_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_1_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_1=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PAIR_1_PRODUCT_SIZE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_1=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT_1=(\d+),\d+/; $info .= "$1\t";
/PRIMER_LEFT_2_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_2_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_2=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_2_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_2_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_2=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PAIR_2_PRODUCT_SIZE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_2=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT_2=(\d+),\d+/; $info .= "$1\t";
/PRIMER_LEFT_3_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_3_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_3=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_3_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_3_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_3=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PAIR_3_PRODUCT_SIZE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_3=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT_3=(\d+),\d+/; $info .= "$1\t";
/PRIMER_LEFT_4_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_4_TM=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_4=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_RIGHT_4_SEQUENCE=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_4_TM=(.*)/; $info .= "$1\t";
/PRIMER_RIGHT_4=\d+,(\d+)/; $info .= "$1\t";
/PRIMER_PAIR_4_PRODUCT_SIZE=(.*)/; $info .= "$1\t";
/PRIMER_LEFT_4=(\d+),\d+/; $info .= "$1\t";
/PRIMER_RIGHT_4=(\d+),\d+/; $info .= "$1";
- Finally, run the p3_out.pl script
perl p3_out.pl Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.p3out Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.misa
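As a cross-check on the result file, the primer pairs can also be pulled straight from the .p3out file with awk. This is a rough sketch and not a replacement for p3_out.pl. It assumes primer3 2.x tag names (SEQUENCE_ID, PRIMER_LEFT_n_SEQUENCE, PRIMER_RIGHT_n_SEQUENCE, PRIMER_PAIR_n_PRODUCT_SIZE) and at most five pairs per record; the function name summarize_p3out is made up.

```shell
# Hypothetical helper: list primer pairs from a Boulder-IO primer3 output file.
# Output columns: sequence ID, pair index, left primer, right primer, product size
summarize_p3out() {
  awk -F'=' '
    $0 == "=" {                                # a lone "=" ends one record
      for (n = 0; n < 5; n++) {
        left = "PRIMER_LEFT_" n "_SEQUENCE"
        if (!(left in tag)) break              # no further pairs in this record
        print tag["SEQUENCE_ID"] "\t" n "\t" tag[left] "\t" \
              tag["PRIMER_RIGHT_" n "_SEQUENCE"] "\t" \
              tag["PRIMER_PAIR_" n "_PRODUCT_SIZE"]
      }
      split("", tag)                           # clear tags for the next record
      next
    }
    { tag[$1] = substr($0, length($1) + 2) }   # store TAG=value
  ' "$1"
}

# Usage: summarize_p3out Zea_mays.AGPv4.dna.chromosome.1_ssr.fa.p3out
```

Records without primers print nothing, so the number of distinct IDs in the output should match the number of SSRs for which p3_out.pl reports primers.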