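⭐⭐⭐ This post records how I used the psRobot_tar module of psRobot to identify miRNA target genes. I ran into quite a few pitfalls along the way, so I am writing them down for the junior students in our lab. For reference, psRobot provides these modules: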
- psRobot_tar is designed to find potential small RNA targets.
- psRobot_map is designed to find all perfect matching locations of short sequences (less than 40 bp) in longer reference sequences.
- psRobot_mir is designed to find small RNAs with stem-loop precursors (e.g. miRNAs or shRNAs) for a batch of input sequences from high-throughput sequencing data.
- psRobot_deg is designed to identify which small RNA targets are supported by user-specified degradome data.
Below, we use the psRobot_tar module to identify miRNA target genes. Let's go.
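1. Download and process the mature.fa file
- Download the mature.fa file from miRBase.
👉 Note: a download manager such as Xunlei (Thunder) works best. For some reason, downloading directly in the browser failed for me.
- Use Notepad++ to delete the miRNAs of other species, keeping only the wheat (tae) entries, and save the result as tae_miR.fa.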
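The manual Notepad++ step can also be done on the command line. The following is a minimal sketch, not the method used in this post. It assumes that every wheat record in mature.fa has a header starting with ">tae-" (as in the examples in step 4). The function name extract_tae is made up for illustration.

```shell
# Keep only FASTA records whose header starts with ">tae-" (wheat).
# The flag "keep" is updated on every header line and stays set until
# the next header, so multi-line records are handled as well.
extract_tae() {
    awk '/^>/ { keep = ($0 ~ /^>tae-/) } keep' "$1"
}

# Usage (file names as in the text above):
# extract_tae mature.fa > tae_miR.fa
```

Filtering on the Linux side also avoids introducing Windows line endings into the file.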
2. Download the cDNA file from Ensembl Plants:
Triticum_aestivum.IWGSC.cdna.all.fa
3. Upload the files to the server with Xftp
- Triticum_aestivum.IWGSC.cdna.all.fa
- tae_miR.fa
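4. Simplify the headers in Triticum_aestivum.IWGSC.cdna.all.fa and tae_miR.fa
On lines starting with ">", delete "cdna" and everything after it (including the whitespace before it):
sed -ri '/^>/s/[[:space:]]*cdna.*$//' Triticum_aestivum.IWGSC.cdna.all.fa
Next, simplify tae_miR.fa. Before processing, it looks like this:
less -SN tae_miR.fa
=================== Before processing =========================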
>tae-miR159a MIMAT0005343 Triticum aestivum miR159a
UUUGGAUUGAAGGGAGCUCUG
>tae-miR159b MIMAT0005344 Triticum aestivum miR159b
UUUGGAUUGAAGGGAGCUCUG
>tae-miR160 MIMAT0005345 Triticum aestivum miR160
UGCCUGGCUCCCUGUAUGCCA
>tae-miR164 MIMAT0005346 Triticum aestivum miR164
UGGAGAAGCAGGGCACGUGCA
=================== Before processing =========================
Process tae_miR.fa so that the headers are much cleaner:
sed -ri '/^>/s/[[:space:]]*MIMAT.*$//' tae_miR.fa
less -SN tae_miR.fa
=================== After processing ===========================
>tae-miR159a
UUUGGAUUGAAGGGAGCUCUG
>tae-miR159b
UUUGGAUUGAAGGGAGCUCUG
>tae-miR160
=================== After processing ===========================
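Both sed commands in this step amount to keeping only the first whitespace-delimited token of each header line. A sketch of that idea as one reusable function, assuming the sequence ID is always the first token after ">". The name first_token_headers is hypothetical and not part of psRobot.

```shell
# Keep only the first whitespace-delimited token of every FASTA header line.
# Sequence lines are printed unchanged.
first_token_headers() {
    awk '/^>/ { print $1; next } { print }' "$1"
}

# Usage: write to a new file instead of editing in place, e.g.
# first_token_headers tae_miR.fa > tae_miR.clean.fa
```

Writing to a new file keeps the original download intact, which makes it easy to redo the step if something goes wrong.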
5. Install the dependency mfold 3.5 (requires administrator privileges)
wget http://omicslab.genetics.ac.cn/psRobot/program/WebServer/mfold.tar.gz
tar xvzf mfold.tar.gz
cd mfold-3.5/
./configure
make
sudo make install
6. Install psRobot (requires administrator privileges)
wget http://omicslab.genetics.ac.cn/psRobot/program/WebServer/psRobot_v1.2.tar.gz
tar xvzf psRobot_v1.2.tar.gz
cd psRobot_v1.2
sudo ./configure
make
sudo make install
source ~/.bashrc
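After both installs, it is worth confirming that the executables are actually on PATH before moving on. A small sketch using the shell built-in command -v. The executable names in the usage line are assumed from the module names listed at the top of this post, and check_tools is a made-up helper name.

```shell
# Report which of the given commands can be found on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (names are assumptions, adjust to what was installed):
# check_tools mfold psRobot_tar psRobot_map psRobot_mir psRobot_deg
```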
7. Run psRobot
psRobot_tar takes a number of parameters. The command I used:
psRobot_tar -s tae_miR.fa -t Triticum_aestivum.IWGSC.cdna.all.fa -p 8 -o target_results.gTP
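The reason for using cDNA rather than genomic sequences is that a miRNA binds its target in the cytoplasm. At that point the target transcript still has its UTRs but no longer has introns. (Given the sequence characteristics of UTRs, CDS sequences would also work.)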
psRobot_tar parameters:
-s   input file name: smRNA sequences (fasta format); default = smRNA. These are the miRNAs whose targets are to be predicted.
-t   input file name: target sequences (fasta format); default = target. These are the cDNA sequences to search.
-o   output file name. 👉 Note: default = smRNA-target.gTP
-ts  target penalty score, lower is better (0-5); default = 2.5. This is the threshold for reporting a result.
-fp  5 prime boundary of essential sequence (1-2); default = 2
-tp  3 prime boundary of essential sequence (7-31); default = 17
-gl  position after which a gap/bulge is permitted (0-30); 0 means no gap/bulge permitted; default = 17
-p   number of processors to use; default = 1. 👉 Note: increase this to suit your machine.
-gn  number of gaps/bulges permitted (0-5); default = 1
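To make it visible which settings a run actually uses, the defaults from the list above can be written out explicitly. The sketch below only prints the command line (a dry run) and does not call psRobot_tar itself. It assumes the defaults are exactly as listed, and psrobot_tar_cmd is a hypothetical helper name.

```shell
# Print a psRobot_tar command line with every option from the list above
# written out at its listed default, except -p (number of processors).
# Arguments: smRNA fasta, target fasta, output file, processors (default 1).
psrobot_tar_cmd() {
    printf 'psRobot_tar -s %s -t %s -o %s -ts 2.5 -fp 2 -tp 17 -gl 17 -gn 1 -p %s\n' \
        "$1" "$2" "$3" "${4:-1}"
}

# Dry run with the files and thread count used in this post.
psrobot_tar_cmd tae_miR.fa Triticum_aestivum.IWGSC.cdna.all.fa target_results.gTP 8
```

If the defaults are as listed, the printed command should be equivalent to the shorter one used in this step. Changing a value such as -ts in one place then documents the run in the shell history.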
8. Inspect the results
less -SN target_results.gTP
======================================================
 1 >tae-miR159a Score: 2.5 TraesCS7A02G377100.1
 2
 3 Query:       1 TTTGGATTGAAGGGAGCTCTG^M 22
 4                *|||||*||||||||||||::*
 5 Sbjct:    1095 TAACCTTACTTCCCTCGAGGTA 1074
 6
 7
 8 >tae-miR159a Score: 2.5 TraesCS7D02G446700.1
 9
10 Query:       1 TTTGGATTGAAGGGAGCTCTG^M 22
11                |||||*|:|||||||||||*|*
12 Sbjct:     952 AAACCAAGCTTCCCTCGAG-CG 932
13
14
15 >tae-miR159a Score: 2.5 TraesCS1D02G307500.2
16
17 Query:       1 TTTGGATTGAAGGGAGCTCTG^M 22
18                *|||||*||||||||||||::*
19 Sbjct:    1156 TAACCTTACTTCCCTCGAGGTA 1135
======================================================
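The ^M at the end of each Query line above is a carriage return. This suggests that the sequence lines in tae_miR.fa have Windows (CRLF) line endings, most likely from saving the file in Notepad++ in step 1. It appears to be counted as an extra position: the Query range ends at 22 although the miRNA is 21 nt long. A minimal sketch for removing it before running psRobot_tar, using the standard tr utility. The name strip_cr is a hypothetical helper.

```shell
# Delete every carriage return so that CRLF line endings become LF.
strip_cr() {
    tr -d '\r' < "$1"
}

# Usage: write a cleaned copy, then pass that copy to psRobot_tar with -s
# strip_cr tae_miR.fa > tae_miR.unix.fa
```

Whether the extra character changes the scores is not something this post tested, so it is safest to rerun the prediction on the cleaned file and compare.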
9. Save the miRNA-target pairs to the miRNA_mRNA.txt file
grep "^>" target_results.gTP | cut -f 1,3 | sed 's/^>//' > miRNA_mRNA.txt
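Once the pairs are in a file, a quick sanity check is to count the distinct targets per miRNA. This is a sketch under the assumption that miRNA_mRNA.txt is tab-separated, with the miRNA in column 1 and the transcript ID in column 2 (which is what cut -f 1,3 produces if the header lines of the .gTP file are tab-delimited). The name count_targets is made up for illustration.

```shell
# Summarise a two-column, tab-separated miRNA/target table:
# drop duplicate pairs, then count the distinct targets of each miRNA.
# Output: miRNA <TAB> number of distinct targets, sorted by miRNA name.
count_targets() {
    sort -u "$1" | cut -f 1 | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Usage:
# count_targets miRNA_mRNA.txt
```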