需要看下Nip和9311的基因匹配的有多少,思来想去,blast还是有点缺陷的,可能匹配到其他染色i他,需要用gff来帮助定一下位。所以,用基因的共线性分析看起来更合适一点。
- 调查,mcscanx,jcvi两个。jcvi貌似更新一些,我也常用python,功能上可能更容易上手。
- 安装,jcvi,
利用python3.6安装jcvi,不指定python版本安装的是2.7,后面出问题了
conda create -n jcvi -c bioconda last jcvi python=3.6
换用docker,用docker的出图了,我认为这个更靠谱。但是docker文件夹得在本地。
我是win装linux虚拟机,然后吧文件夹挂载到raid服务器,但是文件格式问题搞了很久,最后还是重新回复了docker的文件位置,虚拟机本地硬盘。
docker网站下载一个带tag的jcvi 3.10的
docker pull quay.io/biocontainers/jcvi:1.4.15--py310hd6be1da_0
下载好之后,进入docker 镜像
准备文件
- cds文件
2.gff文件或bed文件,没了
放到一个文件夹。
如果是gff,转化成bed
#gff convert to bed
python -m jcvi.formats.gff bed --type=mRNA --key=transcript_id Oryza_sativa.IRGSP-1.0.48.gff3 -o Nip.bed
python -m jcvi.formats.gff bed --type=mRNA --key=transcript_id Oryza_indica.ASM465v1.48.chr.gff3 -o 9311.bed
这个地方如果bed文件种第
分析共线性区域,这条命令会生成一堆东西,包括last比对结果。
#identify blocks
python -m jcvi.compara.catalog ortholog Nip 9311 --no_strip_names
# get synteny,生成simple文件
python -m jcvi.compara.synteny screen --minspan=30 --simple 9311.Nip.anchors 9311.Nip.anchors.new
##prepare for seqid and layout file
# plot synteny
python -m jcvi.graphics.karyotype seqid layout
最后结果:
python -m jcvi.compara.synteny depth --histogram Nip.9311.anchors
python -m jcvi.compara.synteny depth --histogram Nip.9311.lifted.anchors
.simple文件中的每一行都是一个连锁块,有起始和终止Nip基因,然后是起始和终止9311基因,最后两列是分数和方向。
Os01t0100100-01 Os01t0978100-01 BGIOSGA002569-TA BGIOSGA000001-TA 5761 +
Os10t0100500-01 Os10t0162856-00 BGIOSGA032412-TA BGIOSGA032250-TA 358 +
Os10t0164500-00 Os10t0579600-01 BGIOSGA032579-TA BGIOSGA033506-TA 1931 +
Os11t0106400-01 Os11t0194100-01 BGIOSGA034647-TA BGIOSGA034978-TA 630 +
Os11t0205500-02 Os11t0256400-01 BGIOSGA034343-TA BGIOSGA034190-TA 284 +
Os11t0264200-01 Os11t0312400-01 BGIOSGA035085-TA BGIOSGA034092-TA 191 +
Os11t0427500-01 Os11t0437800-01 BGIOSGA034091-TA BGIOSGA034059-TA 54 +
Os11t0453600-00 Os11t0586300-01 BGIOSGA034056-TA BGIOSGA033741-TA 612 +
Os11t0594500-00 Os11t0604700-00 BGIOSGA033738-TA BGIOSGA035559-TA 73 -
Os11t0615600-01 Os11t0622800-00 BGIOSGA035610-TA BGIOSGA035636-TA 56 +
Os11t0625900-01 Os11t0640600-00 BGIOSGA035572-TA BGIOSGA033646-TA 71 +
Os11t0653300-01 Os11t0668300-00 BGIOSGA033606-TA BGIOSGA033577-TA 72 +
Os11t0672700-01 Os11t0687200-03 BGIOSGA033576-TA BGIOSGA033542-TA 68 -
Os11t0691100-00 Os11t0707700-01 BGIOSGA033541-TA BGIOSGA033507-TA 90 +
Os11t0100800-01 Os11t0152500-01 BGIOSGA036827-TA BGIOSGA036625-TA 400 +
Os12t0106000-01 Os12t0145700-01 BGIOSGA034647-TA BGIOSGA034500-TA 318 +
Os12t0100700-00 Os12t0257900-01 BGIOSGA036827-TA BGIOSGA037229-TA 917 +
Os12t0277400-01 Os12t0641500-01 BGIOSGA036385-TA BGIOSGA035747-TA 1352 +
Os12t0258200-01 Os12t0277000-01 BGIOSGA018541-TA BGIOSGA018489-TA 85 +
Os02t0100100-01 Os02t0450000-01 BGIOSGA007319-TA BGIOSGA008132-TA 1735 +
Os02t0453500-01 Os02t0834300-01 BGIOSGA006534-TA BGIOSGA009360-TA 2774 +
Os03t0100010-00 Os03t0862100-01 BGIOSGA011641-TA BGIOSGA014002-TA 5047 +
Os04t0225100-01 Os04t0234600-01 BGIOSGA009378-TA BGIOSGA009362-TA 33 +
Os04t0100300-00 Os04t0126600-00 BGIOSGA015745-TA BGIOSGA015685-TA 137 +
Os04t0127200-02 Os04t0169666-00 BGIOSGA015814-TA BGIOSGA015606-TA 147 +
Os04t0171800-01 Os04t0198733-00 BGIOSGA015538-TA BGIOSGA015997-TA 108 -
Os04t0201200-01 Os04t0218600-02 BGIOSGA015603-TA BGIOSGA015929-TA 73 +
Os04t0243700-01 Os04t0693400-00 BGIOSGA015490-TA BGIOSGA014021-TA 3131 +
Os05t0100100-01 Os05t0228400-01 BGIOSGA018993-TA BGIOSGA018542-TA 926 +
Os05t0229475-00 Os05t0326625-00 BGIOSGA018488-TA BGIOSGA018294-TA 398 +
Os05t0330700-02 Os05t0597700-01 BGIOSGA018291-TA BGIOSGA017428-TA 1924 +
Os06t0101000-01 Os06t0196900-02 BGIOSGA022119-TA BGIOSGA022469-TA 813 +
Os06t0198500-00 Os06t0731700-01 BGIOSGA021720-TA BGIOSGA020493-TA 2579 +
Os07t0100200-01 Os07t0695400-01 BGIOSGA024990-TA BGIOSGA023660-TA 3055 +
Os08t0100150-01 Os08t0567200-01 BGIOSGA027824-TA BGIOSGA026457-TA 2898 +
Os09t0101100-01 Os09t0287500-01 BGIOSGA030257-TA BGIOSGA030063-TA 410 +
Os09t0293400-01 Os09t0573200-01 BGIOSGA030459-TA BGIOSGA031320-TA 1853 +
还有很多绘图样式,参考
https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)
20241203
对bed文件用了uniq之后,会出现liftover文件,里面应该是删除掉的存在overlap的基因。
9311.bed 1.46 MB 2024/12/03 19:39 -a--
9311.leftover.bed 10.74 KB 2024/12/03 16:25 -a--
9311.uniq.bed 1.45 MB 2024/12/03 16:25 -a--
Nip.bed 1.68 MB 2024/12/03 16:21 -a--
Nip.leftover.bed 487.24 KB 2024/12/03 16:25 -a--
Nip.uniq.bed 1.20 MB 2024/12/03 16:25 -a--
换个参数:
加上只保留一个转录本的参数,最长的
python -m jcvi.formats.gff bed --type=mRNA --key=transcript_id Oryza_sativa.IRGSP-1.0.48.gff3 -o Nip_p.bed --primary_only
[12/03/24 11:36:24] DEBUG Extracted 37960 features (type=mRNA id=transcript_id parent=Parent) gff.py:3102
DEBUG Skipped non-primary: 6794 gff.py:3110
python -m jcvi.formats.gff bed --type=mRNA --key=transcript_id Oryza_indica.ASM465v1.48.chr.gff3 -o 9311.bed --primary_only
[12/03/24 11:38:13] DEBUG Extracted 37878 features (type=mRNA id=transcript_id parent=Parent) gff.py:3102
DEBUG Skipped non-primary: 0 gff.py:3110
[12/03/24 11:58:55] DEBUG running the cscore filter (cscore>=0.50) .. blastfilter.py:107
DEBUG after filter (643191->41220) .. blastfilter.py:109
DEBUG running the local dups filter (tandem_Nmax=10) .. blastfilter.py:114
[12/03/24 11:58:56] DEBUG after filter (41220->31911) .. blastfilter.py:154
DEBUG Assuming --qbed=Nip_p.bed --sbed=9311_p.bed synteny.py:390
DEBUG Load file `Nip_p.bed` base.py:36
[12/03/24 11:58:57] DEBUG Load file `9311_p.bed` base.py:36
DEBUG Load file `Nip_p.9311_p.last.filtered` base.py:36
DEBUG A total of 31911 BLAST imported from `Nip_p.9311_p.last.filtered`. synteny.py:277
DEBUG Chaining distance = 20 synteny.py:1794
[12/03/24 11:58:59] DEBUG Load file `Nip_p.9311_p.anchors` base.py:36
A total of 25410 (NR:25138) anchors found in 191 clusters.
Stats: Min=4 Max=3551 N=191 Mean=133.04 SD=467.77 Median=5.0 Sum=25410
NR stats: Min=4 Max=3516 N=191 Mean=131.61 SD=463.94 Median=5.0 Sum=25138
[12/03/24 11:59:00] DEBUG Assuming --qbed=Nip_p.bed --sbed=9311_p.bed synteny.py:390
DEBUG Load file `Nip_p.bed` base.py:36
DEBUG Load file `9311_p.bed` base.py:36
DEBUG Load file `Nip_p.9311_p.anchors` base.py:36
[12/03/24 11:59:01] INFO Number of variables (191), number of constraints (309) lpsolve.py:135
DEBUG `/home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work` not found. Creating new. base.py:1320
DEBUG Write MIP instance to `/home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work/data.lp` lpsolve.py:198
DEBUG scip -f /home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work/data.lp -l base.py:1191
/home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work/data.lp.out >/dev/null
/bin/bash: line 1: scip: command not found
ERROR You need to install program `scip` [http://scip.zib.de/] lpsolve.py:316
ERROR SCIP fails... trying GLPK lpsolve.py:176
DEBUG Write MIP instance to `/home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work/data.lp` lpsolve.py:198
DEBUG glpsol --cuts --fpump --lp /home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work/data.lp -o base.py:1191
/home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work/data.lp.out -w
/home/jing/fsn1_shared_dir/jcvi_data/cscore0.7-onlyprimary/work/data.lp.list >/dev/null
/bin/bash: line 1: glpsol: command not found
ERROR You need to install program `glpsol` [http://www.gnu.org/software/glpk/] lpsolve.py:243
DEBUG Selected 0 blocks quota.py:267
DEBUG Screened blocks ids written to `Nip_p.9311_p.1x1` quota.py:274
DEBUG Load file `Nip_p.9311_p.anchors` base.py:36
DEBUG Load file `Nip_p.9311_p.1x1` base.py:36
DEBUG Load file `Nip_p.bed` base.py:36
DEBUG Load file `9311_p.bed` base.py:36
[12/03/24 11:59:02] DEBUG Before: 191 blocks, After: 191 blocks synteny.py:1349
DEBUG Assuming --qbed=Nip_p.bed --sbed=9311_p.bed synteny.py:390
DEBUG Load file `Nip_p.bed` base.py:36
DEBUG Load file `9311_p.bed` base.py:36
[12/03/24 11:59:03] DEBUG Load file `Nip_p.9311_p.last` base.py:36
[12/03/24 11:59:09] DEBUG A total of 643191 BLAST imported from `Nip_p.9311_p.last`. synteny.py:277
[12/03/24 11:59:10] DEBUG Load file `Nip_p.9311_p.1x1.anchors` base.py:36
[12/03/24 11:59:11] DEBUG A total of 25410 anchors imported. synteny.py:303
[12/03/24 11:59:12] DEBUG 39983 new pairs found (dist=20). synteny.py:1872
DEBUG Removed 31 existing anchors base.py:72
DEBUG Corrected scores for 1643 anchors base.py:73
DEBUG Anchors written to `Nip_p.9311_p.1x1.lifted.anchors` base.py:74
DEBUG Load file `Nip_p.9311_p.1x1.lifted.anchors` base.py:36
A total of 65362 (NR:31837) anchors found in 191 clusters.
Stats: Min=4 Max=8636 N=191 Mean=342.21 SD=1132.18 Median=22.0 Sum=65362
NR stats: Min=4 Max=4245 N=191 Mean=166.69 SD=554.34 Median=15.0 Sum=31837
DEBUG Load file `Nip_p.bed` base.py:36
[12/03/24 11:59:13] DEBUG Load file `Nip_p.9311_p.1x1.lifted.anchors` base.py:36
Chain started: 191 blocks
Chain 0: score=30596 152 blocks remained..
DEBUG MCscan blocks written to `Nip_p.9311_p.1x1.blocks`. synteny.py:1557
[12/03/24 11:59:14] DEBUG Load file `9311_p.bed` base.py:36
DEBUG Load file `Nip_p.9311_p.1x1.lifted.anchors` base.py:36
Chain started: 191 blocks
Chain 0: score=28686 157 blocks remained..
[12/03/24 11:59:15] DEBUG MCscan blocks written to `9311_p.Nip_p.1x1.blocks`. synteny.py:1557
DEBUG Load file `Nip_p.9311_p.last` base.py:36
DEBUG Register best scores .. blast.py:809
[12/03/24 11:59:21] DEBUG Load file `Nip_p.9311_p.last` base.py:36
[12/03/24 11:59:25] DEBUG Load file `Nip_p.9311_p.rbh` base.py:36
DEBUG Imported 27286 records from `Nip_p.9311_p.rbh` base.py:101
DEBUG Load file `Nip_p.9311_p.rbh` base.py:36
DEBUG Imported 25252 records from `Nip_p.9311_p.rbh` base.py:101
DEBUG Recruited 1124 pairs from RBH. catalog.py:592
DEBUG Load file `Nip_p.9311_p.rbh` base.py:36
DEBUG Imported 27286 records from `Nip_p.9311_p.rbh` base.py:101
DEBUG Load file `Nip_p.9311_p.rbh` base.py:36
DEBUG Imported 25252 records from `Nip_p.9311_p.rbh` base.py:101
DEBUG Recruited 1529 pairs from RBH.
总结:结果中文件解读
9311_p.Nip_p.ortholog 算作同源基因查找,还不能算作block,mt的基因是可以匹配到核基因的。不能用
Nip_p.9311_p.ortholog
Nip_p.9311_p.rbh 没有单独的某基因没匹配的结果,不适合我用
9311_p.Nip_p.1x1.blocks 这个好用,基因1vs1的匹配,没匹配到的也给出来。
Nip_p.9311_p.1x1.blocks
Nip_p.9311_p.1x1.lifted.anchors
Nip_p.9311_p.1x1.anchors
Nip_p.9311_p.1x1
Nip_p.9311_p.anchors