igblastn使用

简介:
IgBlast是NCBI设计开发的一种专一的blast工具,特定用于比对抗体( immunoglobulin ,IG)或T细胞受体( T cell receptor,TR)序列。
为什么要专门开发一个比对软件呢?他跟blastn等有什么区别,这得从IG和TR的结构说起,

IG和TR的结构类似,都是由2条轻链和2条重链构成,每条链可以分为可变区(variable domain)和恒定区(constant domain)。可变区还可以进一步分为骨架区(FR)和互补作用区(CDR)。
IG或TR识别抗原的关键在于可变区的高度可变性,这种可变性是由「基因重排」机制产生。

1,下载igblast程序和其他必要文件:
下载地址:https://ftp.ncbi.nih.gov/blast/executables/igblast/release/LATEST
从下面地址下载 internal_data 和 optional_file 整个目录,这两个目录也是必需的: https://ftp.ncbi.nih.gov/blast/executables/igblast/release/

2,下载IMGT数据库,或者TCR数据库
/database/IMGT_V-QUEST_reference_directory/Homo_sapiens/TR

3,igblastn具体使用例子:
igblast数据比对的input文件必须是fasta格式。所以如果你的数据是fastq格式,需要先转一下格式(seqkit fq2fa)

igblastn -query test.fa \
  -out test.out \
  -outfmt 7 \
  -germline_db_V ./database/human_TR_V \
  -germline_db_J ./database/human_TR_J \
  -germline_db_D ./database/human_TR_D \
  -auxiliary_data ./optional_file/human_gl.aux  \
  -organism human \
  -domain_system imgt  \
  -ig_seqtype TCR \
  -num_threads 4 \
  -min_D_match 5  \
  -num_alignments_V 3 \ 
  -num_alignments_D 3 \
  -num_alignments_J 3

发现了一个新的软件:igblastwrapper
https://github.com/mikessh/migmap
Motivation
IgBlast is an excellent of V-(D)-J mapping tool able to correctly map even severely hypermutated antibody variants. While being a gold standard, the following limitations of IgBlast v1.4.0 have driven MIGMAP development:

  • It doesn't extract sequence of CDR3 region directly, neither provide coordinates for CDR3 region in reads. It reports reference Cys residue of Variable segment and Variable segment end in CDR3, but not Phe/Trp residue of J segment that marks the end of CDR3

  • Output is not straightforward to parse and summarize to a readable clonotype abundance table containing CDR3 sequences, segment assignments and list of somatic hypermutations

  • It doesn't account for sequence quality

  • It is somewhat hard to make it running with a custom segment reference and species other than human and mouse

# 查看帮助文档

java -jar /software/biosoftware/migmap-1.0.3/migmap-1.0.3.jar -h
java -jar /software/biosoftware/migmap-1.0.3/migmap-1.0.3.jar -R TRA,TRB -S human --use-kabat -q 70 ../02.align/RP01G9E1L3_R1.fq.gz migmap.out.txt
java -jar /software/biosoftware/migmap-1.0.3/migmap-1.0.3.jar -R TRA,TRB -S human --use-kabat -q 70 --illegal-access ../02.align/RP01G9E1L3_R1.fq.gz migmap.out.txt
java -jar /software/biosoftware/migmap-1.0.3/migmap-1.0.3.jar -R TRA,TRB -S human --by-read --data-dir ./ --use-kabat --allow-incomplete --details  ../02.align/RP01G9E1L3_R1.fq.gz migmap.out.txt
都没有成功,先试一下测试文件吧
cd /software/biosoftware/migmap-1.0.3
./migmap -R TRA,TRB -S human ./test/sample.fasta.gz migmap.out.txt
成功
回到工作目录:
不再使用,忽略掉。

发现 igblastn网页版和本地版不一致情况

igblastn -query test.igblast.fa \
-out test.fmt.out -outfmt 7 \
-germline_db_V ./database/human_TR_V \
-germline_db_J ./database/human_TR_J \
-germline_db_D ./database/human_TR_D \
-auxiliary_data ./optional_file/human_gl.aux \
-organism human \
-domain_system imgt \
-ig_seqtype TCR  \
-min_D_match 5  \
-num_alignments_V 3 \
-num_alignments_D 3 \
-num_alignments_J 3

本地版参数:

igblastn -query test.fa \
  -out test.out \
  -outfmt 7 \
  -germline_db_V ./database/human_TR_V \
  -germline_db_J ./database/human_TR_J \
  -germline_db_D ./database/human_TR_D \
  -auxiliary_data ./optional_file/human_gl.aux  \
  -organism human \
  -domain_system imgt  \
  -ig_seqtype TCR \
  -num_threads 4 \
  -min_D_match 5  \
  -num_alignments_V 3 \
  -num_alignments_D 3 \
  -num_alignments_J 3

本地版结果:

# IGBLASTN
# Query: E00509:314:HNTL2CCXY:4:1101:8024:1924 1:N:0:CTCTCTAC
# Database: ./database/human_TR_V ./database/human_TR_D ./database/human_TR_J
# Domain classification requested: imgt

# Note that your query represents the minus strand of a V gene and has been converted to the plus strand. The sequence positions refer to the converted sequence.

# V-(D)-J rearrangement summary for query sequence (Top V gene match, Top J gene match, Chain type, stop codon, V-J frame, Productive, Strand).  Multiple equivalent top matches, if present, are separated by a comma.
TRAV26-2*01     TRAJ33*01,TRAJ53*01     VA      No      Out-of-frame    No      -

# V-(D)-J junction details based on top germline gene matches (V end, V-J junction, J start).  Note that possible overlapping nucleotides at VDJ junction (i.e, nucleotides that could be assigned to either rearranging gene) are indicated
GGCAA   N/A     AGCAA

# Alignment summary between query and top germline V gene hit (from, to, length, matches, mismatches, gaps, percent identity)
FR3-IMGT        84      106     23      19      4       0       82.6
Total   N/A     N/A     23      19      4       0       82.6

# Hit table (the first field indicates the chain type of the hit)
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, gaps, q. start, q. end, s. start, s. end, evalue, bit score
# 6 hits found
V       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAV26-2*01     82.609  23      4       0       0       84      106     168     190     0.20    25.2
V       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRBV23/OR9-2*01 80.000  20      4       0       0       128     147     35      16      5.2     20.5
V       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRBV23/OR9-2*02 80.000  20      4       0       0       128     147     35      16      5.2     20.5
J       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAJ33*01       100.000 8       0       0       0       107     114     6       13      8.1     16.1
J       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAJ53*01       100.000 8       0       0       0       107     114     15      22      8.1     16.1
J       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAJ18*01       100.000 7       0       0       0       141     147     14      20      31      14.1

网页版结果:使用网页上默认的参数。

# IGBLASTN
# Query: E00509:314:HNTL2CCXY:4:1101:8024:1924 1:N:0:CTCTCTAC
# Database: IG_DB/imgt.TR.Homo_sapiens.V.f.orf.p IG_DB/imgt.TR.Homo_sapiens.D.f.orf IG_DB/imgt.TR.Homo_sapiens.J.f.orf.p
# Domain classification requested: imgt

# Note that your query represents the minus strand of a V gene and has been converted to the plus strand. The sequence positions refer to the converted sequence.

# V-(D)-J rearrangement summary for query sequence (Top V gene match, Top J gene match, Chain type, stop codon, V-J frame, Productive, Strand).  Multiple equivalent top matches, if present, are separated by a comma.
TRAV26-2*01     TRAJ33*01,TRAJ53*01     VA      No      Out-of-frame    No      -

# V-(D)-J junction details based on top germline gene matches (V end, V-J junction, J start).  Note that possible overlapping nucleotides at VDJ junction (i.e, nucleotides that could be assigned to either rearranging gene) are indicated in parentheses (i.e., (TACT)) but are not included under the V, D, or J gene itself
GGCAA   N/A     AGCAA   

# Alignment summary between query and top germline V gene hit (from, to, length, matches, mismatches, gaps, percent identity)
FR3-IMGT        84      106     23      19      4       0       82.6
Total   N/A     N/A     23      19      4       0       82.6

# Hit table (the first field indicates the chain type of the hit)
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, gaps, q. start, q. end, s. start, s. end, evalue, bit score
# 6 hits found
V       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAV26-2*01     82.609  23      4       0       0       84      106     168     190     0.22    25.2
V       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRBV7-9*04      92.308  13      1       0       0       138     150     109     121     16      19.0
V       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRBV15*01       100.000 11      0       0       0       135     145     13      23      16      19.0
J       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAJ33*01       100.000 8       0       0       0       107     114     6       13      9.0     16.1
J       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAJ53*01       100.000 8       0       0       0       107     114     15      22      9.0     16.1
J       reversed|E00509:314:HNTL2CCXY:4:1101:8024:1924  TRAJ18*01       100.000 7       0       0       0       141     147     14      20      34      14.1

不一致的地方在结果倒数的第四,五行,比对结果不一致。

开始怀疑是设置evalue值的问题,于是本地版设置了evalue 20

igblastn -query one.fa -out one.out  \
  -germline_db_V ./database/human_TR_V \
  -germline_db_J ./database/human_TR_J \
  -germline_db_D ./database/human_TR_D \
  -auxiliary_data ./optional_file/human_gl.aux  \
  -organism human -domain_system imgt  \
  -ig_seqtype TCR  -min_D_match 5  \
  -evalue 20 -show_translation \
  -num_clonotype 100 \
  -D_penalty -2 \
  -J_penalty -2 \
  -min_V_length 9 \
  -min_J_length 0 \
  -num_alignments_V 5 \
  -num_alignments_D 5 \
  -num_alignments_J 5

结果一致了,

最终版参数:

igblastn -query xxx.fa \
  -out outfile.igblast.fmt7.out \
  -outfmt 7 \
  -germline_db_V ./database/human_TR_V \
  -germline_db_J ./database/human_TR_J \
  -germline_db_D ./database/human_TR_D \
  -auxiliary_data ./optional_file/human_gl.aux  \
  -organism human -domain_system imgt  \
  -ig_seqtype TCR \
  -num_threads 70 \
  -min_D_match 5 \
  -evalue 20 \
  -show_translation \
  -num_clonotype 100 \
  -D_penalty -2 \
  -J_penalty -2 \
  -min_V_length 9 \
  -min_J_length 0  \
  -num_alignments_V 3 \
  -num_alignments_D 3 \
  -num_alignments_J 3 

总结,网页版和本地版的主要不一样是evalue不一样,造成的原因
首先是参数,另一个是database导致的search space不一样.

目前igblastn分析方法:

第一步比对:

igblastn -query b0005p1_S1_L001_R1_001.fa  \
  -out b0005p1_S1_L001_R1_001.fa.fmt7.out \
  -outfmt 7 \
  -germline_db_V ./database/human_TR_V \
  -germline_db_J ./database/human_TR_J \
  -germline_db_D ./database/human_TR_D \
  -auxiliary_data ./optional_file/human_gl.aux  \
  -organism human -domain_system imgt \
  -ig_seqtype TCR -num_threads 10 \
  -min_D_match 5 \
  -evalue 20 \
  -show_translation \
  -num_clonotype 100 \
  -D_penalty -2 \
  -J_penalty -2 \
  -min_V_length 9 \
  -min_J_length 0 \
  -num_alignments_V 3 \
  -num_alignments_D 3 \
  -num_alignments_J 3

第二步,将比对结果过滤,提取VDJ信息:过滤evalue大于1e-10,
cat *_S1_L001_R1*.fmt7.out |grep -E "^V|^J|^D" |awk '$(NF-1)<1e-10' >S1.R1.igblast.fmt7.out

第三步,统计每条read上比对的VDJ个数,自己写的脚本。
python deal_igblast.py S1.R1.igblast.fmt7.out S1.R1.igblast.4to1.total.out

第四步, 根据上一步结果统计VDJ组合的结果,这一步根据需要。

python  filter_igblast_result.py S1.R1.igblast.4to1.total.out S1.R1.out.all.result.txt S1.R1.out.4gene.result.txt S1.R1.out.3gene.result.txt S1.R1.out.2gene.result.txt S1.R1.out.1gene.result.txt
可以统计出各种组合的信息:
gene_in_reads_4: 0
gene_in_reads_3: 61612
AVAJBV  AVAJBJ  AVBVBJ  AJBVBJ
61612   0       0       0
gene_in_reads_2: 282005
AVAJ    AVBV    AVBJ    AJBV    AJBJ    BVBJ
99723   26009   0       0       0       156273
gene_in_reads_1: 2234397
AV      AJ      BV      BJ
3605    150     2230039 603
total calculate reads: 2578014

如果是PE read,比对是需要分开做的,在统计的时候可以一起统计。

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 219,539评论 6 508
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 93,594评论 3 396
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 165,871评论 0 356
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,963评论 1 295
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,984评论 6 393
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,763评论 1 307
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,468评论 3 420
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,357评论 0 276
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,850评论 1 317
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 38,002评论 3 338
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 40,144评论 1 351
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,823评论 5 346
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,483评论 3 331
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 32,026评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 33,150评论 1 272
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 48,415评论 3 373
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 45,092评论 2 355

推荐阅读更多精彩内容