针对database中的fasta,报错时候重新bwa index -a is HLA-I_II_CDS.fasta
1. Read Alignment (HPRA, faster but less accurate)
查看HPRArnaseq_classI-II.sh,bwa方法比较繁琐:
rd1=/liu/sample_rna_suc/clean_by_trimmomatic/SRR7094744_1_paired.fa
rd2=/liu/sample_rna_suc/clean_by_trimmomatic/SRR7094744_2_paired.fa
hla=/root/biosoft/HLAminer_v1.4/database/HLA-I_II_CDS.fasta
hlanp=/root/biosoft/HLAminer_v1.4/database/hla_nom_p.txt
bwa aln -e 0 -o 0 $hla $rd1 > aln_test.1.sai
bwa aln -e 0 -o 0 $hla $rd2 > aln_test.2.sai
bwa sampe -o 1000 $hla aln_test.1.sai aln_test.2.sai $rd1 $rd2 > aln.sam
HLAminer.pl -a aln.sam -h $hla -s 500 -p $hlanp
标准答案:
只答对了HLA-C,如果使用OptiType速度还快,还能全部正确,要不是为了拿MHC-II结果,忍了!
如果直接使用,如下命令
bwa mem -t 5 -a $hla $rd1 $rd2 > aln.sam方法,HLA-C也沦陷了,哈哈。
2. Targeted Assembly of Shotgun Reads (HPTASR)
cpan Bio::SearchIO
运行路径下准备sample.fof文件place the fullpath location of your short read fastq or fasta files in the "sample.fof" file;
parseXMLblast.pl 报错缺Bio::SearchIO::blastxml
cpan YAML
cpan Algorithm::Diff
然后安装Bio::SearchIO::blastxml test一直过不了,直接cpan 进入后 infoce install Bio::SearchIO::blastxml
运行parseXMLblast.pl,报错expat.h:没有那个文件或目录 yum install expat-devel
然后cpan XML::Parser
查看HPTASRrnaseq-classI-II.sh,TASR在bin中已经安装
change shebang line in all PERL (.pl) scripts (似乎没改也能运行HLAminer.pl)
搭配test-demo中配置文件ncbiBlastConfig2-2-29.txt,重新安装了2.2.29的blast+
修改bin下的ncbiBlastConfig2-2-29.txt中2.2.29 blastn的路径, alignments:-max_target_seqs 5需要删除。
hla=/root/biosoft/HLAminer_v1.4/database/HLA-I_II_CDS.fasta
hlanp=/root/biosoft/HLAminer_v1.4/database/hla_nom_p.txt
nbc=/root/biosoft/HLAminer_v1.4/bin/ncbiBlastConfig2-2-29.txt
TASR -f sample.fof -m 20 -k 20 -s $hla -i 1 -b TASRhla -w 1
cat TASRhla.contigs |perl -ne 'if(/size(\d+)/){if($1>=200){$flag=1;print;}else{$flag=0;}}else{print if($flag);}' > TASRhla200.contigs
formatdb -p F -i TASRhla200.contigs
parseXMLblast.pl -c $nbc -d $hla -i TASRhla200.contigs -o 0 -a 1 > tig_vs_hla-ncbi.coord
parseXMLblast.pl -c $nbc -i $hla -d TASRhla200.contigs -o 0 > hla_vs_tig-ncbi.coord
HLAminer.pl -b tig_vs_hla-ncbi.coord -r hla_vs_tig-ncbi.coord -c TASRhla200.contigs -h $hla -p $hlanp
结果如下:
基本首选项全对!!
以上为RNA-seq分析,如果是WES分析,可以使用HLA-I_II_GEN.fasta替换HLA-I_II_CDS.fasta,但是实际操作过程中发现,速度慢到崩溃啊,哈哈哈,24小时,TASR才能完成5%左右,获得约10M的TASRhla.contigs数据,等不急了,直接拿来做后续分析,幸运的是也能出结果,结果还算正确,谢天谢天地!
不过WES也可以用HLA-I_II_CDS.fasta,但是正确性不高,不推荐操作!