2023-07-20 三代数据的比对 Blasr(需补充)

该软件来自picbio官网
准备文件

mkdir -p /home/train/06.reads_aligment/blasr
cd /home/train/06.reads_aligment/blasr
ln -s ~/00.incipient_data/data_for_genome_assembling/assemblies_of_Malassezia_sympodialis/Malassezia_sympodialis.genome_V01.fasta genome.fasta
ln -s ~/04.genome_assembling/Canu/Malassezia_sympodialis/subreads.fasta ./

PATH=/opt/biosoft/miniconda3_for_pbbioconda/bin/:$PATH

开始运行

blasr subreads.fasta genome.fasta --header -m 5 --out blasr.out5 --minPctAccuracy 70 --nproc 8 --stride 10
[train@MiWiFi-R3P-srv blasr]$ head -n 2 blasr.out5 
qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand score numMatch numMismatch numIns numDel mapQV qAlignedSeq matchPattern tAlignedSeq
m54086_170204_081430/5112079/3551_4400/0_849 849 321 623 +  MS01Contig08 440595 301851 302159 + -878 252 37 13 19 254 TACCCGTTCGAGAA-GG-CCGCTGAGCCCTCGCTTCCGTGGGGAGCAATGCGCTGGCGCCGGTACCC-ATCCGG-GAGGAGCGTTGCATTGCCTGCAAGCT-CGCG-GGCCATCTGCCC--CGCCCAGGCCATCACCATCGAG-GCTGAG-CCCAAAGAGCTGATGGCAGC-CGCCGAACCACCCGCTAATGACATCGACATGAGACAAGTGCATCTACTGCGGCTTCTTCCA-G-GTC-TGTCCCGTGGATGCCATCGTCAG-GCCCCA-ACG-TTGAGTTCTCCACGGAGACCCATGA-GAGC-GCGGTACAACAAGGA ||||||||||||||*||*|||||||||||*||||||||*||*||||*|||||||*||||||*|||||*|*|*||*||||||||*|||||*||*||||||||**|||*||||||||||||**|||*|**||||||||*||||||*||*|||*|||***|*||*||*|||*||*||*||*||*|||||||*|*||*||||||||||**|||||||||||||||*|||*|*|*|||*|*|*|*||*|||||||||||*||||||*|*|*|*||*|||*|*||||||*|*||*|||||||**||*||||**|*|||||||||||| TACCCGTTCGAGAAGGGCCCGCTGAGCCCCCGCTTCCGCGGCGAGC-ATGCGCT-GCGCCGCTACCCGA-CGGGCGAGGAGCGCTGCATCGCGTGCAAGCTGTGCGAGGCCATCTGCCCGGCGCTC--GCCATCACGATCGAGAGC-GAGACCC---GCGCGGACGGC-GCGCGGCGCACGACCCGCT-ACGATATCGACATGA-CCAAGTGCATCTACTGTGGCATGTGCCAGGAGGCGTGCCCCGTGGATGCGATCGTC-GAGACGCAGACGCTCGAGTTCGCTACCGAGACCCGCGAGGAGCTTCTGTACAACAAGGA
[train@MiWiFi-R3P-srv blasr]$ 

统计有多少条序列能比对上去

[train@MiWiFi-R3P-srv blasr]$ cut -d " " -f 1 blasr.out5 | uniq | wc -l
5629
mapping rate = 5629/6000

统计每行的错误率

需补充

统计整体的错误率

[train@MiWiFi-R3P-srv blasr]$ ls
blasr.out5  genome.fasta  subreads.fasta
[train@MiWiFi-R3P-srv blasr]$ perl -e '<>; while (<>) { @_ = split /\s+/; $length = $_[3] - $_[2]; $error_ratio = 1 - $_[11] / $length; push @er,$error_ratio if $length >= 1000; } @er = sort {$a <=> $b} @er; print "$er[@er/2]\n";' blasr.out5
0.0902698593690612

小技巧:截取结果文件的碱基序列

需补充

小技巧:截取subreads的1000条序列

[train@MiWiFi-R3P-srv blasr]$ perl -e 'while (<>) { if (m/^>/) { $num ++; last if $num > 1000; } print }' subreads.fasta | grep ">" | wc -l
1000

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容