Galaxy_Blast: converting BLAST/Diamond XML result files to tabular (txt/csv)

Format of a BLAST/Diamond XML result file:

[Image: example of a BLAST/Diamond XML result file]

Script: blastxml_to_tabular.py
URL: https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py
Output columns:

====== ========= ============================================
Column NCBI name Description
------ --------- --------------------------------------------
     1 qseqid    Query Seq-id (ID of your sequence)
     2 sseqid    Subject Seq-id (ID of the database hit)
     3 pident    Percentage of identical matches
     4 length    Alignment length
     5 mismatch  Number of mismatches
     6 gapopen   Number of gap openings
     7 qstart    Start of alignment in query
     8 qend      End of alignment in query
     9 sstart    Start of alignment in subject (database hit)
    10 send      End of alignment in subject (database hit)
    11 evalue    Expectation value (E-value)
    12 bitscore  Bit score
====== ========= ============================================
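Some of these columns are not stored as-is in the XML. As far as I can tell, BLAST XML records identity, gap and alignment-length counts for each HSP but no mismatch or gapopen values, so a converter has to derive them from the aligned strings (the Hsp_qseq and Hsp_hseq elements). The minimal sketch below shows one way to do this under the usual BLAST definitions: pident = 100 × identical positions / alignment length, mismatches exclude gap columns, and each run of '-' counts as one gap opening. The function name std_alignment_columns is hypothetical, and this is not the code used inside blastxml_to_tabular.py.

```python
import re


def std_alignment_columns(qseq: str, sseq: str) -> dict:
    """Derive pident, length, mismatch and gapopen from one aligned HSP.

    qseq and sseq are the aligned query and subject strings (the text of the
    XML elements Hsp_qseq and Hsp_hseq), with '-' marking gaps.
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned sequences must have the same length")
    length = len(qseq)  # alignment length, gap columns included
    pairs = list(zip(qseq, sseq))
    # identical positions: same letter on both sides, and not a gap
    nident = sum(1 for q, s in pairs if q == s and q != "-")
    # mismatches: neither side is a gap, but the letters differ
    mismatch = sum(1 for q, s in pairs if q != s and q != "-" and s != "-")
    # gap openings: every run of consecutive '-' in either sequence counts once
    gapopen = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", sseq))
    pident = 100.0 * nident / length if length else 0.0
    return {"pident": round(pident, 2), "length": length,
            "mismatch": mismatch, "gapopen": gapopen}


print(std_alignment_columns("ACGT--ACGTA", "ACCTGGAC-TA"))
# prints: {'pident': 63.64, 'length': 11, 'mismatch': 1, 'gapopen': 2}
```

For this made-up alignment, 7 of the 11 columns are identical, one column is a true mismatch, and there are two gap openings (one in each sequence).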

The additional columns offered in the Galaxy BLAST+ wrappers are:

====== ============= ===========================================
Column NCBI name     Description
------ ------------- -------------------------------------------
    13 sallseqid     All subject Seq-id(s), separated by ';'
    14 score         Raw score
    15 nident        Number of identical matches
    16 positive      Number of positive-scoring matches
    17 gaps          Total number of gaps
    18 ppos          Percentage of positive-scoring matches
    19 qframe        Query frame
    20 sframe        Subject frame
    21 qseq          Aligned part of query sequence
    22 sseq          Aligned part of subject sequence
    23 qlen          Query sequence length
    24 slen          Subject sequence length
    25 salltitles    All subject titles, separated by '<>'
====== ============= ===========================================
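Many of the extra columns map directly onto elements of the XML. The sketch below uses Python's standard xml.etree.ElementTree on a hypothetical, heavily trimmed BLAST XML snippet to pull a few of them out per HSP: qlen from Iteration_query-len, slen from Hit_len, score from Hsp_score, nident from Hsp_identity, positive from Hsp_positive and gaps from Hsp_gaps. Taking qseqid as the first word of Iteration_query-def and sseqid as Hit_id is a simplification of my own; the real script may handle more cases. extended_rows is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed BLAST XML: a real file has many more elements,
# but these are the ones read below.
BLAST_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1 example protein</Iteration_query-def>
      <Iteration_query-len>120</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_id>subj1</Hit_id>
          <Hit_len>300</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_score>210</Hsp_score>
              <Hsp_identity>40</Hsp_identity>
              <Hsp_positive>45</Hsp_positive>
              <Hsp_gaps>2</Hsp_gaps>
              <Hsp_align-len>50</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def extended_rows(xml_text):
    """Yield one dict of selected extended columns per HSP."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        # simplification: qseqid = first word of the query definition line
        qseqid = iteration.findtext("Iteration_query-def").split()[0]
        qlen = int(iteration.findtext("Iteration_query-len"))
        for hit in iteration.iter("Hit"):
            sseqid = hit.findtext("Hit_id")
            slen = int(hit.findtext("Hit_len"))
            for hsp in hit.iter("Hsp"):
                length = int(hsp.findtext("Hsp_align-len"))
                positive = int(hsp.findtext("Hsp_positive"))
                yield {
                    "qseqid": qseqid, "sseqid": sseqid,
                    "score": int(hsp.findtext("Hsp_score")),
                    "nident": int(hsp.findtext("Hsp_identity")),
                    "positive": positive,
                    # treat a missing Hsp_gaps element as zero gaps
                    "gaps": int(hsp.findtext("Hsp_gaps", "0")),
                    "ppos": round(100.0 * positive / length, 2),
                    "qlen": qlen, "slen": slen,
                }


for row in extended_rows(BLAST_XML):
    print(row)
```

ppos is computed like pident, but from positive-scoring matches instead of identical ones: here 45 / 50 gives 90.0.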

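By default 12 columns are written; up to 25 columns are available, and you can also choose the columns yourself. The commands are: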

# 12 standard columns
python blastxml_to_tabular.py -o output.txt -c std input.xml
# all 25 columns
python blastxml_to_tabular.py -o output.txt -c ext input.xml
# custom columns (comma-separated names from the tables above)
python blastxml_to_tabular.py -o output.txt -c 'qseqid,sseqid,pident' input.xml
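If you call the script from Python (for example inside a larger pipeline), it can help to check a custom -c list before running anything, so that a mistyped column name is caught early. The helper below is a hypothetical sketch: build_command and KNOWN_COLUMNS are my own names, and the check only covers the 25 column names listed in the tables above.

```python
import shlex

# The 25 column names from the two tables above (std = first 12)
KNOWN_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen", "salltitles",
)


def build_command(xml_in, out_txt, columns="std"):
    """Return the argv list for blastxml_to_tabular.py (hypothetical helper).

    columns is "std", "ext", or a comma-separated list of names; a custom
    list is checked against KNOWN_COLUMNS before anything is run.
    """
    if columns not in ("std", "ext"):
        unknown = [c for c in columns.split(",") if c not in KNOWN_COLUMNS]
        if unknown:
            raise ValueError(f"unknown column name(s): {unknown}")
    return ["python", "blastxml_to_tabular.py", "-o", out_txt, "-c", columns, xml_in]


cmd = build_command("input.xml", "output.txt", "qseqid,sseqid,pident")
print(shlex.join(cmd))
# prints: python blastxml_to_tabular.py -o output.txt -c qseqid,sseqid,pident input.xml
```

The returned list can then be passed to subprocess.run(cmd, check=True).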

Note: the output file has no header row (i.e. none of the column names above); add one yourself if you need it.
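A minimal Python sketch for adding that header, assuming the default 12-column (std) output: add_header and STD_COLUMNS are hypothetical names, and for ext or custom output you would pass your own list of names in the same order as the -c option.

```python
# The 12 "std" column names, in the order listed in the first table
STD_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def add_header(lines, columns=STD_COLUMNS):
    """Yield one tab-separated header line, then the original lines unchanged."""
    yield "\t".join(columns) + "\n"
    yield from lines


# Usage with real files (file names are placeholders):
# with open("output.txt") as src, open("output_with_header.txt", "w") as dst:
#     dst.writelines(add_header(src))
```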
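If you simply give the output file a .csv extension, each row ends up squeezed into a single cell, because the fields are separated by tabs rather than commas; to get a real CSV file, convert it.
You can use the following script, txt_to_csv.py: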

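import csv

# Column names in the order written by "-c ext"; the first 12 are the "-c std" columns.
# For a custom -c list, replace this with your own column names in the same order.
col_names = ['qseqid','sseqid','pident','length','mismatch','gapopen','qstart','qend','sstart','send','evalue','bitscore','sallseqid','score','nident','positive','gaps','ppos','qframe','sframe','qseq','sseq','qlen','slen','salltitles']

input_filename = input("Enter input file name: ")
output_filename = input("Enter output file name: ")

# Read the tab-separated text file into a 2D list (one list of fields per line).
# Strip only the line ending so empty trailing fields are kept; skip blank lines.
with open(input_filename, 'r') as file:
    rows = [line.rstrip('\r\n').split('\t') for line in file if line.strip()]

# Prepend a header row with as many names as the data has columns (12 for std, 25 for ext)
n_cols = len(rows[0]) if rows else len(col_names)
data = [col_names[:n_cols]] + rows

# Write the header and data to a CSV file (csv.writer quotes fields that contain commas)
with open(output_filename, 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)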

Then simply run it:
python txt_to_csv.py
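For batch jobs, a variant without the interactive prompts may be handier. This is a sketch under my own assumptions (tabular_to_csv.py and tabular_to_csv are hypothetical names): it reads the tab-separated file with the csv module using quoting=csv.QUOTE_NONE, so that quote characters inside fields such as salltitles are kept as ordinary text, and it picks the header from the number of fields in the first row. That automatic choice only matches std (12) or ext (25) output; pass columns= for a custom -c list.

```python
import csv
import sys

# Column names in "-c ext" order; the first 12 are the "-c std" columns
EXT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen", "salltitles",
]


def tabular_to_csv(in_path, out_path, columns=None):
    """Convert headerless, tab-separated BLAST tabular output to CSV with a header row.

    If columns is None, the header is the first N names of EXT_COLUMNS, where N is
    the number of fields in the first data row (12 for std, 25 for ext output).
    """
    with open(in_path, newline="") as src:
        # QUOTE_NONE: treat '"' inside fields (e.g. in salltitles) as ordinary text
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]  # skip blank lines
    if columns is None:
        columns = EXT_COLUMNS[:len(rows[0])] if rows else EXT_COLUMNS
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)  # quotes fields containing commas or quotes
        writer.writerow(columns)
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python tabular_to_csv.py output.txt output.csv
    tabular_to_csv(sys.argv[1], sys.argv[2])
```

Run it as python tabular_to_csv.py output.txt output.csv, or import tabular_to_csv from another script.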
Finally: this toolkit is also available online (https://usegalaxy.org/); type "xml" into the Tools search box on the left to find it.

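© Copyright belongs to the author. For reposting or content collaboration, please contact the author.
Platform statement: the content of this article (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and provides only information storage services.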
