Batch Reverse-Complementing DNA Sequences with Python

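Input file: .xlsx
Output file: .xlsx

Put the sequences to be reverse-complemented in the first column of an Excel file, one sequence per row. The script writes the reverse complement of each sequence into the second column of the same row and saves the result to a new file.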

The result is shown in the figure below:

[image.png]

The Python script, rev_comple_multi.py, is as follows:

import os

from openpyxl import load_workbook

# Work from the script's own directory so the relative file names below resolve
os.chdir(os.path.abspath(os.path.dirname(__file__)))

def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    # Characters other than A/T/C/G (e.g. N) are kept unchanged
    return ''.join(complement.get(base, base) for base in sequence[::-1])

def process_excel(input_file, output_file):
    """Reverse-complement the DNA sequences in an Excel file."""
    wb = load_workbook(input_file)
    ws = wb.active

    # Iterate over every row of the first column
    for row in range(1, ws.max_row + 1):
        # Read the sequence from the first column
        original_seq = ws.cell(row=row, column=1).value
        if original_seq:
            # Build the reverse complement and write it to the second column
            rev_comp_seq = reverse_complement(str(original_seq).upper())
            ws.cell(row=row, column=2).value = rev_comp_seq

    wb.save(output_file)
    print(f"Done. Results saved to {output_file}")

# Usage example
if __name__ == "__main__":
    # input_excel = "HA pool -R.xlsx"  # input file name
    # output_excel = "HApool-R-revcom.xlsx"  # output file name
    input_excel = "MERS pool -R.xlsx"  # input file name
    output_excel = "MERSpool-R-revcom.xlsx"  # output file name
    process_excel(input_excel, output_excel)
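
One thing to watch for: the dictionary in reverse_complement only covers A, T, C and G, so any other character is reversed but left uncomplemented. If the sequences contain degenerate bases (for example R or Y in primer pools), a translation table may be a better fit. The following is only a sketch, assuming the standard IUPAC nucleotide codes; the name reverse_complement_iupac is hypothetical and not part of the original script.

```python
# Sketch only: assumes standard IUPAC nucleotide codes.
# reverse_complement_iupac is a hypothetical helper, not from the original script.
_BASES = "ACGTRYSWKMBDHVN"
_COMPLEMENTS = "TGCAYRSWMKVHDBN"

# Cover both upper and lower case so the input's case is preserved
_TABLE = str.maketrans(_BASES + _BASES.lower(),
                       _COMPLEMENTS + _COMPLEMENTS.lower())

def reverse_complement_iupac(sequence):
    """Complement every base via the translation table, then reverse."""
    return sequence.translate(_TABLE)[::-1]

if __name__ == "__main__":
    print(reverse_complement_iupac("ATGC"))
    print(reverse_complement_iupac("AARTG"))
```

To try it, this function could be called in process_excel in place of reverse_complement; characters outside the table (gaps, spaces) would still pass through unchanged.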
