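The current version supports only the common MIPS instructions listed in the variables `r_dict`, `i_dict` and `j_dict`. Immediates must be written in decimal, and registers must be written by number: `$1`, `$2` and so on work, but names such as `$v0` or `$t1` do not (anyone interested is welcome to complete the register naming table in the `reg_dict` variable). Note that the generated machine code uses the mif file format supported by Quartus; the program writes only the `address : data;` content lines, not the header (`WIDTH`, `DEPTH` and so on).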
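Both missing features are small additions. The sketch below assumes the standard MIPS register naming convention ($zero, $at, $v0 to $v1, $a0 to $a3, $t0 to $t9, $s0 to $s7, $k0 to $k1, $gp, $sp, $fp, $ra) and Python's `int(text, 0)` prefix rules for immediates. `build_reg_dict`, `parse_imm` and `NAMED` are hypothetical names, not part of mif.py; the resulting table could replace `reg_dict`, and `parse_imm` could replace the `int(...)` calls that read immediates.

```python
# Hypothetical helpers for illustration; they are not part of mif.py.

# Conventional MIPS register names, in register-number order ($0 to $31).
NAMED = (['$zero', '$at', '$v0', '$v1']
         + ['$a%d' % i for i in range(4)]     # $4 to $7
         + ['$t%d' % i for i in range(8)]     # $8 to $15
         + ['$s%d' % i for i in range(8)]     # $16 to $23
         + ['$t8', '$t9', '$k0', '$k1', '$gp', '$sp', '$fp', '$ra'])  # $24 to $31


def build_reg_dict():
    """Map both '$n' and the conventional names to register numbers 0..31."""
    table = {}
    for num in range(32):
        table['$%d' % num] = num
    for num, name in enumerate(NAMED):
        table[name] = num
    table['$s8'] = 30  # $fp is also known as $s8
    return table


def parse_imm(text):
    """Parse an immediate written in decimal, hex (0x), binary (0b) or octal (0o)."""
    # base 0 lets int() pick the base from the prefix; a leading sign is allowed
    return int(text.strip(), 0)


reg_dict = build_reg_dict()
```

With this table `reg_dict['$t1']` is 9 and `reg_dict['$ra']` is 31, while `parse_imm('0x10')` returns 16 and `parse_imm('-4')` returns -4. One caveat: a decimal with a leading zero such as `010` is rejected by Python 3 and read as octal by Python 2.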
Development environment: Python 2.7
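The script uses Python 2 syntax and therefore cannot run under Python 3. Commenting out the `print` statements is not enough on its own: `except Exception, e:` is also a syntax error in Python 3 and `xrange` no longer exists, so those have to become `except Exception as e:` and `range`. Even so, the port is only a small amount of work.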
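For reference, here are the Python 3 spellings of the three Python 2-only constructs, shown on two small functions modeled on mif.py. The names `clear_comments` and `reg2num_or_none` are illustrative, not from the program. These forms also run under Python 2.6 and later, so the port does not have to break 2.7.

```python
# Python 3 spellings of the Python 2-only constructs used in mif.py.
# The function names are illustrative; they are not taken from mif.py.

def clear_comments(mips):
    # Python 2: for line in xrange(len(mips)):
    for line in range(len(mips)):
        mips[line] = mips[line].strip().split('#')[0].strip()
    return mips


def reg2num_or_none(reg_dict, reg):
    try:
        return reg_dict[reg.strip()]
    # Python 2: except Exception, e:
    except Exception as e:
        # Python 2: print "Exception encountered in r-type conversion |", e
        print("Exception encountered in r-type conversion | " + repr(e))
        return None
```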
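Known bug to be fixed:
- A blank line causes an error, and so does a line that contains only a `#` comment or only a label (avoid this by keeping every line of the MIPS source file an instruction)
If you run into other bugs, please point them out in the comments.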
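One possible workaround, sketched under the assumption that blank or comment-only lines should simply not occupy an address: filter them out right after the file is read, before `mips` is copied from `mips_`, so the label line numbers, the addresses and the source text written into the mif comments all stay consistent. `drop_blank_lines` is a hypothetical helper; a line holding only a label would still fail.

```python
def drop_blank_lines(lines):
    """Keep only lines that still contain code once '#' comments are removed.

    Hypothetical helper, not part of mif.py.
    """
    return [line for line in lines if line.split('#')[0].strip()]
```

In `main()` this would amount to `mips_ = drop_blank_lines(in_file.readlines())`.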
To make the code easier to follow, here is a short explanation of its approach:
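I wrote this program in my second year of university, and the idea behind it is simple. I started with the most basic case: a sequence of MIPS instructions executed strictly in order, so for the moment there is no need to handle the labels (called tags in the code) used by branch instructions. Under this assumption one simple rule is enough: translate the MIPS instructions one at a time.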
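The problem now reduces to translating a single MIPS instruction: by the assumption above, once one instruction can be translated, the whole sequence can be translated one instruction at a time. From the textbook we know that instructions come in three types, R, I and J, and each type has its own fixed binary format. For example, an R-type instruction always translates to `000000 $rs $rt $rd $shamt $funct` (6 + 5 + 5 + 5 + 5 + 6 = 32 bits). With this in mind, a single MIPS instruction can be translated in the following steps: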
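- [lines 144-155, the start of the `# conversion` loop] Split the instruction into strings: the mnemonic is everything before the first space and the operands are split on commas, so `add $3, $1, $2` becomes `'add'` and `['$3', '$1', '$2']` (the surrounding spaces are stripped later)
- [lines 158, 196, 233, the `if op in r_dict` / `elif op in i_dict` / `elif op in j_dict` tests] Determine the instruction type; for example, the mnemonic `'add'` tells us the instruction is R-type
- [lines 157-243, the bodies of those three branches] According to the type, assemble the strings from the first step into a complete machine code in that type's format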
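The three steps can be sketched for the ordinary R-type instructions alone. This is a minimal sketch, not mif.py itself: `split_instruction`, `encode_r` and `FUNCT` are hypothetical names, operands are assumed to be numbered registers separated by commas, and the fields are packed with integer shifts instead of the string concatenation used in the program (both give the same 32 bits).

```python
# Sketch of the three steps for ordinary R-type instructions only.
# FUNCT mirrors the funct codes in mif.py; registers must be numbered ($0..$31).
FUNCT = {'add': 0b100000, 'sub': 0b100010, 'and': 0b100100,
         'or': 0b100101, 'xor': 0b100110}


def split_instruction(code):
    """Step 1: mnemonic up to the first space, operands split on commas."""
    op, _, rest = code.strip().partition(' ')
    regs = [r.strip() for r in rest.split(',')]
    return op, regs


def encode_r(code):
    op, regs = split_instruction(code)
    # Step 2: classify (only ordinary R-type instructions are handled here)
    if op not in FUNCT:
        raise ValueError('not an ordinary R-type instruction: ' + op)
    # Step 3: fill 000000 $rs $rt $rd $shamt $funct; the operand order is rd, rs, rt
    rd, rs, rt = [int(r.lstrip('$')) for r in regs]
    word = (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (0 << 6) | FUNCT[op]
    return '{0:08x}'.format(word)


print(encode_r('add $7, $1, $2'))  # 00223820
```

The result matches line `b` of the mif file in the appendix.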
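That gives a simple translator. What remains is the case with labels. For example, one line might be `beq $1, $2, LOOP`, and some line further down is the one labeled "LOOP", such as `LOOP: add $1, $2, $3`. Because a label can appear after the instruction that uses it, the translator has to know every label before it starts translating; that is what lines 132-144 of the program do (the `# save tag` loop in `main()`, which records each label's instruction index in `tag_dict`). With this step, the program's functionality is complete.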
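Below is a two-pass sketch of the label handling together with the arithmetic the program uses. A branch stores the distance in words from the instruction after the branch (`tag_dict[label] - line - 1`). A jump stores the label's instruction index, which equals the word address only if the program is loaded at address 0 (an assumption here). All function names are hypothetical, and labels are assumed to share a line with an instruction, as in the appendix.

```python
# Two-pass label handling; assumes one instruction per line, code loaded at address 0.
OPCODES = {'beq': 0b000100, 'bne': 0b000101, 'j': 0b000010, 'jal': 0b000011}


def collect_tags(lines):
    """Pass 1: record the index of every 'label:' line and strip the label."""
    tags, body = {}, []
    for index, line in enumerate(lines):
        code = line.split('#')[0].strip()
        if ':' in code:
            label, code = code.split(':', 1)
            tags[label.strip()] = index
        body.append(code.strip())
    return tags, body


def branch_offset(tags, label, line):
    """beq/bne: offset in words, counted from the instruction after the branch."""
    return tags[label] - line - 1


def jump_target(tags, label):
    """j/jal: the label's word address (its index when the code starts at 0)."""
    return tags[label]


def encode_branch(op, rs, rt, offset):
    # opcode | rs | rt | 16-bit two's-complement offset
    word = (OPCODES[op] << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)
    return '{0:08x}'.format(word)


def encode_jump(op, target):
    # opcode | 26-bit word address
    return '{0:08x}'.format((OPCODES[op] << 26) | (target & 0x3FFFFFF))


program = [
    'main: addi $3, $0, 3',   # 0
    'loop: addi $3, $3, -1',  # 1
    'bne $3, $0, loop',       # 2: backward branch
    'j done',                 # 3
    'done: sw $3, 8($0)',     # 4
]

tags, body = collect_tags(program)
offset = branch_offset(tags, 'loop', 2)             # -2
print(encode_branch('bne', 3, 0, offset))           # 1460fffe
print(encode_jump('j', jump_target(tags, 'done')))  # 08000004
```

The backward branch yields a negative offset (-2, stored as `fffe`), so the 16-bit field must hold a two's-complement value. The same arithmetic reproduces `beq $3, $0, add` in the appendix: `add` is at index 11 and the branch at index 6, so the offset is 11 - 6 - 1 = 4, giving `10600004`.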
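Next come the two helper functions `reg2num` and `complete`:
- `reg2num`: its docstring reads "convert register name to register number"; it turns a register's string name into the number that goes into the machine code.
- `complete`: its docstring reads "fill a string of length var<width> with binary representation of var<num>"; it expands a number into a binary string of the given length.
Both are very simple, so I won't go into them further. (There is also `bin2hex`, which converts the 32-bit binary string into 8 hexadecimal digits.)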
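A standalone sketch of the two helpers, assuming numbered registers only. Negative values are masked to `width` bits before formatting, which yields the two's complement that backward branches and negative `addi` immediates need; calling `bin()` instead would add a `0b` prefix and no padding. The range check is an addition that mif.py does not have.

```python
# Standalone sketch of the two helpers (numbered registers only).
REG_DICT = dict(('$%d' % n, n) for n in range(32))


def reg2num(reg):
    """Register name -> register number; surrounding spaces are ignored."""
    return REG_DICT[reg.strip()]


def complete(num, width):
    """Return num as a binary string of exactly `width` bits.

    Negative numbers are written in two's complement.
    """
    if not -(1 << (width - 1)) <= num < (1 << width):
        raise ValueError('%d does not fit in %d bits' % (num, width))
    # masking keeps the low `width` bits; '0{1}b' pads with zeros to `width`
    return '{0:0{1}b}'.format(num & ((1 << width) - 1), width)
```

For example, `complete(5, 5)` gives `'00101'`, `complete(-1, 5)` gives `'11111'`, and `complete(-2, 16)` gives `'1111111111111110'` (hex `fffe`).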
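The code is below (change the `INFILE` variable to read your own MIPS instruction file, and change the `out_file.write` statement to produce the output format you want):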
```python
# encoding: utf-8
# file: mif.py
# 2018/06/01 by shawn233
import copy

INFILE = "mips.txt"
OUTFILE = INFILE.split('.')[0] + ".mif"

tag_dict = {}  # label -> index of the instruction it marks

# supported instructions by type
r_dict = {'add', 'sub', 'and', 'or', 'xor', 'sll', 'srl', 'sra', 'jr'}
i_dict = {'lw', 'sw', 'beq', 'bne' , 'addi', 'andi', 'ori', 'xori', 'lui'}
j_dict = {'j', 'jal'}
# sub-classes whose operands are laid out differently (see the Q&A)
special_r_dict = {'sll', 'srl', 'sra', 'jr'}
shift_r_dict = {'beq', 'bne', 'sll', 'srl', 'sra'}
special_i_dict = {'beq', 'bne', 'lw', 'sw', 'lui'}
branch_i_dict = {'beq', 'bne'}

op_code = { 'addi':'001000',
            'andi':'001100',
            'ori' :'001101',
            'xori':'001110',
            'lw' :'100011',
            'sw' :'101011',
            'beq' :'000100',
            'bne' :'000101',
            'lui' :'001111',
            'j' :'000010',
            'jal' :'000011' }

funct = {'add':'100000',
         'sub':'100010',
         'and':'100100',
         'or' :'100101',
         'xor':'100110',
         'sll':'000000',
         'srl':'000010',
         'sra':'000011',
         'jr' :'001000'}

reg_dict = {
    '$0':0,
    '$1':1,
    '$2':2,
    '$3':3,
    '$4':4,
    '$5':5,
    '$6':6,
    '$7':7,
    '$8':8,
    '$9':9,
    '$10':10,
    '$11':11,
    '$12':12,
    '$13':13,
    '$14':14,
    '$15':15,
    '$16':16,
    '$17':17,
    '$18':18,
    '$19':19,
    '$20':20,
    '$21':21,
    '$22':22,
    '$23':23,
    '$24':24,
    '$25':25,
    '$26':26,
    '$27':27,
    '$28':28,
    '$29':29,
    '$30':30,
    '$31':31,
    '$ra':31  # $ra (return address) is register $31
}

def complete (num, width):
    '''
    fill a string of length var<width> with binary representation of var<num>
    '''
    format_ = '{0:0>'+str(width)+'b}'
    if (num >= 0):
        return format_.format(num)
    else:
        # two's complement: keep the low <width> bits, padded like the positive case
        return format_.format(num & ((1 << width) - 1))

def bin2hex (bin_form):
    '''
    convert binary string to corresponding hexadecimal string
    '''
    format_ = '{0:1x}'
    hex_form = ''
    for i in xrange(0, 32, 4):
        hex_form += format_.format(int(bin_form[i: i+4], base=2))
    return hex_form

def reg2num(reg):
    '''
    convert register name to register number
    '''
    return reg_dict[reg.strip()]

def main():
    print OUTFILE
    in_file = open(INFILE, 'r')
    mips_ = in_file.readlines()
    in_file.close()
    out_file = open(OUTFILE, 'w')
    code = None
    machine_code = None
    op = None
    regs = None
    rs = None
    rt = None
    rd = None
    sa = None
    imm = None
    line_num_format = '{0:x}'
    mips = copy.deepcopy(mips_)

    # clear comments
    for line in xrange(len(mips)):
        code = mips[line].strip().split('#')
        mips[line] = code[0].strip()

    # save tag: first pass, record the instruction index of every label
    for line in xrange(len(mips)):
        code = mips[line].strip().split(":")
        if (len(code) == 2):
            tag_dict[code[0].strip()] = line
            mips[line] = code[1].strip()
        elif (len(code) > 2):
            print "non-standard mips instruction encountered in line", line, "code:", mips[line]
            exit()

    # conversion: second pass, one instruction per line
    for line in xrange(len(mips)):
        code = mips[line].strip()
        whitespace_ind = code.find(' ')
        op = code[:whitespace_ind].strip()
        regs = code[whitespace_ind:].strip().split(',')
        #raw_input()
        print
        print "code", code
        print "whitespace_ind", whitespace_ind
        print "op", op
        print "regs", regs
        #raw_input()
        machine_code = ''
        if op in r_dict:
            # 000000 $rs $rt $rd $shamt $funct
            machine_code += ('0' * 6)
            if op in special_r_dict:
                if op in shift_r_dict:
                    # sll/srl/sra $rd, $rt, shamt
                    try:
                        rd = reg2num(regs[0])
                        rs = 0
                        rt = reg2num(regs[1])
                        sa = int (regs[2].strip())
                    except Exception, e:
                        print "Exception encountered in r-type conversion |", e
                        #exit()
                else: # jr $rs
                    try:
                        rd = 0
                        rs = reg2num(regs[0])
                        rt = 0
                        sa = 0
                    except Exception, e:
                        print "Exception encountered in r-type conversion |", e
                        #exit()
            else:
                # op $rd, $rs, $rt
                try:
                    rd = reg2num(regs[0])
                    rs = reg2num(regs[1])
                    rt = reg2num(regs[2])
                    sa = 0
                except Exception, e:
                    print "Exception encountered in r-type conversion |", e
                    #exit()
            print "rs:", rs, "type:", type(rs)
            print "rt:", rt, "type:", type(rt)
            print "rd:", rd, "type:", type(rd)
            print "sa:", sa, "type:", type(sa)
            machine_code += (complete(rs, 5)+ complete(rt, 5)+ complete(rd, 5) + complete(sa, 5))
            machine_code += funct[op]
        elif op in i_dict:
            # opcode $rs $rt imm(16 bits)
            machine_code += op_code[op]
            if op in special_i_dict:
                if op in branch_i_dict:
                    # beq/bne $rs, $rt, label: offset in words from the next instruction
                    try:
                        rs = reg2num(regs[0])
                        rt = reg2num(regs[1])
                        imm = tag_dict[regs[2].strip()] - line - 1
                    except Exception, e:
                        print "Exception encountered in i-type conversion |", e
                elif op == 'lui':
                    # lui $rt, imm
                    try:
                        rs = 0
                        rt = reg2num(regs[0])
                        imm = int(regs[1].strip())
                    except Exception, e:
                        print "Exception encountered in i-type conversion |", e
                else: # sw lw: op $rt, offset($rs)
                    try:
                        rt = reg2num(regs[0])
                        regs[1] = regs[1].strip()
                        pos1 = regs[1].find('(')
                        pos2 = regs[1].find(')')
                        imm = int(regs[1][:pos1].strip())
                        rs = reg2num(regs[1][pos1+1:pos2])
                    except Exception, e:
                        print "Exception encountered in i-type conversion |", e
            else:
                # op $rt, $rs, imm
                try:
                    rt = reg2num(regs[0])
                    rs = reg2num(regs[1])
                    imm = int(regs[2].strip())
                except Exception, e:
                    print "Exception encountered in i-type conversion |", e
            machine_code += (complete(rs, 5)+complete(rt, 5)+complete(imm, 16))
        elif op in j_dict:
            # opcode target(26 bits): the label's instruction index
            machine_code += op_code[op]
            try:
                imm = int(tag_dict[regs[0].strip()])
            except Exception, e:
                print "Exception encountered in j-type conversion |", e
            machine_code += complete(imm, 26)
        else:
            print "unknown instruction encountered in line", line, "code:", mips[line]
            exit()
        # address : hex word; % original source | binary word %
        out_file.write(line_num_format.format(line) + ' : ' + bin2hex(machine_code) + '; % ' + mips_[line].strip('\n') + ' | ' + machine_code + ' %' + '\n')
    out_file.close()

if __name__ == "__main__":
    main()
    #print complete(-1, 5)
```
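The script writes only the `address : word;` lines. If Quartus needs a complete mif file, a header along the lines of the sketch below should work. Treat `mif_text`, the default `DEPTH` (the number of instructions) and the zero fill of unused addresses as assumptions to check against your Quartus version and ROM size.

```python
def mif_text(words, depth=None, width=32):
    """Build the text of a Quartus-style .mif file (hypothetical helper).

    words: list of (hex_word, comment) pairs, one per address starting at 0.
    depth: number of words in the target memory; defaults to len(words).
    """
    depth = len(words) if depth is None else depth
    out = ['WIDTH=%d;' % width,
           'DEPTH=%d;' % depth,
           'ADDRESS_RADIX=HEX;',
           'DATA_RADIX=HEX;',
           'CONTENT BEGIN']
    for addr, (word, comment) in enumerate(words):
        # same "% ... %" comment style as mif.py
        out.append('    %x : %s; %% %s %%' % (addr, word, comment))
    if depth > len(words):
        # fill the unused addresses with zero
        out.append('    [%x..%x] : 0;' % (len(words), depth - 1))
    out.append('END;')
    return '\n'.join(out) + '\n'


text = mif_text([('20040001', 'main: addi $4, $0, 1'),
                 ('20050002', 'addi $5, $0, 2')], depth=8)
```

To use it, `main()` would collect `(bin2hex(machine_code), mips_[line].strip())` pairs in a list and write `mif_text(...)` once at the end, instead of calling `out_file.write` per line.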
Appendix
Below are my MIPS source file and the mif file it was translated into.
mips:
```
main: addi $4, $0, 1
addi $5, $0, 2
addi $6, $0, 3
loop: lw $1, 0($0)
lw $2, 4($0)
lw $3, 8($0)
beq $3, $0, add
beq $3, $4, sub
beq $3, $5, and
beq $3, $6, or
j loop
add: add $7, $1, $2
j show
sub: sub $7, $2, $1
j show
and: and $7, $1, $2
j show
or: or $7, $1, $2
show: sw $1, 0($0)
sw $2, 4($0)
sw $3, 8($0)
j loop
```
mif:
```
0 : 20040001; % main: addi $4, $0, 1 | 00100000000001000000000000000001 %
1 : 20050002; % addi $5, $0, 2 | 00100000000001010000000000000010 %
2 : 20060003; % addi $6, $0, 3 | 00100000000001100000000000000011 %
3 : 8c010000; % loop: lw $1, 0($0) | 10001100000000010000000000000000 %
4 : 8c020004; % lw $2, 4($0) | 10001100000000100000000000000100 %
5 : 8c030008; % lw $3, 8($0) | 10001100000000110000000000001000 %
6 : 10600004; % beq $3, $0, add | 00010000011000000000000000000100 %
7 : 10640005; % beq $3, $4, sub | 00010000011001000000000000000101 %
8 : 10650006; % beq $3, $5, and | 00010000011001010000000000000110 %
9 : 10660007; % beq $3, $6, or | 00010000011001100000000000000111 %
a : 08000003; % j loop | 00001000000000000000000000000011 %
b : 00223820; % add: add $7, $1, $2 | 00000000001000100011100000100000 %
c : 08000012; % j show | 00001000000000000000000000010010 %
d : 00413822; % sub: sub $7, $2, $1 | 00000000010000010011100000100010 %
e : 08000012; % j show | 00001000000000000000000000010010 %
f : 00223824; % and: and $7, $1, $2 | 00000000001000100011100000100100 %
10 : 08000012; % j show | 00001000000000000000000000010010 %
11 : 00223825; % or: or $7, $1, $2 | 00000000001000100011100000100101 %
12 : ac010000; % show: sw $1, 0($0) | 10101100000000010000000000000000 %
13 : ac020004; % sw $2, 4($0) | 10101100000000100000000000000100 %
14 : ac030008; % sw $3, 8($0) | 10101100000000110000000000001000 %
15 : 08000003; % j loop | 00001000000000000000000000000011 %
```
Q&A
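Some readers asked what the four categories `special_r_dict`/`shift_r_dict` and `special_i_dict`/`branch_i_dict` are based on. The question is slightly off, because `special_r_dict` and `shift_r_dict` are not parallel categories; they sit at different levels of one classification. The same goes for `special_i_dict` and `branch_i_dict`.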
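`special_r_dict` separates its instructions from those in `r_dict - special_r_dict` (the minus sign means set difference). An ordinary R-type instruction looks like `add $1, $2, $3`; a special R-type instruction looks like `sll $1, $2, 3`. Comparing them against the R-type machine code format `000000 $rs $rt $rd $shamt $funct` shows that ordinary R-type instructions fill the `$rs`, `$rt` and `$rd` fields, while the special ones fill a different set: `sll`, `srl` and `sra` fill `$rt`, `$rd` and `$shamt`, and `jr` fills only `$rs`. That is why the two kinds have to be handled separately in the code.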
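`i_dict` and `special_i_dict` work the same way, separating the ordinary I-type instructions (`addi`, `andi`, `ori`, `xori`, all written `op $rt, $rs, imm`) from the special ones.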
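Likewise, within `special_r_dict`, `shift_r_dict` separates the instructions that involve a shift from the one that does not (as the code shows, that is `jr`). `special_i_dict` and `branch_i_dict` work the same way: within `special_i_dict`, `branch_i_dict` picks out the branches `beq` and `bne`, whose immediate is computed from a label, while the remaining `lw`, `sw` and `lui` are handled individually.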
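The nesting can be summarized as a small classifier. This is an illustrative sketch that copies the sets from mif.py and mirrors the order of the tests in `main()`; `classify` and its return labels are hypothetical names.

```python
# Sets copied from mif.py; classify() follows the same nested tests as main().
r_dict = {'add', 'sub', 'and', 'or', 'xor', 'sll', 'srl', 'sra', 'jr'}
i_dict = {'lw', 'sw', 'beq', 'bne', 'addi', 'andi', 'ori', 'xori', 'lui'}
j_dict = {'j', 'jal'}
special_r_dict = {'sll', 'srl', 'sra', 'jr'}
shift_r_dict = {'beq', 'bne', 'sll', 'srl', 'sra'}
special_i_dict = {'beq', 'bne', 'lw', 'sw', 'lui'}
branch_i_dict = {'beq', 'bne'}


def classify(op):
    """Return (type, subclass) for a mnemonic."""
    if op in r_dict:
        if op not in special_r_dict:       # r_dict - special_r_dict
            return ('r', 'rd rs rt')
        return ('r', 'shift') if op in shift_r_dict else ('r', 'jr')
    if op in i_dict:
        if op not in special_i_dict:       # i_dict - special_i_dict
            return ('i', 'rt rs imm')
        if op in branch_i_dict:
            return ('i', 'branch')
        return ('i', 'lui') if op == 'lui' else ('i', 'lw/sw')
    if op in j_dict:
        return ('j', 'target')
    raise ValueError('unknown instruction: ' + op)


print(sorted(r_dict - special_r_dict))  # ['add', 'and', 'or', 'sub', 'xor']
```

Because `shift_r_dict` is only consulted inside the `special_r_dict` branch, its extra `beq` and `bne` entries are never reached.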