Huffman Coding: A Python Implementation

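First, the definition

Huffman coding is a variable-length coding (VLC) scheme. Proposed by Huffman in 1952, it builds prefix-free codewords purely from the probability of each character, so that the average codeword length is as short as possible. For this reason it is sometimes called optimal coding (optimal among codes that assign one prefix-free codeword to each symbol).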
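
To make "prefix-free" and "average length" concrete, here is a minimal sketch on a toy alphabet. The counts, the code table, and the helper names `is_prefix_free` and `average_length` are made up for illustration and are not part of the implementation that follows.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # In sorted order, a word that prefixes any other word also
    # prefixes its immediate successor, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def average_length(codes, counts):
    """Average number of bits per symbol, weighted by symbol counts."""
    total = sum(counts.values())
    return sum(counts[s] * len(codes[s]) for s in counts) / total


# Toy data (assumed): frequent symbols get short codewords.
counts = {'a': 5, 'b': 2, 'c': 1, 'd': 1}
codes = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

print(is_prefix_free(codes))          # no codeword starts another one
print(average_length(codes, counts))  # 15/9 bits, versus 2 bits fixed-length
```

With four symbols a fixed-length code needs 2 bits per symbol; the variable-length table above averages 15/9, roughly 1.67 bits.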

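Now let's go through the steps

1. Build the character frequency table

The input is a txt file containing the text.

The output is a Python dictionary giving the "probability" of each character, which here is simply its number of occurrences. Only letters are counted, and they are converted to lowercase first:

The code is as follows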

def findTheFrequency(text):
    result=dict()
    with open(text,'r') as f:
        for line in f.readlines():
            line = line.lower()
            for i in line:
                if i.isalpha():
                    if i in result:
                        result[i]+=1
                    else:
                        result.update({i:1})
    return result
text="GreA3_Huffman_origin.txt"
result=findTheFrequency(text)
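
As an alternative sketch, the same counting can be done in memory with `collections.Counter` from the standard library, and the counts can be normalized into actual probabilities if needed. The names `count_letters` and `to_probabilities` are hypothetical helpers, and the input is a string rather than a file.

```python
from collections import Counter


def count_letters(s):
    """Count letters in a string, lowercased, ignoring everything else."""
    return Counter(ch for ch in s.lower() if ch.isalpha())


def to_probabilities(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


counts = count_letters("Hello, World!")
probs = to_probabilities(counts)
print(counts['l'], probs['l'])
```

Huffman's algorithm only compares and adds weights, so raw counts and normalized probabilities lead to the same tree shape.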

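2. Build the Huffman tree

First define a node class holding a name, a frequency, a left child and a right child.
The input is the frequency table produced in the previous step.
The output is the root node of the Huffman tree, since every other node can be reached from the root.
The code is as follows: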

class Node:
    def __init__(self):
        self.frequency=0
        self.name=None
        self.lchild=None
        self.rchild=None
        self.code=None
    def __lt__(self,other):
        return self.frequency<other.frequency

# establish the Huffman Tree
def estblishHuffmanTree(info_dict):
    #output: the base node
    node_list=[]
    for i in info_dict:
        a = Node()
        a.frequency=info_dict[i]
        a.name=i
        node_list.append(a)
    while len(node_list)>1:
        node_list.sort(reverse=True)
        node_1 = node_list.pop()
        node_2 = node_list.pop()
        new_node = Node()
        new_node.frequency=node_1.frequency+node_2.frequency
        new_node.lchild=node_1
        new_node.rchild=node_2
        node_list.append(new_node)
    return node_list[0]  # the single remaining node is the root
base_node = estblishHuffmanTree(result)
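
Re-sorting the whole list on every iteration works, but a priority queue avoids the repeated sorts. Below is a minimal sketch using the standard `heapq` module. It represents the tree as nested tuples instead of the `Node` class and assumes every symbol is a string; `build_tree` and `code_lengths` are illustrative names, not part of the code above.

```python
import heapq
from itertools import count


def build_tree(freqs):
    """Return a Huffman tree as nested tuples.

    A leaf is a symbol (str); an internal node is a (left, right) tuple.
    """
    if not freqs:
        raise ValueError("need at least one symbol")
    tie = count()  # tie-breaker so the heap never has to compare subtrees
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees into one new subtree.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    return heap[0][2]


def code_lengths(tree, depth=0):
    """Map each symbol to its depth in the tree (its codeword length)."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths


tree = build_tree({'a': 5, 'b': 2, 'c': 1, 'd': 1})
lengths = code_lengths(tree)
print(lengths)
```

The most frequent symbol ends up closest to the root, which is what keeps the average codeword length short.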

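3. Encode using the Huffman tree

The input is the root node from the previous step and the original document.
The output is the code dictionary and the encoded document.

Note that the encoding routine uses a backtracking approach: it appends a bit before descending into a child and removes it again before trying the other child.
The code is as follows: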

def encode(node,rst_dict,code):
    if node.name:
        rst_dict.update({node.name:code})
        return
    code+='0'
    encode(node.lchild,rst_dict,code)
    code = code[:-1]
    code+='1'
    encode(node.rchild,rst_dict,code)
    return rst_dict

code_dict=encode(base_node,{},'')
code_text="GreA3_Huffman_code.txt"

def encode_text(code_dict,text,code_text):
    string=''
    with open(text,'r') as f:
        for line in f.readlines():
            line=line.lower()
            for i in line:
                if i.isalpha():
                    string+=code_dict[i]
                else:
                    string+='\n'
    with open(code_text,'w') as f:
        f.write(string)
            
encode_text(code_dict,text,code_text)
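
Because Python strings are immutable, the explicit undo step (`code = code[:-1]`) can also be avoided by passing a new prefix into each recursive call. The following is a minimal sketch on a hand-built tuple tree (a leaf is a string, an internal node is a `(left, right)` pair); the tree and the names `assign_codes` and `encode_string` are assumptions for illustration only.

```python
def assign_codes(tree, prefix=""):
    """Map each leaf symbol to its bit string: 0 = left, 1 = right."""
    if isinstance(tree, str):
        # A tree consisting of one lone symbol still gets a one-bit code.
        return {tree: prefix or "0"}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


def encode_string(s, codes):
    """Concatenate the codeword of every character in s."""
    return "".join(codes[ch] for ch in s)


# Hand-built toy tree (assumed): 'a' is the most frequent symbol.
tree = ("a", ("b", ("c", "d")))
codes = assign_codes(tree)
bits = encode_string("abacad", codes)
print(codes)
print(bits)
```

Six characters are encoded in 11 bits here, compared with 12 bits for a fixed 2-bit code.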

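4. Decode

Decoding recovers the text from the encoded bits by walking the Huffman tree: 0 goes to the left child, 1 goes to the right child, and reaching a leaf emits its character and restarts from the root.
The code is as follows: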

def decode(text_address,result_address,base_node):
    text_string=''
    a=base_node
    with open(text_address,'r') as f:
        for line in f.readlines():
            for i in line:
                if i=='0':
                    b=a.lchild
                    if b.name:
                        text_string+=b.name
                        a=base_node
                    else:
                        a = b
                elif i=='1':
                    b=a.rchild
                    if b.name:
                        text_string+=b.name
                        a=base_node
                    else:
                        a = b
                else:
                    text_string+='\n'
    with open(result_address,'w') as f:
        f.write(text_string)
result_address="GreA3_Huffman_result.txt"
decode(code_text,result_address,base_node)
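
Decoding can also be done from the code dictionary alone, without the tree: since no codeword is a prefix of another, the first codeword that matches the accumulated bits is the right one. A minimal in-memory sketch follows; `decode_bits` and the toy code table are assumed names and data, not part of the code above.

```python
def decode_bits(bits, codes):
    """Decode a bit string using a prefix-free code table."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


codes = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
text = decode_bits("01001100111", codes)
print(text)
```

A quick round-trip check (encode, then decode, then compare with the input) is a convenient way to test any Huffman implementation.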
