Extracting Basic Email Header Information with Python

1 Email content

Suppose the email is saved as "1.txt" and contains the following:

From:   Justin-Bieber@entertain.org on behalf of Bieber
Leader [leader@hello.org]
Sent:   2017-07-01 12:48
To: 'staff@hello.org'; custom@hello.org;
Willim Johnson; John Snow
Subject:    The battlefield in Winterfell


I have just met then. More details as soon as possible. So far, so good.

Sent via iPhone 7 plus
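
To follow along, the sample has to exist on disk. The following is a minimal sketch, assuming the message is stored as UTF-8 text; `SAMPLE_EMAIL` and `write_sample` are hypothetical names introduced here for illustration only.

```python
from pathlib import Path

# The sample message, copied verbatim from the listing above.
SAMPLE_EMAIL = """\
From:   Justin-Bieber@entertain.org on behalf of Bieber
Leader [leader@hello.org]
Sent:   2017-07-01 12:48
To: 'staff@hello.org'; custom@hello.org;
Willim Johnson; John Snow
Subject:    The battlefield in Winterfell


I have just met then. More details as soon as possible. So far, so good.

Sent via iPhone 7 plus
"""


def write_sample(path="1.txt"):
    """Write the sample email to `path` and return the number of lines written."""
    Path(path).write_text(SAMPLE_EMAIL, encoding="utf-8")
    return len(SAMPLE_EMAIL.splitlines())


if __name__ == "__main__":
    write_sample()  # creates 1.txt in the current working directory
```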

2 Extraction approach

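  • The goal is to extract the header information from the email. The fields needed are:
    • sender (From:), sent time (Sent:), recipients (To:), subject (Subject:)
  • As a first pass, extracting the lines that hold these fields is enough.
  • Use a single extraction function: put the four keywords in a list and extract them with a regular expression.
  • Each of the four fields has a global counter variable. Once a field has been matched, its counter is incremented by 1 to flag it.
  • If one field has already been matched and the next field has not been matched yet, the current line is a continuation of the header and must be returned as well.
  • If the extraction function returns None, the line is skipped.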
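
Before the full script, the keyword regex described above can be tried on single lines. This is only an illustrative sketch: `extract_keyword` is a hypothetical helper, and the call to `re.escape` is a precaution added here (the four keywords contain no regex metacharacters).

```python
import re


def extract_keyword(line, keyword):
    """Return the text from `keyword` to the end of the line, or None if absent."""
    # '.' does not match a newline, so the captured group stops before '\n'.
    pattern = ".*({0}.*)".format(re.escape(keyword))
    match = re.match(pattern, line)
    return match.group(1) if match else None


print(extract_keyword("Sent:   2017-07-01 12:48\n", "Sent:"))   # Sent:   2017-07-01 12:48
print(extract_keyword("Leader [leader@hello.org]\n", "Sent:"))  # None
```

Because the leading `.*` is greedy, a line that contains the keyword twice yields the text starting at the last occurrence.
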
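# coding: utf-8
import re

# Global counters: each records how many lines have contained its keyword.
from_count = 0
sent_count = 0
to_count = 0
subject_count = 0


def inspect_string(string):
    """Return the header text carried by this line, or None for other lines."""
    global from_count
    global sent_count
    global to_count
    global subject_count

    # Count each keyword once per line.
    if re.match(".*(From:.*)", string):
        from_count += 1

    if re.match(".*(Sent:.*)", string):
        sent_count += 1

    if re.match(".*(To:.*)", string):
        to_count += 1

    if re.match(".*(Subject:.*)", string):
        subject_count += 1

    keyword_list = ['From:', 'Sent:', 'To:', 'Subject:']
    for keyword in keyword_list:
        regex_str = ".*({0}.*)".format(keyword)
        match_obj = re.match(regex_str, string)

        # The keyword is on this line: return it and everything after it.
        if match_obj:
            return match_obj.group(1)

        # One field has been seen but the next one has not, so the line
        # still belongs to the header. It is returned whole, including
        # its trailing newline.
        if from_count > 0 and sent_count < 1:
            return string

        if sent_count > 0 and to_count < 1:
            return string

        if to_count > 0 and subject_count < 1:
            return string

    return None


# Open in text mode so each line is a str. With 'rb', str(line) on
# Python 3 produces the bytes repr ("b'...'") instead of the text.
with open('1.txt', 'r', encoding='utf-8') as f:
    for line in f:
        result = inspect_string(line)
        if result is None:
            continue
        print(result)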
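
The script above relies on global counters and matches a keyword anywhere in a line, so a body line that happens to contain "To:" would also be picked up. As a possible alternative, and not the method used in this post, the sketch below tracks the current field in a local variable and collects the values into a dict. It assumes that header keywords start at the beginning of a line and that the first blank line ends the header block; `HEADER_RE` and `parse_headers` are hypothetical names.

```python
import re

# A header line starts with one of the four keywords followed by a colon.
HEADER_RE = re.compile(r"^(From|Sent|To|Subject):\s*(.*)$")


def parse_headers(lines):
    """Collect the four header fields into a dict, joining continuation lines."""
    headers = {}
    current = None  # name of the field that continuation lines belong to
    for raw in lines:
        line = raw.strip()
        match = HEADER_RE.match(line)
        if match:
            current = match.group(1)
            headers[current] = match.group(2)
        elif not line:
            # Assumption: a blank line after at least one field ends the header.
            if headers:
                break
        elif current is not None:
            # A non-keyword line continues the most recent field.
            headers[current] += " " + line
    return headers


if __name__ == "__main__":
    with open("1.txt", encoding="utf-8") as f:
        for name, value in parse_headers(f).items():
            print("{0}: {1}".format(name, value))
```

Unlike the script above, this variant treats any non-blank line after Subject as part of the subject until the first blank line, and it drops the keyword and surrounding whitespace from each value.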

3 Output

From:   Justin-Bieber@entertain.org on behalf of Bieber
Leader [leader@hello.org]

Sent:   2017-07-01 12:48

To: 'staff@hello.org'; custom@hello.org;

Willim Johnson; John Snow

Subject:    The battlefield in Winterfell