The re Module in Python

<h3>The re Module in Python</h3>

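Python supports Perl-style regular expression patterns, and the re module gives Python full regular expression functionality.

The compile function builds a regular expression object from a pattern string and optional flag arguments. This object has a set of methods for regular expression matching and substitution.

The re module also provides module-level functions that do the same job as these methods. Each of these functions takes a pattern string as its first argument.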
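As a minimal sketch of the two styles described above, the following compares a compiled pattern object with the equivalent module-level call. The sample string and variable names are illustrative assumptions, not taken from the original article.

```python
import re

# Compile the pattern once; the resulting object can be reused many times.
word_pattern = re.compile(r"[a-z]+", re.IGNORECASE)
compiled_result = word_pattern.findall("Hello regex World")

# The module-level function takes the pattern string as its first argument.
module_result = re.findall(r"[a-z]+", "Hello regex World", re.IGNORECASE)

print(compiled_result)                   # ['Hello', 'regex', 'World']
print(compiled_result == module_result)  # True
```

Compiling is mostly worthwhile when the same pattern is applied repeatedly; for one-off use the module-level functions are usually enough.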

This article covers the main functions of the re module and how they are used.


How re.py defines the special characters

r"""
The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise."""
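A few of the special characters listed above are easier to see in action. The sketch below illustrates greedy versus non-greedy repetition, a named group with a backreference, and lookahead and lookbehind assertions. All sample strings and variable names are made up for illustration.

```python
import re

html = "<b>bold</b><i>italic</i>"

# Greedy "*" takes as much as possible; "*?" takes as little as possible.
greedy = re.search(r"<.*>", html).group()   # the whole string
lazy = re.search(r"<.*?>", html).group()    # '<b>'

# (?P<name>...) names a group; (?P=name) matches the same text again.
repeated = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")
repeated_word = repeated.group("word")      # 'is'

# (?=...) checks what follows without consuming it.
pixel_sizes = re.findall(r"\d+(?=px)", "10px 20em 30px")      # ['10', '30']

# (?<=...) checks what precedes; the lookbehind must be fixed length.
prices = re.findall(r"(?<=\$)\d+", "cost $15, weight 20")     # ['15']

print(greedy, lazy, repeated_word, pixel_sizes, prices)
```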

r"""The special sequences consist of "\\" and a character from the list below.  If the ordinary character is not on the list, then the resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode digits.
    \D       Matches any non-digit character; equivalent to [^\d].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode whitespace characters.
    \S       Matches any non-whitespace character; equivalent to [^\s].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_] in bytes patterns or string patterns with the ASCII flag.  In string patterns without the ASCII flag, it will match the range of Unicode alphanumeric characters (letters plus digits plus underscore). With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash."""
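The following sketch tries a handful of the special sequences on small inputs, including the difference the ASCII flag makes for \w. The inputs and variable names are illustrative assumptions; the non-ASCII letter is written as an escape so the example does not depend on file encoding.

```python
import re

text = "room 42, caf\u00e9_9"   # "café_9" with a precomposed e-acute

# \d matches decimal digits.
digits = re.findall(r"\d+", text)                   # ['42', '9']

# Without the ASCII flag, \w also matches Unicode letters.
words_unicode = re.findall(r"\w+", text)            # ['room', '42', 'café_9']

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
words_ascii = re.findall(r"\w+", text, re.ASCII)    # ['room', '42', 'caf', '_9']

# \b matches only at word boundaries, so "concat" and "cats" are skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")   # ['cat', 'cat']

# \1 matches the same text as group 1: here, a doubled digit.
doubled = re.search(r"(\d)\1", "12334").group()     # '33'

print(digits, words_unicode, words_ascii, whole_words, doubled)
```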

Python's re module provides the following functions:

r"""
    match     Match a regular expression pattern to the beginning of a string.
    fullmatch Match a regular expression pattern to all of a string.
    search    Search a string for the presence of a pattern.
    sub       Substitute occurrences of a pattern found in a string.
    subn      Same as sub, but also return the number of substitutions made.
    split     Split a string by the occurrences of a pattern.
    findall   Find all occurrences of a pattern in a string.
    finditer  Return an iterator yielding a match object for each match.
    compile   Compile a pattern into a RegexObject.
    purge     Clear the regular expression cache.
    escape    Backslash all non-alphanumerics in a string."""
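To make the list above concrete, here is a short sketch that calls most of these functions on one sample string. The date pattern and the input text are assumptions chosen only for illustration.

```python
import re

text = "2023-01-15, 2024-12-31"
date = r"\d{4}-\d{2}-\d{2}"

# match anchors at the start; fullmatch requires the whole string to match.
first = re.match(date, text).group()                     # '2023-01-15'
whole = re.fullmatch(date, text)                         # None: two dates, not one

# search scans the string for the first location that matches.
later = re.search(r"2024-\d{2}-\d{2}", text).group()     # '2024-12-31'

# sub replaces matches; subn also reports how many replacements were made.
masked = re.sub(date, "DATE", text)                      # 'DATE, DATE'
masked_with_count = re.subn(date, "DATE", text)          # ('DATE, DATE', 2)

# split, findall, and finditer.
parts = re.split(r",\s*", text)                          # ['2023-01-15', '2024-12-31']
all_dates = re.findall(date, text)                       # same two dates
starts = [m.start() for m in re.finditer(date, text)]    # [0, 12]

# escape backslashes characters that are special in a pattern.
literal_dot = re.escape("a.b")                           # 'a\\.b'

print(first, whole, later, masked, masked_with_count, parts, starts, literal_dot)
```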

Some of the functions in this module take flags as optional parameters:

r"""
    A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D match the corresponding ASCII character categories  (rather than the whole Unicode categories, which is the default). For bytes patterns, this flag is the only available behaviour and needn't be specified.
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline) as well as the string.
                   "$" matches the end of lines (before a newline) as well as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     For compatibility only. Ignored for string patterns (it is the default), and forbidden for bytes patterns."""
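The effect of several flags can be checked with a small experiment. The sketch below uses a two-line sample string and a hypothetical number pattern, both invented for this illustration.

```python
import re

text = "first line\nSecond line"

# Without MULTILINE, "^" matches only at the start of the string.
line_starts_default = re.findall(r"^\w+", text)               # ['first']
line_starts_multi = re.findall(r"^\w+", text, re.MULTILINE)   # ['first', 'Second']

# IGNORECASE makes "second" match "Second".
found = re.search(r"second", text, re.IGNORECASE).group()     # 'Second'

# "." does not match a newline unless DOTALL is set.
across_default = re.search(r"line.Second", text)              # None
across_dotall = re.search(r"line.Second", text, re.DOTALL)    # a match object

# VERBOSE ignores whitespace and allows comments inside the pattern.
number = re.compile(r"""
    \d+         # integer part
    (\.\d+)?    # optional fractional part
""", re.VERBOSE)
is_number = number.fullmatch("3.14") is not None              # True

print(line_starts_default, line_starts_multi, found, is_number)
```

Flags can be combined with the `|` operator, for example `re.IGNORECASE | re.MULTILINE`.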

The re module also defines an exception named error (re.error).
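One common place this exception shows up is when compiling an invalid pattern. The helper below, `try_compile`, is a hypothetical name introduced only for this sketch; it is not part of the re module.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error is raised for patterns that are not valid regular expressions.
        print(f"invalid pattern {pattern!r}: {exc}")
        return None

bad = try_compile("(unclosed")   # the opening parenthesis is never closed
good = try_compile(r"\d+")

print(bad is None)                # True
print(good.fullmatch("123") is not None)   # True
```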
