<h3>The re Module in Python</h3>
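Python supports Perl-style regular expression patterns, and the re module gives Python full regular expression functionality.
The compile function builds a regular expression object from a pattern string and optional flags; this object has a set of methods for regular expression matching and substitution.
The re module also provides module-level functions that do the same work as these methods; they take the pattern string as their first argument.
This article introduces the main functions of the re module and how to use them.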
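As a minimal sketch of the two styles described above, the following compares a method on a compiled object with the equivalent module-level function. The pattern and the sample text are made up for illustration.

```python
import re

# Compile the pattern once and reuse the resulting object.
word = re.compile(r"[a-z]+", re.IGNORECASE)

text = "Hello regex World"

# Method on the compiled object: the pattern is already bound.
from_object = word.findall(text)

# Module-level function: the pattern string is the first argument,
# and the flags are passed alongside it.
from_module = re.findall(r"[a-z]+", text, re.IGNORECASE)

print(from_object)                 # ['Hello', 'regex', 'World']
print(from_module == from_object)  # True
```

Compiling first is mostly worthwhile when the same pattern is reused many times; for one-off matches the module-level functions are usually more convenient.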
Definitions of the special characters, from re.py:
r"""
The special characters are:
"." Matches any character except a newline.
"^" Matches the start of the string.
"$" Matches the end of the string or just before the newline at the end of the string.
"*" Matches 0 or more (greedy) repetitions of the preceding RE. Greedy means that it will match as many repetitions as possible.
"+" Matches 1 or more (greedy) repetitions of the preceding RE.
"?" Matches 0 or 1 (greedy) of the preceding RE.
*?,+?,?? Non-greedy versions of the previous three special characters.
{m,n} Matches from m to n repetitions of the preceding RE.
{m,n}? Non-greedy version of the above.
"\\" Either escapes special characters or signals a special sequence.
[] Indicates a set of characters.
A "^" as the first character indicates a complementing set.
"|" A|B, creates an RE that will match either A or B.
(...) Matches the RE inside the parentheses.
The contents can be retrieved or matched later in the string.
(?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
(?:...) Non-grouping version of regular parentheses.
(?P<name>...) The substring matched by the group is accessible by name.
(?P=name) Matches the text matched earlier by the group named name.
(?#...) A comment; ignored.
(?=...) Matches if ... matches next, but doesn't consume the string.
(?!...) Matches if ... doesn't match next.
(?<=...) Matches if preceded by ... (must be fixed length).
(?<!...) Matches if not preceded by ... (must be fixed length).
(?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
the (optional) no pattern otherwise."""
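A few of the special characters listed above are easier to see in action. The sketch below uses made-up sample strings to contrast greedy and non-greedy repetition, and to show a named group with a backreference and a lookahead.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: ".*" takes as much as possible, so the match runs to the last "</b>".
greedy = re.search(r"<b>.*</b>", html).group()

# Non-greedy: ".*?" stops at the first "</b>".
lazy = re.search(r"<b>.*?</b>", html).group()

# Named group plus a backreference to it with (?P=name): finds a repeated word.
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")

# Lookahead: match "foo" only when "bar" follows, without consuming "bar".
ahead = re.search(r"foo(?=bar)", "foobar")

print(greedy)                      # <b>one</b><b>two</b>
print(lazy)                        # <b>one</b>
print(doubled.group("word"))       # is
print(ahead.group(), ahead.end())  # foo 3
```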
r"""The special sequences consist of "\\" and a character from the list below. If the ordinary character is not on the list, then the resulting RE will match the second character.
\number Matches the contents of the group of the same number.
\A Matches only at the start of the string.
\Z Matches only at the end of the string.
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
\d Matches any decimal digit; equivalent to the set [0-9] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode digits.
\D Matches any non-digit character; equivalent to [^\d].
\s Matches any whitespace character; equivalent to [ \t\n\r\f\v] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode whitespace characters.
\S Matches any non-whitespace character; equivalent to [^\s].
\w Matches any alphanumeric character; equivalent to [a-zA-Z0-9_] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the range of Unicode alphanumeric characters (letters plus digits plus underscore). With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale.
\W Matches the complement of \w.
\\ Matches a literal backslash."""
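To illustrate how the ASCII flag changes \d, and how \b, \A and \Z behave, here is a small sketch. The sample strings are assumptions chosen for the example; "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit outside [0-9].

```python
import re

text = "room 42 or \u0663"

# String pattern without the ASCII flag: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d", text)

# With the ASCII flag, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d", text, re.ASCII)

# \b matches the empty string at a word boundary,
# so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \A and \Z anchor to the very start and very end of the string.
anchored = re.search(r"\Aroom.*\Z", text) is not None

print(len(unicode_digits), len(ascii_digits))  # 3 2
print(whole_words)                             # ['cat', 'cat']
print(anchored)                                # True
```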
Python's re module provides the following functions:
r"""
match Match a regular expression pattern to the beginning of a string.
fullmatch Match a regular expression pattern to all of a string.
search Search a string for the presence of a pattern.
sub Substitute occurrences of a pattern found in a string.
subn Same as sub, but also return the number of substitutions made.
split Split a string by the occurrences of a pattern.
findall Find all occurrences of a pattern in a string.
finditer Return an iterator yielding a match object for each match.
compile Compile a pattern into a RegexObject.
purge Clear the regular expression cache.
escape Backslash all non-alphanumerics in a string."""
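The functions listed above can be tried side by side on one input. This is only a sketch: the date pattern and the sample text are invented for illustration.

```python
import re

text = "2021-03-04, 2022-11-30"
date = r"(\d{4})-(\d{2})-(\d{2})"

# match: succeeds only at the beginning of the string.
first = re.match(date, text).group()

# fullmatch: the whole string must match, so the result is None here.
whole = re.fullmatch(date, text)

# search: first match anywhere in the string.
found = re.search(r"2022-\d{2}", text).group()

# findall: with groups in the pattern, returns a list of tuples.
parts = re.findall(date, text)

# finditer: yields one match object per match.
starts = [m.start() for m in re.finditer(date, text)]

# sub replaces every match; subn also returns how many replacements were made.
swapped = re.sub(date, r"\3/\2/\1", text)
_, count = re.subn(date, "X", text)

# split: cut the string at each occurrence of the pattern.
pieces = re.split(r",\s*", text)

# escape: make a literal string safe to embed in a pattern.
literal = re.search(re.escape("1+1"), "1+1=2").group()

print(first)    # 2021-03-04
print(swapped)  # 04/03/2021, 30/11/2022
print(pieces)   # ['2021-03-04', '2022-11-30']
```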
Some of the functions in this module take flags as optional parameters:
r"""
A ASCII For string patterns, make \w, \W, \b, \B, \d, \D match the corresponding ASCII character categories (rather than the whole Unicode categories, which is the default). For bytes patterns, this flag is the only available behaviour and needn't be specified.
I IGNORECASE Perform case-insensitive matching.
L LOCALE Make \w, \W, \b, \B, dependent on the current locale.
M MULTILINE "^" matches the beginning of lines (after a newline) as well as the string.
"$" matches the end of lines (before a newline) as well as the end of the string.
S DOTALL "." matches any character at all, including the newline.
X VERBOSE Ignore whitespace and comments for nicer looking RE's.
U UNICODE For compatibility only. Ignored for string patterns (it is the default), and forbidden for bytes patterns."""
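A short sketch of how several of these flags change matching. The sample text is made up; flags can be combined with "|" or set inline with the (?i)-style syntax listed among the special characters.

```python
import re

text = "First line\nsecond LINE"

# IGNORECASE: "line" also matches "LINE".
any_case = re.findall(r"line", text, re.IGNORECASE)

# MULTILINE: "^" also matches right after each newline.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# DOTALL: "." also matches the newline, so one match can span both lines.
spans_lines = re.search(r"First.*LINE", text, re.DOTALL) is not None
without_dotall = re.search(r"First.*LINE", text) is not None

# VERBOSE: whitespace and comments inside the pattern are ignored.
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
""", re.VERBOSE)
decimal = number.search("pi is 3.14").group()

# Flags combine with "|"; (?i) sets IGNORECASE from inside the pattern.
combined = re.findall(r"^SECOND", text, re.IGNORECASE | re.MULTILINE)
inline = re.search(r"(?i)FIRST", text).group()

print(any_case)                     # ['line', 'LINE']
print(line_starts)                  # ['First', 'second']
print(spans_lines, without_dotall)  # True False
print(decimal)                      # 3.14
```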
The re module also defines an exception, "error".
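As a minimal sketch of how this exception can be handled, the helper below (try_compile is a hypothetical name, not part of the re module) catches re.error when a pattern is invalid.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid.

    Hypothetical helper for illustration; not part of the re module.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error is raised when the pattern is not a valid regular expression.
        print(f"invalid pattern {pattern!r}: {exc}")
        return None

ok = try_compile(r"(\d+)")
bad = try_compile(r"(\d+")  # unbalanced parenthesis, so compilation fails
```

Catching the error this way is mainly useful when the pattern comes from user input or a configuration file rather than from the source code itself.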