Common methods

findall:

Matches every substring that fits the pattern and returns them as a list.

search:

Finds the first substring that fits the pattern and returns a match object (or None if nothing matches).

sub:

Replaces the substrings that fit the pattern and returns the resulting string.
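
As a quick side-by-side sketch of the three return types described above, the snippet below runs all three functions with the same literal pattern. The variable names are only illustrative.

```python
import re

s = 'hhhhxyxy'

# findall: every non-overlapping match, collected in a list
matches = re.findall('xy', s)
print(matches)         # ['xy', 'xy']

# search: a match object for the first match only
first = re.search('xy', s)
print(first.group())   # xy
print(first.span())    # (4, 6)

# search returns None when nothing matches
missing = re.search('zz', s)
print(missing)         # None

# sub: a new string with every match replaced; s itself is unchanged
replaced = re.sub('xy', '-', s)
print(replaced)        # hhhh--
```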

Usage walkthrough

Using ".":

import re
#from re import findall  # avoid this form in a large project
s = 'hhhhxyxy'
a = re.findall('x.',s)
b = re.findall('x...',s)
print(a)
#['xy', 'xy']  the returned list holds every match of the pattern
print(b)
#['xyxy']

So "." (dot) is a placeholder that matches any single character.

Using "*":

import re
#from re import findall
s = 'hhhhxyxy'
a = re.findall('x*',s)
print(a)
#['', '', '', '', 'x', '', 'x', '', '']

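"*" means the preceding element occurs zero or more times, so an empty string is matched at every position where there is no "x".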
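
To make the empty matches easier to reason about, here is a small sketch that puts "*" after a second character and compares it with "+". Note that "+" (one or more) is standard `re` syntax but is not covered in the text above, so treat it as an aside.

```python
import re

s = 'hhhhxyxy'

# 'x*' may match zero characters, so empty strings show up in the result
star = re.findall('x*', s)
print(star)        # ['', '', '', '', 'x', '', 'x', '', '']

# one way to drop the empty matches afterwards
non_empty = [m for m in star if m]
print(non_empty)   # ['x', 'x']

# 'x+' needs at least one 'x', so it never yields an empty match
plus = re.findall('x+', s)
print(plus)        # ['x', 'x']

# '*' only applies to the element right before it: 'h' followed by zero or more 'x'
h_then_x = re.findall('hx*', s)
print(h_then_x)    # ['h', 'h', 'h', 'hx']
```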

Using "?":

import re
#from re import findall
s = 'hhhhxyxy'
a = re.findall('h?',s)
print(a)
#['h', 'h', 'h', 'h', '', '', '', '', '']

"?" matches the preceding character zero times or once.
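
A common use of "?" is to make a single character optional inside a longer pattern. The sketch below uses a made-up sample string to show this.

```python
import re

# made-up sample text: two valid spellings and one misspelling
text = 'color colour colouur'

# 'u?' makes the 'u' optional, so both spellings match; 'colouur' does not
spellings = re.findall('colou?r', text)
print(spellings)    # ['color', 'colour']

# same idea on the string used above: 'h' followed by an optional 'x'
s = 'hhhhxyxy'
h_opt_x = re.findall('hx?', s)
print(h_opt_x)      # ['h', 'h', 'h', 'hx']
```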

Using "{ }":

import re
#from re import findall
s = 'hhhhxyxy'
a = re.findall('h{2}',s)
print(a)
#['hh', 'hh']

"{n}" controls how many times "h" must be repeated.
Another example:

import re
#from re import findall
s = '123hhh456'
a = re.findall('1.{3}',s)
print(a)
#['123h']

This matches "1" followed by exactly three placeholder characters.
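
Besides an exact count, the braces also accept a range. The forms "{m,n}" and "{m,}" are standard `re` syntax that the text above does not cover; the sketch below shows how they behave on a made-up string of five h's.

```python
import re

s = 'hhhhh'  # five h's, made up for this example

# exactly two: the fifth 'h' is left over and not matched
exactly_two = re.findall('h{2}', s)
print(exactly_two)      # ['hh', 'hh']

# between two and three: greedy, so it takes three first, then the remaining two
two_to_three = re.findall('h{2,3}', s)
print(two_to_three)     # ['hhh', 'hh']

# three or more: takes the whole run
at_least_three = re.findall('h{3,}', s)
print(at_least_three)   # ['hhhhh']

# one or two
one_to_two = re.findall('h{1,2}', s)
print(one_to_two)       # ['hh', 'hh', 'h']
```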

Using ".*", "(.*?)" and ".*?":

import re
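#from re import findall
s = 'aaaxxabcdxxxxasdsxxaxxdasxxaaa'
a = re.findall('xx.*xx',s)  # greedy: matches as much as possible, so fewer but longer matches
b = re.findall('xx(.*?)xx',s)  # non-greedy: shorter but more matches; with '()' only the content of the group is returned
c = re.findall('xx.*?xx',s)  # non-greedy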
print(a)
print(b)
print(c)
#print(a):['xxabcdxxxxasdsxxaxxdasxx']
#print(b)['abcd', 'asds', 'das']
#print(c)['xxabcdxx', 'xxasdsxx', 'xxdasxx']
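
The same greedy versus non-greedy difference shows up when pulling text out from between markers. Below is a minimal sketch on a made-up HTML-like string; the string and variable names are assumptions for illustration only.

```python
import re

# made-up sample with two <b>...</b> pairs
html = '<b>one</b> and <b>two</b>'

# greedy: '.*' runs to the LAST '</b>', swallowing everything in between
greedy = re.findall('<b>(.*)</b>', html)
print(greedy)   # ['one</b> and <b>two']

# non-greedy: '.*?' stops at the FIRST '</b>' it can, giving one result per pair
lazy = re.findall('<b>(.*?)</b>', html)
print(lazy)     # ['one', 'two']
```

When extracting repeated fields like this, the non-greedy form with a group is usually the one you want.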

Using re.S:

import re
#from re import findall
s = '''aaaxxabcdxxxxasdsxxaxxdasxxaaaxx
asdxxasdasd'''
a = re.findall('xx(.*?)xx',s)
b = re.findall('xx(.*?)xx',s,re.S)
print(a)
print(b)
#['abcd', 'asds', 'das']
#['abcd', 'asds', 'das', '\nasd']

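As the output shows, re.S extends "." so that it also matches the newline character. Without it, "." matches any character except a newline.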
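
For reference, re.S is the short name of re.DOTALL, and the same behavior can be switched on inside the pattern with the inline flag "(?s)". A minimal sketch on a made-up two-line string:

```python
import re

s = 'a\nb'  # 'a', a newline, then 'b'

# by default '.' does not match the newline, so nothing is found
default = re.findall('a.b', s)
print(default)               # []

# with re.S (alias re.DOTALL) the newline is matched by '.'
with_s = re.findall('a.b', s, re.S)
with_dotall = re.findall('a.b', s, re.DOTALL)
print(with_s)                # ['a\nb']

# the inline flag (?s) at the start of the pattern has the same effect
inline = re.findall('(?s)a.b', s)
print(inline)                # ['a\nb']

print(re.S is re.DOTALL)     # True
```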

Using search:

import re
#from re import findall
s = 'aaaxxabcdxxxxasdsxxaxxdasxxaaaxx'
a = re.search('xx(.*?)xx',s).group(1)
print(a)
#abcd

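group(1) on the match object returned by search is the content of the first pair of parentheses.
If the pattern has several pairs of parentheses, group(2), group(3) and so on give the content of the corresponding pair.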
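
A short sketch of a search with two groups, plus a None check, since calling .group() directly on a failed search raises AttributeError. The second pattern ('zz...') is made up to force a miss.

```python
import re

s = 'aaaxxabcdxxxxasdsxxaxxdasxxaaaxx'

m = re.search('xx(.*?)xxxx(.*?)xx', s)
if m is not None:
    print(m.group(0))   # whole match: xxabcdxxxxasdsxx
    print(m.group(1))   # first group: abcd
    print(m.group(2))   # second group: asds
    print(m.groups())   # all groups as a tuple: ('abcd', 'asds')

# a pattern that does not occur in s: search returns None
no_match = re.search('zz(.*?)zz', s)
result = no_match.group(1) if no_match else None
print(result)           # None
```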

Return value of findall:

import re
#from re import findall
s = 'aaaxxabcdxxxxasdsxxaxxdasxxaaaxx'
a = re.findall('xx(.*?)xxxx(.*?)xx',s)
print(a[0][1])
#asds

As you can see, a is a list, and when the pattern contains more than one "()" group, each element of the list is a tuple.
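
The shape of the list depends on how many groups the pattern has. The sketch below, on a made-up key/value string, shows the three cases next to each other.

```python
import re

s = 'k1=v1;k2=v2'  # made-up sample string

# no group: each element is the whole match
no_group = re.findall('k.=v.', s)
print(no_group)     # ['k1=v1', 'k2=v2']

# one group: each element is the content of that group
one_group = re.findall('k.=(v.)', s)
print(one_group)    # ['v1', 'v2']

# two groups: each element is a tuple with one entry per group
two_groups = re.findall('(k.)=(v.)', s)
print(two_groups)   # [('k1', 'v1'), ('k2', 'v2')]

# tuples unpack naturally in a loop
for key, value in two_groups:
    print(key, value)
```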

Using sub:

import re
#from re import findall
s = '123aaa123123xxx123'
a = re.sub('123.*123','hello',s)

print(a)
#hello

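sub returns a string: the original text with every match replaced.
The replacements and the unmatched parts of the text are joined back together to form the result.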

import re
#from re import findall
s = '123aaa123123xxx123'
a = re.sub('123.*?123','hello',s)
b = re.sub('123.*123','hello',s)
print(b)#hello
print(a)#hellohello

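Because ".*" is greedy, it matches the whole string 123aaa123123xxx123, which is replaced by hello.
".*?" is non-greedy, so it finds two separate matches, and the two replacements are joined into hellohello.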
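
Two further options of re.sub are worth a quick sketch: the count argument limits how many matches are replaced, and a backreference such as \1 in the replacement reuses the content of a group. Both are part of the standard `re` module but are not covered in the text above.

```python
import re

s = '123aaa123123xxx123'

# unmatched text is kept as-is; only the run of a's is replaced
kept = re.sub('a+', '-', s)
print(kept)         # 123-123123xxx123

# count=1 replaces only the first match
first_only = re.sub('123.*?123', 'hello', s, count=1)
print(first_only)   # hello123xxx123

# \1 in the replacement stands for whatever group 1 matched
# (a raw string keeps the backslash intact)
wrapped = re.sub('123(.*?)123', r'<\1>', s)
print(wrapped)      # <aaa><xxx>
```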

Regular expressions