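I. Regular Expressions
Introductory example:
import re
# Extract the phone number and the email address from a string
str1 = 'Hello, I am xxx. My phone number is: 12345678901, email-12345634@qq.com...'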
phone = re.findall(r'\d{11}', str1)
print(phone)
email = re.findall(r'\d+@\w+\.\w+', str1)
print(email)
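0. re.fullmatch
fullmatch(regex, string)
- Checks whether the whole string matches the regular expression; the result is a match object on success and None on failure
Regex literal: r'regex syntax' (a raw string, so backslashes reach the regex engine unchanged)
1. What is a regular expression?
A regular expression is a tool for processing strings: it uses special symbols to describe the rule that a string must follow
2. Regex symbols (the syntax of regular expressions)
Matching symbols
1) Ordinary characters: every character that has no special function or meaning in a regex is an ordinary character
- An ordinary character stands for itself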
from re import fullmatch
# Match a string of exactly three characters: 'a', 'b', 'c'
re_str = r'abc'
result = fullmatch(re_str, 'abc')
print(result)  # a match object
result = fullmatch(re_str, 'ab')
print(result)  # None: too short
result = fullmatch(re_str, 'abcd')
print(result)  # None: too long
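2) .
- Matches any single character (except a newline, unless the re.DOTALL flag is used)
- One . matches exactly one character
from re import fullmatch
# Match a string of length 8: the first three characters are abc, the last three are 123, and the two in the middle can be anything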
re_str = r'abc..123'
result = fullmatch(re_str, 'abcdd123')
print(result)
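3) \w
- Matches one letter, digit or underscore
(for str patterns in Python 3, \w also matches other Unicode word characters such as Chinese characters; with the re.ASCII flag it is limited to [a-zA-Z0-9_])
- One \w matches exactly one character
from re import fullmatch
# Match a string of length 4: the first character is a letter, digit or _, followed by 'abc'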
re_str = r'\wabc'
result = fullmatch(re_str, '2abc')
print(result)
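As a quick check of what \w covers, the sketch below compares the default behaviour with the re.ASCII flag. The sample strings are made up for illustration.

```python
import re

# By default, \w in a str pattern also accepts Unicode word characters
unicode_hit = re.fullmatch(r'\wabc', '中abc')
# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the Chinese character is rejected
ascii_miss = re.fullmatch(r'\wabc', '中abc', flags=re.ASCII)
ascii_hit = re.fullmatch(r'\wabc', '_abc', flags=re.ASCII)

print(unicode_hit)  # a match object
print(ascii_miss)   # None
print(ascii_hit)    # a match object
```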
4) \d
- Matches any single digit character
from re import fullmatch
# Match a string of length 5: two digits followed by any three characters
re_str = r'\d\d...'
result = fullmatch(re_str, '23has')
print(result)
5) \s
- Matches any single whitespace character
- Whitespace characters include the space, the newline, the tab, and so on
from re import fullmatch
re_str = r'how\sare!'
result = fullmatch(re_str, 'how are!')
print(result)
result = fullmatch(re_str, 'how\tare!')
print(result)
result = fullmatch(re_str, 'how\nare!')
print(result)
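6) \W, \D, \S
\W
- Matches one character that \w does not match (one non-word character)
\D
- Matches any character that is not a digit (one non-digit character)
\S
- Matches one non-whitespace character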
from re import fullmatch
re_str = r'\Dabc\S'
result = fullmatch(re_str, 'aabc2')
print(result)
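The example above exercises \D and \S only. A minimal sketch for \W, using made-up sample strings, could look like this:

```python
from re import fullmatch

# \W accepts exactly one character that \w would reject
hit = fullmatch(r'abc\W123', 'abc-123')   # '-' is not a word character
miss = fullmatch(r'abc\W123', 'abc_123')  # '_' counts as a word character

print(hit)   # a match object
print(miss)  # None
```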
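7) [character set]
- Matches any one character that appears in the set
Note: one [] matches exactly one character
a. A list of characters
[several characters] - for example: [abc]
[abc] - matches any one of a, b, c
[a123] - matches any one of a, 1, 2, 3
[\d\s] - matches any one digit or whitespace character
b. Ranges
[1-9] - matches any one character from 1 to 9 (the code points in a range must be increasing)
[a-z] - matches any one lowercase letter
[A-Z] - matches any one uppercase letter
[A-Za-z] - matches any one letter
[A-Za-z\d_] / [A-Za-z0-9_] - matches any one letter, digit or underscore
[ \t\n] - matches one space, tab or newline
[\u4e00-\u9fa5] - matches any one Chinese character (the commonly used CJK range)
from re import fullmatch
# Match a string of length 4: the first character is 1, 3 or 7, followed by abc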
re_str = r'[137]abc'
result = fullmatch(re_str, '3abc')
print(result)
re_str = r'[!-&]123'
print(fullmatch(re_str, '#123'))
re_str = r'[A-Za-z\d_]123'
print(fullmatch(re_str, 'a123'))
# print(fullmatch(r'[z-a]', 'a'))
# re.error: bad character range z-a at position
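8) [^character set]
- Matches any one character that is not in the set
[^abc] - matches any one character other than a, b and c
[^2-8] - matches any one character other than 2 through 8
[^\u4e00-\u9fa5] - matches any one non-Chinese character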
from re import fullmatch
re_str = r'[^2-8]abc'
print(fullmatch(re_str, '8abc'))
print(fullmatch(re_str, '9abc'))
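Assertion symbols
Assertions are zero-width: they check a position and consume no characters, so they never change the length of the matched text
1) \b
- Checks whether the position is a word boundary
- Word boundary: a position with a word character (\w) on one side and a non-word character, or the start or end of the string, on the other. Whitespace and punctuation therefore create boundaries, as in how are and how-are
- Matching rule: think of matching the pattern without \b first; if that succeeds, check that every position where \b appears is a word boundary
from re import fullmatch, search
# Goal: match the six characters abc123 with a word boundary between abc and 123.
# This can never succeed: c and 1 are both word characters, so there is no boundary between them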
re_str = r'abc\b123'
print(fullmatch(re_str, 'abc 123'))
print(fullmatch(re_str, 'abc123'))
re_str = r'abc,\b123'
print(fullmatch(re_str, 'abc,123'))
re_str = r'abc\b,123'
print(fullmatch(re_str, 'abc,123'))
re_str = r'\b[\u4e00-\u9fa5][\u4e00-\u9fa5]\b'
print(search(r'\d\d', 'sefcffrr67vrrr45cccfrr'))
print(search(r'\b\d\d\b', 'sefcffrr67vrrr45cccfrr'))
print(search(r'\b\d\d\b', 'sefcffrr67vrrr 45,cccfrr'))
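A common use of \b is to find a word only where it stands alone. The following sketch uses a made-up sentence to contrast a pattern with and without the boundary checks.

```python
import re

text = 'cat concat cat, catalog'

# Without \b the letters are found anywhere, including inside longer words
loose = re.findall(r'cat', text)
# With \b on both sides only the standalone word is found
whole = re.findall(r'\bcat\b', text)

print(len(loose))  # 4
print(whole)       # ['cat', 'cat']
```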
2) ^
- Checks whether the position of ^ is the start of the string
Note: this is the ^ outside [], not the one inside it
from re import fullmatch, search, findall
re_str = r'^\d\d\d'
print(fullmatch(re_str, '345'))
print(search(re_str, '123'))
print(search(re_str, 's123'))
re_str = r'\d\d\d'
print(fullmatch(re_str, '345'))
print(search(re_str, 'add452dxsf4783z2d1ffe2256f8f5222'))
print(findall(re_str, 'add452dxsf4783z2d1ffe2256f8f5222'))
3) $
- Checks whether the position of $ is the end of the string
from re import fullmatch, search, findall
re_str = r'\d\d\d$'
print(fullmatch(re_str, '3345'))
print(search(re_str, 'add452dxsf4783z2d1ffe2256f8f5222'))
print(findall(re_str, 'add452dxsf4783z2d1ffe2256f8f5222'))
# An 11-digit phone number: 1, then a digit from 3 to 9, then nine more digits
re_str = r'1[3-9]\d\d\d\d\d\d\d\d\d'
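To see the phone-number pattern in use, here is a minimal validation sketch. The helper name is_phone and the sample numbers are assumptions made for illustration.

```python
from re import fullmatch, search

PHONE = r'1[3-9]\d\d\d\d\d\d\d\d\d'

def is_phone(text):
    """Hypothetical helper: True only if the whole string is a phone number."""
    return fullmatch(PHONE, text) is not None

print(is_phone('13812345678'))   # True
print(is_phone('138123456789'))  # False: 12 digits
print(is_phone('12812345678'))   # False: the second digit must be 3-9

# For this input, search with ^ and $ agrees with fullmatch
anchored = search(r'^' + PHONE + r'$', '13812345678')
print(anchored is not None)      # True
```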
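II. Regular Expressions (continued)
Quantifiers
Quantifier syntax: a character pattern followed by a repetition symbol
1. *
- Matches 0 or more times, same as {0,}
"""
123a* - 123 followed by zero or more a: 123, 123a, 123aaa, ...
123\d* - 123 followed by zero or more digits: 123, 1230, 12312, ...
123[mnxy9]* - 123, 123m, 123mn, 123mmn, 123mnxy9, 123999999, ...
"""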
import re
re_str = r'abc0*123'
print(re.fullmatch(re_str, 'abc123'))
print(re.fullmatch(re_str, 'abc000123'))
2. +
- Matches 1 or more times, same as {1,}
123a+ - 123a, 123aa, 123aaa, ...
import re
re_str = r'123[a-z]+'
print(re.fullmatch(re_str, '123df'))
print(re.fullmatch(re_str, '123d124f14df')) # None
3. ?
- Matches 0 or 1 time, same as {0,1}
-?123 - 123 / -123
import re
re_str = r'_?abc'
print(re.fullmatch(re_str, '_abc'))
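4. {}
"""
1) {N} - matches exactly N times
a{3} - matches three a: aaa
\d{3} - matches any three digits: 345, 344, 000, 101
2) {M,N} - matches M to N times (at least M, at most N)
3) {M,} - matches at least M times, 123a{3,}: 123aaa, 123aaaaa, ...
4) {,N} - matches at most N times, 123a{,3}: 123, 123a, 123aa, 123aaa
"""
import re
# Match a phone number: 1, then a digit from 3 to 9, then nine more digits
re_str = r'1[3-9]\d{9}'
# Match a password: 6 to 12 characters, digits or letters only
re_str = r'[a-zA-Z0-9]{6,12}'
re_str = r'[a-zA-Z\d]{6,12}'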
print(re.fullmatch(r'123a{2,4}', '123aa'))
print(re.fullmatch(r'123a{2,4}', '123a'))
print(re.fullmatch(r'123a{2,4}', '123aaaaa'))
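5. Greedy and non-greedy
"""
When the number of repetitions is not fixed, matching can be greedy or non-greedy; the default is greedy.
Greedy: among the ways the match can succeed, repeat as many times as possible
Non-greedy: among the ways the match can succeed, repeat as few times as possible (add ? after the quantifier)
*?
+?
??
{M,N}?
{M,}?
{,N}?
"""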
import re
re_str = r'a.+'
print(re.search(re_str, 'hsjsa==2-32'))
print(re.search(r'a\d{3,5}', '你好a12345690'))
print(re.search(r'a\d{3,5}b', '你好a12345690'))
print(re.search(r'a\d{3,5}', '你好a1234b5690'))
print(re.search(r'a\d{3,5}?', '你好a12345690'))
re_str = r'a.+b'
print(re.search(re_str, 'a123b123b123')) # a123b123b
re_str = r'a.+?b'
print(re.search(re_str, 'a123b123b123')) # a123b
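Exercise: get the value of every "name" key in the data file
import re
with open('data', 'r', encoding='utf-8') as f:
    content = f.read()
# The leading quote in "name" keeps keys such as screen_name and theme_name out of the result
names = re.findall(r'"name":"(.+?)"', content)
print(names)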
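Since the data file is JSON, the same values could also be read with the standard json module. The sketch below is only an illustration: it uses a tiny made-up sample with the same shape as the file instead of the real data, and compares the two approaches.

```python
import json
import re

# A small made-up sample shaped like the data file (not the real content)
sample = ('{"code":200,"data":['
          '{"name":"Alice","screen_name":"alice_s"},'
          '{"name":"Bob","theme_name":"main"}]}')

# Regex approach from the exercise
names_by_regex = re.findall(r'"name":"(.+?)"', sample)

# Parsing approach: load the JSON and read the key from each record
names_by_json = [item['name'] for item in json.loads(sample)['data']]

print(names_by_regex)  # ['Alice', 'Bob']
print(names_by_json)   # ['Alice', 'Bob']
```

The regex version is handy for quick extraction from text; the parsing version does not depend on the exact spacing or quoting of the file.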
The data file
{"code":200,"msg":"成功!","data":[{"type":"10","text":"比撒贝宁更“气人”的人出现了!然而…","user_id":"19794323","name":"这又神马情况","screen_name":"这又神马情况","profile_image":"http://wimg.spriteapp.cn/profile/large/2018/09/14/5b9b1bf5d58ed_mini.jpg","created_at":"2018-09-29 22:04:02","create_time":null,"passtime":"2018-09-29 22:04:02","love":"696","hate":"42","comment":"77","repost":"18","bookmark":"24","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"55163","theme_name":"主版块","theme_type":"1","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"42","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2018/09/27/5bacaaf985b46_1.jpg","image2":"http://wimg.spriteapp.cn/ugc/2018/09/27/5bacaaf985b46_1.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2018/09/27/5bacaaf985b46_1.jpg","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2018/09/27/5bacaaf985b46_1.jpg","width":"864","height":"12580","tag":"","t":1538229842,"ding":"696","favourite":"24","top_cmt":null,"themes":null},{"type":"10","text":"确定是同一个人?","user_id":"22408767","name":"傻妞也","screen_name":"傻妞也","profile_image":"http://wimg.spriteapp.cn/profile/large/2018/04/21/5adabbcdd36ea_mini.jpg","created_at":"2019-04-02 07:56:01","create_time":null,"passtime":"2019-04-02 
07:56:01","love":"270","hate":"18","comment":"40","repost":"0","bookmark":"10","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"17083","theme_name":"Gif专区","theme_type":"1","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"18","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2019/03/30/5c9efbee3cd27.gif","image2":"http://wimg.spriteapp.cn/ugc/2019/03/30/5c9efbee3cd27.gif","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2019/03/30/5c9efbee3cd27.gif","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2019/03/30/5c9efbee3cd27.gif","width":"234","height":"302","tag":"","t":1554162961,"ding":"270","favourite":"10","top_cmt":null,"themes":null},{"type":"41","text":"女孩沉迷直播平台,豪气打赏主播,三天时间花掉二十余万","user_id":"17754071","name":"你一贱我就笑","screen_name":"你一贱我就笑","profile_image":"","created_at":"2019-05-01 12:11:02","create_time":null,"passtime":"2019-05-01 12:11:02","love":"170","hate":"12","comment":"47","repost":"3","bookmark":"1","bimageuri":"http://wimg.spriteapp.cn/picture/2019/0430/0624e69c6af211e9a047842b2b4c75ab_wpd.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"http://wvideo.spriteapp.cn/video/2019/0430/0624e69c6af211e9a047842b2b4c75ab_wpd.mp4","videotime":210,"original_pid":"0","cache_version":2,"playcount":"1694","playfcount":"279","cai":"12","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2019/0430/0624e69c6af211e9a047842b2b4c75ab_wpd.jpg","image2":"http://wimg.spriteapp.cn/picture/2019/0430/0624e69c6af211e9a047842b2b4c75ab_wpd.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2019/0430/0624e69c6af211e9a047842b2b4c75ab_wpd.jpg","image_small":"http://wimg.spriteapp.cn/picture/2019/0430/0624e69c6af211e9a047842b2b4c75ab_wpd.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2019/0430/0624e69c6af211e9a047842b2b4c75ab_wpd.jpg","width":"1024","height":
"576","tag":"","t":1556683862,"ding":"170","favourite":"1","top_cmt":null,"themes":null},{"type":"41","text":"【越哥】16岁少女离奇失踪,爸爸盗取她的社交账号,发现了女儿不为人知的一面!","user_id":"22967450","name":"越哥说电影","screen_name":"越哥说电影","profile_image":"http://wimg.spriteapp.cn/profile/large/2018/10/26/5bd2b60dc081e_mini.jpg","created_at":"2019-01-30 16:38:02","create_time":null,"passtime":"2019-01-30 16:38:02","love":"363","hate":"18","comment":"23","repost":"6","bookmark":"50","bimageuri":"http://wimg.spriteapp.cn/picture/2019/0129/29169555_562.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"http://wvideo.spriteapp.cn/video/2019/0129/c6b5943a238d11e982f5842b2b4c75ab_wpd.mp4","videotime":881,"original_pid":"0","cache_version":2,"playcount":"7600","playfcount":"270","cai":"18","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2019/0129/29169555_562.jpg","image2":"http://wimg.spriteapp.cn/picture/2019/0129/29169555_562.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2019/0129/29169555_562.jpg","image_small":"http://wimg.spriteapp.cn/picture/2019/0129/29169555_562.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2019/0129/29169555_562.jpg","width":"856","height":"480","tag":"","t":1548837482,"ding":"363","favourite":"50","top_cmt":null,"themes":null},{"type":"41","text":"“我干不动了,这是我最后一笔钱”,靠蹬三轮资助300多名孩子读书。他的名字应该永远被记住:白方礼。","user_id":"22904516","name":"新闻君","screen_name":"新闻君","profile_image":"http://wimg.spriteapp.cn/profile/20180809223631924509.jpg","created_at":"2018-09-23 23:52:02","create_time":null,"passtime":"2018-09-23 
23:52:02","love":"2625","hate":"28","comment":"152","repost":"31","bookmark":"69","bimageuri":"http://wimg.spriteapp.cn/picture/2018/0923/cdc8e8f2bef311e8ac97842b2b4c75ab_wpd.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"http://wvideo.spriteapp.cn/video/2018/0923/cdc8e8f2bef311e8ac97842b2b4c75ab_wpd.mp4","videotime":49,"original_pid":"0","cache_version":2,"playcount":"17513","playfcount":"3696","cai":"28","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2018/0923/cdc8e8f2bef311e8ac97842b2b4c75ab_wpd.jpg","image2":"http://wimg.spriteapp.cn/picture/2018/0923/cdc8e8f2bef311e8ac97842b2b4c75ab_wpd.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2018/0923/cdc8e8f2bef311e8ac97842b2b4c75ab_wpd.jpg","image_small":"http://wimg.spriteapp.cn/picture/2018/0923/cdc8e8f2bef311e8ac97842b2b4c75ab_wpd.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2018/0923/cdc8e8f2bef311e8ac97842b2b4c75ab_wpd.jpg","width":"1066","height":"600","tag":"","t":1537717922,"ding":"2625","favourite":"69","top_cmt":null,"themes":null},{"type":"10","text":"#脑洞超市# 蹦迪…… ","user_id":"22905319","name":"漫画屋","screen_name":"漫画屋","profile_image":"http://wimg.spriteapp.cn/profile/20180810164555663560.jpg","created_at":"2018-11-03 20:00:02","create_time":null,"passtime":"2018-11-03 
20:00:02","love":"166","hate":"15","comment":"9","repost":"3","bookmark":"6","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"15","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2018/11/02/8fdad196de7811e89129842b2b4c75ab_1.jpg","image2":"http://wimg.spriteapp.cn/ugc/2018/11/02/8fdad196de7811e89129842b2b4c75ab_1.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2018/11/02/8fdad196de7811e89129842b2b4c75ab_1.jpg","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2018/11/02/8fdad196de7811e89129842b2b4c75ab_1.jpg","width":"690","height":"2816","tag":"","t":1541246402,"ding":"166","favourite":"6","top_cmt":null,"themes":null},{"type":"10","text":"夏天到了,妹子们也开启了新的走路模式啦…","user_id":"20540438","name":"风从这边来","screen_name":"风从这边来","profile_image":"http://wimg.spriteapp.cn/profile/large/2017/04/14/58f0c4db27114_mini.jpg","created_at":"2018-06-23 22:00:02","create_time":null,"passtime":"2018-06-23 
22:00:02","love":"360","hate":"46","comment":"139","repost":"5","bookmark":"9","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"58240","theme_name":"搞笑图片","theme_type":"1","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"46","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2018/06/22/5b2d0a80c8d44.gif","image2":"http://wimg.spriteapp.cn/ugc/2018/06/22/5b2d0a80c8d44.gif","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2018/06/22/5b2d0a80c8d44.gif","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2018/06/22/5b2d0a80c8d44.gif","width":"174","height":"250","tag":"","t":1529762402,"ding":"360","favourite":"9","top_cmt":null,"themes":null},{"type":"41","text":"如果长的丑,那你永远体会不到女孩子可以有多主动","user_id":"22870425","name":"拯救不开心","screen_name":"拯救不开心","profile_image":"http://wimg.spriteapp.cn/profile/20180902094459461019.png","created_at":"2019-02-26 08:15:02","create_time":null,"passtime":"2019-02-26 
08:15:02","love":"264","hate":"14","comment":"33","repost":"5","bookmark":"11","bimageuri":"http://wimg.spriteapp.cn/picture/2019/0223/a240f726372311e989db842b2b4c75ab_wpd.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"http://wvideo.spriteapp.cn/video/2019/0223/a240f726372311e989db842b2b4c75ab_wpd.mp4","videotime":92,"original_pid":"0","cache_version":2,"playcount":"10599","playfcount":"2055","cai":"14","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2019/0223/a240f726372311e989db842b2b4c75ab_wpd.jpg","image2":"http://wimg.spriteapp.cn/picture/2019/0223/a240f726372311e989db842b2b4c75ab_wpd.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2019/0223/a240f726372311e989db842b2b4c75ab_wpd.jpg","image_small":"http://wimg.spriteapp.cn/picture/2019/0223/a240f726372311e989db842b2b4c75ab_wpd.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2019/0223/a240f726372311e989db842b2b4c75ab_wpd.jpg","width":"852","height":"480","tag":"","t":1551140102,"ding":"264","favourite":"11","top_cmt":null,"themes":null},{"type":"10","text":"是不是人老了,子女都会不要老人? 
","user_id":"22905309","name":"感情连接线","screen_name":"感情连接线","profile_image":"http://wimg.spriteapp.cn/profile/20180810164211596292.jpg","created_at":"2019-02-28 09:18:01","create_time":null,"passtime":"2019-02-28 09:18:01","love":"282","hate":"14","comment":"60","repost":"4","bookmark":"14","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"14","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2019/02/27/477072203a6d11e98688842b2b4c75ab_1.jpg","image2":"http://wimg.spriteapp.cn/ugc/2019/02/27/477072203a6d11e98688842b2b4c75ab_1.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2019/02/27/477072203a6d11e98688842b2b4c75ab_1.jpg","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2019/02/27/477072203a6d11e98688842b2b4c75ab_1.jpg","width":"612","height":"2892","tag":"","t":1551316681,"ding":"282","favourite":"14","top_cmt":null,"themes":null},{"type":"41","text":"【酸辣汤】早起一碗酸辣汤,喝一口汤,既熨帖又开胃!满满的幸福感!","user_id":"20746675","name":"逸羽倾语","screen_name":"逸羽倾语","profile_image":"http://wimg.spriteapp.cn/profile/20170512114807.jpg","created_at":"2018-03-31 09:19:02","create_time":null,"passtime":"2018-03-31 
09:19:02","love":"117","hate":"7","comment":"17","repost":"31","bookmark":"120","bimageuri":"http://wimg.spriteapp.cn/picture/2018/0329/27373344_810.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"http://wvideo.spriteapp.cn/video/2018/0329/d0cd44f0333711e8ba54842b2b4c75ab_wpd.mp4","videotime":78,"original_pid":"0","cache_version":2,"playcount":"4875","playfcount":"499","cai":"7","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2018/0329/27373344_810.jpg","image2":"http://wimg.spriteapp.cn/picture/2018/0329/27373344_810.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2018/0329/27373344_810.jpg","image_small":"http://wimg.spriteapp.cn/picture/2018/0329/27373344_810.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2018/0329/27373344_810.jpg","width":"640","height":"360","tag":"","t":1522459142,"ding":"117","favourite":"120","top_cmt":null,"themes":null},{"type":"41","text":"牛人手里的蔬菜是这样的……吃不起啊","user_id":"17874820","name":"神思化伐","screen_name":"神思化伐","profile_image":"http://wimg.spriteapp.cn/profile/large/2018/01/14/5a5b3dfd34258_mini.jpg","created_at":"2018-04-15 09:57:02","create_time":null,"passtime":"2018-04-15 
09:57:02","love":"462","hate":"11","comment":"244","repost":"52","bookmark":"181","bimageuri":"http://wimg.spriteapp.cn/picture/2018/0414/27550686_245.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"53647","theme_name":"创意脑洞","theme_type":"1","videouri":"http://wvideo.spriteapp.cn/video/2018/0414/5ad2187ce19d9_wpd.mp4","videotime":155,"original_pid":"0","cache_version":2,"playcount":"20077","playfcount":"1290","cai":"11","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2018/0414/27550686_245.jpg","image2":"http://wimg.spriteapp.cn/picture/2018/0414/27550686_245.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2018/0414/27550686_245.jpg","image_small":"http://wimg.spriteapp.cn/picture/2018/0414/27550686_245.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2018/0414/27550686_245.jpg","width":"360","height":"632","tag":"","t":1523757422,"ding":"462","favourite":"181","top_cmt":null,"themes":null},{"type":"41","text":"开车的朋友们注意了,这些知识点非常实用!","user_id":"23005972","name":"没有必要","screen_name":"没有必要","profile_image":"http://wimg.spriteapp.cn/profile/large/2019/04/14/5cb33c41c052c_mini.png","created_at":"2019-04-17 17:09:02","create_time":null,"passtime":"2019-04-17 
17:09:02","love":"145","hate":"7","comment":"20","repost":"12","bookmark":"27","bimageuri":"http://wimg.spriteapp.cn/picture/2019/0416/29399075_873.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"55163","theme_name":"主版块","theme_type":"1","videouri":"http://wvideo.spriteapp.cn/video/2019/0416/5a436496-603f-11e9-b1a6-1866daeb0df1_wpd.mp4","videotime":136,"original_pid":"0","cache_version":2,"playcount":"1508","playfcount":"109","cai":"7","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2019/0416/29399075_873.jpg","image2":"http://wimg.spriteapp.cn/picture/2019/0416/29399075_873.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2019/0416/29399075_873.jpg","image_small":"http://wimg.spriteapp.cn/picture/2019/0416/29399075_873.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2019/0416/29399075_873.jpg","width":"352","height":"640","tag":"","t":1555492142,"ding":"145","favourite":"27","top_cmt":null,"themes":null},{"type":"41","text":"这么可爱的猫是真实存在的吗!","user_id":"22904236","name":"宠物托儿所","screen_name":"宠物托儿所","profile_image":"http://wimg.spriteapp.cn/profile/20180809181838278927.jpeg","created_at":"2018-12-23 21:44:01","create_time":null,"passtime":"2018-12-23 
21:44:01","love":"301","hate":"20","comment":"22","repost":"12","bookmark":"17","bimageuri":"http://wimg.spriteapp.cn/picture/2018/1221/e935cda4050611e9a7ce842b2b4c75ab_wpd.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"http://wvideo.spriteapp.cn/video/2018/1221/e935cda4050611e9a7ce842b2b4c75ab_wpd.mp4","videotime":12,"original_pid":"0","cache_version":2,"playcount":"17080","playfcount":"872","cai":"20","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2018/1221/e935cda4050611e9a7ce842b2b4c75ab_wpd.jpg","image2":"http://wimg.spriteapp.cn/picture/2018/1221/e935cda4050611e9a7ce842b2b4c75ab_wpd.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2018/1221/e935cda4050611e9a7ce842b2b4c75ab_wpd.jpg","image_small":"http://wimg.spriteapp.cn/picture/2018/1221/e935cda4050611e9a7ce842b2b4c75ab_wpd.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2018/1221/e935cda4050611e9a7ce842b2b4c75ab_wpd.jpg","width":"600","height":"1066","tag":"","t":1545572641,"ding":"301","favourite":"17","top_cmt":null,"themes":null},{"type":"10","text":"大象腿已经不能形容她了,当看到脸的时候,才知道是潜力股……","user_id":"22980232","name":"春色三分","screen_name":"春色三分","profile_image":"http://wimg.spriteapp.cn/profile/large/2019/03/26/5c99f6d0e2979_mini.png","created_at":"2019-06-16 21:20:01","create_time":null,"passtime":"2019-06-16 
21:20:01","love":"138","hate":"8","comment":"40","repost":"1","bookmark":"4","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"58240","theme_name":"搞笑图片","theme_type":"1","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"8","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2019/06/05/5cf791bde255c_1.jpg","image2":"http://wimg.spriteapp.cn/ugc/2019/06/05/5cf791bde255c_1.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2019/06/05/5cf791bde255c_1.jpg","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2019/06/05/5cf791bde255c_1.jpg","width":"690","height":"3392","tag":"","t":1560691201,"ding":"138","favourite":"4","top_cmt":null,"themes":null},{"type":"10","text":"【100首经典绕口令,喊小伙伴一起练!】9月10日-16日是全国推广普通话宣传周。你的普通话标准吗?分得清平翘舌、前后鼻音吗? 100首经典绕口令,喊上小伙伴一起来自测,你能顺畅读多少段?转起!","user_id":"22905338","name":"知识搬运工","screen_name":"知识搬运工","profile_image":"http://wimg.spriteapp.cn/profile/20180810165442652577.jpg","created_at":"2018-09-16 08:26:01","create_time":null,"passtime":"2018-09-16 08:26:01","love":"126","hate":"13","comment":"22","repost":"25","bookmark":"72","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"13","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2018/09/14/87f5436cb7e011e885d3842b2b4c75ab_1.jpg","image2":"http://wimg.spriteapp.cn/ugc/2018/09/14/87f5436cb7e011e885d3842b2b4c75ab_1.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2018/09/14/87f5436cb7e011e885d3842b2b4c75ab_1.jpg","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2018/09/14/87f5436cb7e011e885d3842b2b4c75ab_1.jpg","width":"820","height":"15568","tag":"","t":1537057561,"ding":"126","favourite":"72","top_cmt":null,"themes":null},{"type":"41","text":"【周末狂欢/活动精选】就喜欢星爷这一段 
看一次笑一次","user_id":"18410821","name":"囿团之囚","screen_name":"囿团之囚","profile_image":"http://wimg.spriteapp.cn/profile/large/2018/05/07/5af022e9842de_mini.jpg","created_at":"2018-07-14 21:00:01","create_time":null,"passtime":"2018-07-14 21:00:01","love":"570","hate":"21","comment":"120","repost":"11","bookmark":"37","bimageuri":"http://wimg.spriteapp.cn/picture/2018/0714/5b48ce54972f8_wpd.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"407","theme_name":"影视分享","theme_type":"1","videouri":"http://wvideo.spriteapp.cn/video/2018/0714/5b48ce54972f8_wpd.mp4","videotime":284,"original_pid":"0","cache_version":2,"playcount":"35290","playfcount":"4854","cai":"21","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2018/0714/5b48ce54972f8_wpd.jpg","image2":"http://wimg.spriteapp.cn/picture/2018/0714/5b48ce54972f8_wpd.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2018/0714/5b48ce54972f8_wpd.jpg","image_small":"http://wimg.spriteapp.cn/picture/2018/0714/5b48ce54972f8_wpd.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2018/0714/5b48ce54972f8_wpd.jpg","width":"540","height":"302","tag":"","t":1531573201,"ding":"570","favourite":"37","top_cmt":null,"themes":null},{"type":"41","text":"五一把车洗干净,准备和女朋友自驾游!","user_id":"22784081","name":"大脚板走天涯","screen_name":"大脚板走天涯","profile_image":"http://wimg.spriteapp.cn/profile/large/2018/09/26/5bab8cc61a012_mini.jpg","created_at":"2019-05-03 09:58:02","create_time":null,"passtime":"2019-05-03 
09:58:02","love":"174","hate":"10","comment":"38","repost":"3","bookmark":"4","bimageuri":"http://wimg.spriteapp.cn/picture/2019/0430/5cc83760b34c0_wpd.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"58191","theme_name":"搞笑视频","theme_type":"1","videouri":"http://wvideo.spriteapp.cn/video/2019/0430/5cc83760b34c0_wpd.mp4","videotime":15,"original_pid":"0","cache_version":2,"playcount":"6055","playfcount":"840","cai":"10","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2019/0430/5cc83760b34c0_wpd.jpg","image2":"http://wimg.spriteapp.cn/picture/2019/0430/5cc83760b34c0_wpd.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2019/0430/5cc83760b34c0_wpd.jpg","image_small":"http://wimg.spriteapp.cn/picture/2019/0430/5cc83760b34c0_wpd.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2019/0430/5cc83760b34c0_wpd.jpg","width":"496","height":"720","tag":"","t":1556848682,"ding":"174","favourite":"4","top_cmt":null,"themes":null},{"type":"10","text":"胖子减肥后,脸型会发生怎样的变化?","user_id":"23030770","name":"搞笑女孩的日常","screen_name":"搞笑女孩的日常","profile_image":"http://wimg.spriteapp.cn/profile/large/2019/01/26/5c4b9abb76355_mini.jpg","created_at":"2019-01-27 14:40:02","create_time":null,"passtime":"2019-01-27 
14:40:02","love":"219","hate":"19","comment":"88","repost":"16","bookmark":"8","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"55163","theme_name":"主版块","theme_type":"1","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"19","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2019/01/26/5c4c23b19f4e9_1.jpg","image2":"http://wimg.spriteapp.cn/ugc/2019/01/26/5c4c23b19f4e9_1.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2019/01/26/5c4c23b19f4e9_1.jpg","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2019/01/26/5c4c23b19f4e9_1.jpg","width":"600","height":"8856","tag":"","t":1548571202,"ding":"219","favourite":"8","top_cmt":null,"themes":null},{"type":"10","text":"冯小刚和崔永元的撕逼战,到底TM在说什么?","user_id":"8841843","name":"少年不知苦滋味","screen_name":"少年不知苦滋味","profile_image":"http://wimg.spriteapp.cn/profile/large/2017/10/25/59f06b6905582_mini.jpg","created_at":"2018-07-15 06:36:02","create_time":null,"passtime":"2018-07-15 
06:36:02","love":"585","hate":"49","comment":"156","repost":"24","bookmark":"46","bimageuri":"","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"55163","theme_name":"主版块","theme_type":"1","videouri":"","videotime":0,"original_pid":"0","cache_version":2,"playcount":null,"playfcount":null,"cai":"49","weixin_url":null,"image1":"http://wimg.spriteapp.cn/ugc/2018/07/13/5b48c6095b83b_1.jpg","image2":"http://wimg.spriteapp.cn/ugc/2018/07/13/5b48c6095b83b_1.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/ugc/2018/07/13/5b48c6095b83b_1.jpg","image_small":null,"cdn_img":"http://wimg.spriteapp.cn/ugc/2018/07/13/5b48c6095b83b_1.jpg","width":"750","height":"29500","tag":"","t":1531607762,"ding":"585","favourite":"46","top_cmt":null,"themes":null},{"type":"41","text":"千万不要打这3个年龄段的孩子,再生气也别动手!当心毁了孩子!","user_id":"20111037","name":"纯色映像","screen_name":"纯色映像","profile_image":"http://wimg.spriteapp.cn/profile/large/2017/10/14/59e22cba66df9_mini.jpg","created_at":"2018-05-01 09:03:02","create_time":null,"passtime":"2018-05-01 
09:03:02","love":"207","hate":"34","comment":"81","repost":"39","bookmark":"25","bimageuri":"http://wimg.spriteapp.cn/picture/2018/0430/933ecac24c0b11e88f6c842b2b4c75ab_wpd.jpg","voiceuri":null,"voicetime":null,"voicelength":null,"status":"4","theme_id":"0","theme_name":"","theme_type":"0","videouri":"http://wvideo.spriteapp.cn/video/2018/0430/933ecac24c0b11e88f6c842b2b4c75ab_wpd.mp4","videotime":90,"original_pid":"0","cache_version":2,"playcount":"8049","playfcount":"428","cai":"34","weixin_url":null,"image1":"http://wimg.spriteapp.cn/picture/2018/0430/933ecac24c0b11e88f6c842b2b4c75ab_wpd.jpg","image2":"http://wimg.spriteapp.cn/picture/2018/0430/933ecac24c0b11e88f6c842b2b4c75ab_wpd.jpg","is_gif":false,"image0":"http://wimg.spriteapp.cn/picture/2018/0430/933ecac24c0b11e88f6c842b2b4c75ab_wpd.jpg","image_small":"http://wimg.spriteapp.cn/picture/2018/0430/933ecac24c0b11e88f6c842b2b4c75ab_wpd.jpg","cdn_img":"http://wimg.spriteapp.cn/picture/2018/0430/933ecac24c0b11e88f6c842b2b4c75ab_wpd.jpg","width":"854","height":"480","tag":"","t":1525136582,"ding":"207","favourite":"25","top_cmt":null,"themes":null}]}
Alternation
1. |
"""
regex1|regex2 - regex1 is tried first; if it matches, the match succeeds; if it fails, regex2 is tried;
(the match succeeds as long as either regex1 or regex2 matches)
"""
import re
re_str = r'abc|123'
print(re.fullmatch(re_str, 'abc'))
print(re.fullmatch(re_str, '123'))
re_str = r'\d{2,5}|[a-z]+123'
print(re.fullmatch(re_str, '23'))
print(re.fullmatch(re_str, 'as123'))
re_str = r'\d{2,5}|[a-z]+123|[A-Z]'
print(re.fullmatch(re_str, 'H'))
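The order of the alternatives matters when one is a prefix of another. The following sketch, with made-up inputs, shows the left-to-right rule and the fact that | applies to everything on each side of it.

```python
import re

# search stops at the first alternative that succeeds, so 'ab' wins over 'abc'
first = re.search(r'ab|abc', 'abc')
print(first.group())  # 'ab'

# Putting the longer alternative first changes the result
longer = re.search(r'abc|ab', 'abc')
print(longer.group())  # 'abc'

# | splits the whole pattern: this means 'abc' or '123', nothing in between
mixed = re.fullmatch(r'abc|123', 'ab123')
print(mixed)  # None
```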
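Groups
1. ()
- Treats the content inside the parentheses as a single unit
1) Operating on a unit
r'(\d\d|[A-Z]{2})abc'
- Matches a string that ends with abc, preceded by two digits or two uppercase letters
r'([a-z]\d){3}'
- for example a8n7i8
2) Capturing
a. Capturing groups: make it easy to extract separate parts of the match later, or different results in different cases
b. Backreferences: \X in a regex repeats the text matched by the X-th group
Note: the group must appear before \X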
import re
re_str = r'(\d{3})([a-z]{3})'
print(re.findall(re_str, '是嗯908就是112sdd345rrf1数独www你说句话'))
print(re.fullmatch(re_str, '344hsj'))
re_str = r'(\d{3})[a-z]{3}'
print(re.findall(re_str, '是嗯908就是112sdd345rrf1数独www你说句话'))
print(re.fullmatch(re_str, '344hsj'))
# Backreference: \1 must repeat exactly what group 1 matched
# e.g. 234hjk234, 112hjk112
re_str = r'(\d{3})[a-z]{3}\1'
print(re.fullmatch(re_str, '112ssf112'))
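A backreference repeats the matched text, not the pattern. The sketch below illustrates this and then uses \1 to spot doubled words in a made-up sentence.

```python
import re

# \1 repeats the text of group 1, so different digits at the end do not match
same = re.fullmatch(r'(\d{3})[a-z]{3}\1', '112ssf112')
different = re.fullmatch(r'(\d{3})[a-z]{3}\1', '112ssf113')
print(same)       # a match object
print(different)  # None

# (\w+) captures a word; \1 demands the same word again after a space
doubled = re.findall(r'\b(\w+) \1\b', 'this is is a test test case')
print(doubled)    # ['is', 'test']
```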
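Exercise 1: write a regular expression that matches a string in which abc is preceded by two digits or two uppercase letters
import re
# 23abc, KJabc
re_str = r'(\d{2}|[A-Z]{2})abc'
print(re.fullmatch(re_str, '11abc'))
# Caution: this variant is looser; it also accepts one digit plus one uppercase letter, e.g. '1Aabc'
re_str = r'(\d|[A-Z]){2}abc'
print(re.fullmatch(re_str, '11abc'))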
Escaping
1. Adding \
Put \ in front of a symbol that has a special function or meaning in a regex to remove that special function or meaning
import re
re_str = r'\d{3}\.[a-z]{3}'
print(re.fullmatch(re_str, '234.vbf'))
re_str = r'a\+\d{3}'
print(re.fullmatch(re_str, 'a+234'))
re_str = r'\\d\\b'
print(re.fullmatch(re_str, '\\d\\b'))
re_str = r'\\w'
print(re.fullmatch(re_str, '\\w'))
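When the literal text comes from a variable, the backslashes can be added by the standard re.escape function instead of by hand. A minimal sketch with a made-up string:

```python
import re

literal = 'a+b.c'
# re.escape adds the backslashes needed to match the text literally
pattern = re.escape(literal)
print(pattern)  # a\+b\.c

escaped_hit = re.fullmatch(pattern, 'a+b.c')
escaped_miss = re.fullmatch(pattern, 'aabxc')
# Without escaping, + and . keep their special meaning
unescaped_hit = re.fullmatch(literal, 'aabxc')

print(escaped_hit)    # a match object
print(escaped_miss)   # None
print(unescaped_hit)  # a match object
```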
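2. Using []
Symbols that keep a special meaning inside []: a. ^ at the very beginning  b. - between two characters
Other symbols, including . + ? $, simply stand for themselves when they appear inside []
(] and \ still have to be escaped with \ inside [])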
import re
re_str = r'\d{3}[.][a-z]{3}'
print(re.fullmatch(re_str, '123.hsx'))
re_str = r'\d{3}[a+][a-z]{3}'
print(re.fullmatch(re_str, '123+hjs'))
re_str = r'\d{3}[-a+^]][a-z]{3}'
print(re.fullmatch(re_str, '123+]hjs'))
re_str = r'\d{3}[-a+^\]][a-z]{3}'
print(re.fullmatch(re_str, '123+hjs'))
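III. The re module
The re module is part of Python's standard library and provides the functions for working with regular expressions
1. re.compile(regex)
-> Compiles a regular expression into a pattern object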
import re
re_obj = re.compile(r'\d{3}')
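# Module-level functions take the regex as their first argument
re.fullmatch(r'\d{3}', '347')
re.findall(r'\d{3}', 'asjhf234ddsee1233')
# The pattern object offers the same operations as methods
re_obj.fullmatch('234')
re_obj.findall('asx345cfewr123c23442')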
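Compiling pays off when one pattern is applied to many strings. The sketch below, using a made-up list of candidates, reuses a single pattern object.

```python
import re

# Compile once, then reuse the pattern object for every string
three_digits = re.compile(r'\d{3}')
candidates = ['234', '12', 'abc', '9876', '007']
valid = [s for s in candidates if three_digits.fullmatch(s)]

print(valid)                 # ['234', '007']
print(three_digits.pattern)  # the original regex text: \d{3}
```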
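2. Matching a string
"""
re.fullmatch(regex, string) - the whole string must match the regular expression
re.match(regex, string) - the beginning of the string must match the regular expression
Both return None when the match fails and a match object when it succeeds
"""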
import re
re_str = r'\d{3}'
print(re.fullmatch(re_str, '342')) # <re.Match object; span=(0, 3), match='342'>
print(re.match(re_str, '234dfrxcr')) # <re.Match object; span=(0, 3), match='234'>
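The difference between the matching functions shows up most clearly on one shared input. A small sketch, with re.search included for comparison:

```python
import re

text = '234dfrxcr'

full = re.fullmatch(r'\d{3}', text)       # the whole string must match
head = re.match(r'\d{3}', text)           # only the beginning must match
head_letters = re.match(r'[a-z]{3}', text)
anywhere = re.search(r'[a-z]{3}', text)   # looks at every position

print(full)              # None
print(head.group())      # '234'
print(head_letters)      # None: the letters are not at the beginning
print(anywhere.group())  # 'dfr'
```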
1) Match objects
import re
result = re.fullmatch(r'\d{3}=[a-z]{2}', '234=am')
print(result)
print(result.group())
print(result.group(0))
result = re.fullmatch(r'(\d{3})=([a-z]{2})', '234=am')
print(result)
print(result.group(1))
print(result.group(2))
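a. Getting the matched text
"""
match.group() - returns the text matched by the whole regular expression, as a string
match.group(N) - returns the text matched by the N-th group of the regular expression
"""
b. Getting the position of the match in the original string
"""
match.span() - returns the index range of the match in the original string: [start, end)
match.span(N) - returns the index range of the text matched by the N-th group
"""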
import re
result = re.match(r'(\d{3})=([a-z]{2})', '234=am')
print(result)
print(result.span())
print(result.span(1), result.span(2))
start, end = result.span(1)
print(start, end)
c. Getting the original string
match.string
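The pieces of a match object fit together as in the following sketch. The input string is made up, and groups(), start() and end() are standard match-object methods beyond the ones listed above.

```python
import re

m = re.search(r'(\d{3})=([a-z]{2})', 'id:234=am;')

print(m.group())           # '234=am'
print(m.groups())          # ('234', 'am')
print(m.start(), m.end())  # 3 9
print(m.span(2))           # (7, 9)
print(m.string)            # 'id:234=am;'

# Slicing the original string with the span gives back the matched text
start, end = m.span()
same_text = m.string[start:end] == m.group()
print(same_text)           # True
```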
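3. Searching
"""
1) re.search(regex, string) - finds the first substring that matches the regex; the result is a match object, or None if nothing is found
2) re.findall(regex, string) - returns all non-overlapping substrings that match the regex, as a list
Note: if the regex contains groups, the list holds only what the groups matched (a string for one group, a tuple for several)
3) re.finditer(regex, string) - finds all substrings that match the regex; returns an iterator whose elements are match objects
"""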
import re
result = re.search(r'\d{3}', 'sssx2313xdce231')
print(result)
result = re.findall(r'\d{3}[a-z]{2}', '实现234hd就使用234js介绍好好238css')
print(result) # ['234hd', '234js', '238cs']
result = re.findall(r'(\d{3})[a-z]{2}', '实现234hd就使用234js介绍好好238css')
print(result) # ['234', '234', '238']
result = re.findall(r'(\d{3})([a-z]{2})', '实现234hd就使用234js介绍好好238css')
print(result) # [('234', 'hd'), ('234', 'js'), ('238', 'cs')]
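result = re.findall(r'(([A-Z]{2}|\+)\d{2})', 'AS54数SW77和核工+86业12四季花海')
print(result)  # [('AS54', 'AS'), ('SW77', 'SW'), ('+86', '+')]
result = re.finditer(r'([A-Z]{2}|\+)(\d{2})', 'AS54数SW12和核+45工业+12四季LK45花海')
group = []
group1 = []
group2 = []
for x in result:
    # x is a match object: collect the whole match and each group separately
    group.append(x.group())
    group1.append(x.group(1))
    group2.append(x.group(2))
print(group)   # ['AS54', 'SW12', '+45', '+12', 'LK45']
print(group1)  # ['AS', 'SW', '+', '+', 'LK']
print(group2)  # ['54', '12', '45', '12', '45']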
4. Splitting
re.split(regex, string)
- Splits the string at every substring that matches the regex; returns a list of strings
import re
result = re.split(r'\d+', '塞内加尔45dffsx书法家和人457644755上赛季和')
print(result)
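Two standard options of re.split are worth knowing: maxsplit limits the number of cuts, and a capturing group keeps the separators in the result. A short sketch with a made-up string:

```python
import re

text = 'a1b22c333d'

plain = re.split(r'\d+', text)
limited = re.split(r'\d+', text, maxsplit=1)
# With a capturing group, the separators are kept in the list
with_separators = re.split(r'(\d+)', text)

print(plain)            # ['a', 'b', 'c', 'd']
print(limited)          # ['a', 'b22c333d']
print(with_separators)  # ['a', '1', 'b', '22', 'c', '333', 'd']
```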
5. Replacing
re.sub(regex, string1, string2)
- Replaces every substring of string2 that matches the regex with string1; returns a new string
import re
message = input('Comment: ')
new_message = re.sub(r'fuck', '*', message, flags=re.IGNORECASE)
print(new_message)
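The replacement does not have to be a fixed string. The sketch below shows three standard variations on made-up inputs: group references in the replacement, a function as the replacement, and the count argument. The helper name mask is an assumption for illustration.

```python
import re

# \1 and \2 in the replacement refer to the groups of each match
swapped = re.sub(r'(\d{4})-(\d{2})', r'\2/\1', 'from 2019-04 to 2019-05')
print(swapped)  # 'from 04/2019 to 05/2019'

def mask(match):
    """Hypothetical helper: keep the length of the match but hide its content."""
    return '*' * len(match.group())

# The replacement can be a function that receives each match object
masked = re.sub(r'\d+', mask, 'tel 12345, ext 678')
print(masked)   # 'tel *****, ext ***'

# count limits how many replacements are made
once = re.sub(r'\d+', '#', 'a1b2c3', count=1)
print(once)     # 'a#b2c3'
```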