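Summary: I came across a book with an excellent regular expression tutorial and have organized it here. The tutorial pairs regular expressions with the Linux grep command so that you can pick them up quickly. Regular expressions are used widely in nginx configuration and in Linux commands. This tutorial is kept as simple as possible. Even a long, intimidating regular expression can be understood with a little patience, because it is built piece by piece, unlike complex and abstract program logic.
grep is a common Linux command for matching string data: it prints every line that contains a match for the given pattern.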
    grep 'search string' filename
An example:
    grep 'root' /etc/passwd
    root:x:0:0:root:/root:/bin/bash
To highlight the matched text in color, define an alias for grep:
    alias grep='grep --color=auto'
Sample file r.txt
On Linux, the file can be fetched with the following commands:
    wget http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt
    mv regular_express.txt r.txt
    cat r.txt
    "Open Source" is a good mechanism to develop programs.
    apple is my favorite food.
    Football game is not use feet only.
    this dress doesn't fit me.
    However, this dress is about $ 3183 dollars.
    GNU is free air not free beer.
    Her hair is very beauty.
    I can't finish the test.
    Oh! The soup taste good.
    motorcycle is cheap than car.
    This window is clear.
    the symbol '*' is represented as start.
    Oh! My god!
    The gd software is a library for drafting programs.
    You are the best is mean you are the no. 1.
    The world <Happy> is the same with "glad".
    I like dog.
    google is the best tools for search keyword.
    goooooogle yes!
    go! go! Let's go.
    # I am VBird
The file has 22 lines in total; the last line is blank.
Basic regular expression exercises
Example 1: find the lines that contain 'the'
    grep -n 'the' r.txt
    8:I can't finish the test.
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    18:google is the best tools for search keyword.
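Example 2: searching with brackets []
To find the words test or taste, note that both have the form 't?st', where ? stands for one character that is either a or e. They can be found like this:
    grep -n 't[ae]st' r.txt
    8:I can't finish the test.
    9:Oh! The soup taste good.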
No matter how many characters are listed inside [], the brackets stand for exactly one character. To find the lines that contain oo:
    grep -n 'oo' r.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
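To exclude an oo that is preceded by g:
    grep -n '[^g]oo' r.txt
    2:apple is my favorite food.
    3:Football game is not use feet only.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
Line 18 still matches because of the oo in tools, and line 19 matches because inside the long run of o's an oo is preceded by another o rather than by g.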
To require that oo is not preceded by a lowercase letter:
    grep -n '[^a-z]oo' r.txt
    3:Football game is not use feet only.
The same idea applies to [a-z], [A-Z], [0-9], [a-zA-Z0-9], and so on. For example, to find the lines that contain a digit:
    grep -n '[0-9]' r.txt
    5:However, this dress is about $ 3183 dollars.
    15:You are the best is mean you are the no. 1.
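The bracket expressions above can be tried without r.txt. The following is a minimal sketch: the sample lines are made up for illustration, and LC_ALL=C is set on the assumption that ranges such as [a-z] should be compared byte by byte rather than by locale collation.

```shell
#!/bin/sh
# Hypothetical sample lines (not the tutorial's r.txt).
export LC_ALL=C   # assumption: byte-wise ranges, so [a-z] means lowercase ASCII only

lines='I can finish the test.
The soup taste good.
Football game
google it
room 101'

# [ae] stands for exactly one character: a or e.
tast=$(printf '%s\n' "$lines" | grep -n 't[ae]st')

# [^g] stands for one character that is not g, followed by oo.
not_g=$(printf '%s\n' "$lines" | grep -n '[^g]oo')

# [^a-z] stands for one character that is not a lowercase letter.
not_lower=$(printf '%s\n' "$lines" | grep -n '[^a-z]oo')

# [0-9] stands for one digit.
digits=$(printf '%s\n' "$lines" | grep -n '[0-9]')

printf '%s\n' "$tast" "$not_g" "$not_lower" "$digits"
```

Note that 'good' and 'google' drop out of the [^g]oo result because their only oo is preceded by g.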
Example 3: the line-start and line-end anchors ^ and $
To list the lines that begin with 'the':
    grep -n '^the' r.txt
    12:the symbol '*' is represented as start.
To list the lines that begin with a lowercase letter:
    grep -n '^[a-z]' r.txt
    2:apple is my favorite food.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    12:the symbol '*' is represented as start.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
To list the lines that do not begin with an English letter:
    grep -n '^[^a-zA-Z]' r.txt
    1:"Open Source" is a good mechanism to develop programs.
    21:# I am VBird
Note: the ^ symbol means different things inside and outside brackets. As the first character inside [] it negates the set; outside [] it anchors the match to the start of the line.
To find the lines that end with a period (.):
    grep -n '\.$' r.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    11:This window is clear.
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    17:I like dog.
    18:google is the best tools for search keyword.
    20:go! go! Let's go.
The period has a special meaning in regular expressions (covered below), so it must be escaped with a backslash (\). Lines 5 to 9 also end with a period, so why were they not printed? Use cat -A to print lines 5 to 10:
    cat -An r.txt | head -n 10 | tail -n 6
         5  However, this dress is about $ 3183 dollars.^M$
         6  GNU is free air not free beer.^M$
         7  Her hair is very beauty.^M$
         8  I can't finish the test.^M$
         9  Oh! The soup taste good.^M$
        10  motorcycle is cheap than car.$
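Lines 5 to 9 use the Windows (DOS) line break (^M$), while line 10 uses the Linux line break ($). On lines 5 to 9 the last character before the end of the line is therefore a carriage return (^M) rather than a period, so the pattern for a trailing period does not match them. This output also shows why $ is the symbol for the end of a line.
To find blank lines:
    grep -n '^$' r.txt
    22: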
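The anchor behavior and the effect of DOS line endings can be reproduced on a few made-up lines. This is only a sketch: make_sample is a hypothetical helper, and removing carriage returns with tr is one possible workaround that the original text does not prescribe.

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-wise character ranges

# Hypothetical sample: line 2 ends with a DOS line break (\r\n), line 4 is empty.
make_sample() {
  printf 'unix line.\n'
  printf 'dos line.\r\n'
  printf '# comment\n'
  printf '\n'
}

# '\.$' needs a literal period right before the end of the line,
# so the DOS line (which ends in \r) is skipped.
ends_with_dot=$(make_sample | grep -n '\.$')

# After deleting the carriage returns, both lines match.
ends_with_dot_fixed=$(make_sample | tr -d '\r' | grep -n '\.$')

# ^ outside brackets anchors to the line start; ^ inside brackets negates the set.
starts_nonletter=$(make_sample | grep -n '^[^a-zA-Z]')

# '^$' matches a line with nothing between its start and its end.
blank=$(make_sample | grep -n '^$')

printf '%s\n' "$ends_with_dot" "$ends_with_dot_fixed" "$starts_nonletter" "$blank"
```

The blank line does not show up in the '^[^a-zA-Z]' result, because that pattern still requires one character at the start of the line.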
Linux configuration files contain many comment lines that start with #. To hide both the blank lines and the comments:
    grep -v '^$' /etc/deluser.conf | grep -v '^#'
    REMOVE_HOME = 0
    REMOVE_ALL_FILES = 0
    BACKUP = 0
    BACKUP_TO = "."
    ONLY_IF_EMPTY = 0
    EXCLUDE_FSTYPES = "(proc|sysfs|usbfs|devpts|tmpfs|afs)"
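Since /etc/deluser.conf is not present on every system, here is a small sketch of the same filter on a made-up configuration (the config helper and its contents are hypothetical). It also shows a single-pass variant that passes both patterns to one grep with -e, which should give the same result as the two-stage pipeline.

```shell
#!/bin/sh
# Hypothetical configuration text used only for this illustration.
config() {
  printf '# Remove the home directory?\n'
  printf 'REMOVE_HOME = 0\n'
  printf '\n'
  printf '# Make a backup first?\n'
  printf 'BACKUP = 0\n'
}

# Two passes, as in the pipeline above: drop blank lines, then drop comments.
two_pass=$(config | grep -v '^$' | grep -v '^#')

# One pass: -v inverts the match, and each -e adds one pattern.
one_pass=$(config | grep -v -e '^$' -e '^#')

printf '%s\n' "$two_pass"
```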
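Example 4: the any-character symbol . and the repetition symbol *
. (period) stands for exactly one arbitrary character; * (asterisk) repeats the preceding character zero or more times. Suppose we want strings of the form g??d, that is, g, any two characters, then d:
    grep -n 'g..d' r.txt
    1:"Open Source" is a good mechanism to develop programs.
    9:Oh! The soup taste good.
    16:The world <Happy> is the same with "glad".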
To list strings such as oo, ooo, oooo, the asterisk is needed. Note that 'o*' stands for '', 'o', 'oo', 'ooo', and so on, so it also matches the empty string. 'oo*' stands for 'o', 'oo', 'ooo', and so on, so it needs at least one o. By the same reasoning, at least two o's is written 'ooo*':
    grep -n 'ooo*' r.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
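The difference between 'o*', 'oo*' and 'ooo*' is easy to check by counting matching lines with grep -c. The word list below is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical word list: zero, one, two and three o's between g and d.
words() { printf '%s\n' 'gd' 'god' 'good' 'goood'; }

# 'o*' can match the empty string, so every line matches, even 'gd'.
zero_or_more=$(words | grep -c 'o*')

# 'oo*' is one o followed by zero or more o's: at least one o.
one_or_more=$(words | grep -c 'oo*')

# 'ooo*' is two o's followed by zero or more o's: at least two o's.
two_or_more=$(words | grep -c 'ooo*')

echo "o*: $zero_or_more  oo*: $one_or_more  ooo*: $two_or_more"
```

This is why a bare 'o*' is rarely useful on its own: it selects every line of the input.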
To find two g's with at least one o between them, such as gog, goog, gooog:
    grep -n 'goo*g' r.txt
    18:google is the best tools for search keyword.
    19:goooooogle yes!
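To find a string that starts with g and ends with g, is the pattern 'g*g'? No: 'g*g' means zero or more g's followed by one g. The correct pattern is 'g.*g', where '.*' stands for zero or more arbitrary characters:
    grep -n 'g.*g' r.txt
    1:"Open Source" is a good mechanism to develop programs.
    14:The gd software is a library for drafting programs.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.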
To allow only English letters between the two g's:
    grep -n 'g[a-zA-Z]*g' r.txt
    18:google is the best tools for search keyword.
    19:goooooogle yes!
To find any number, that is, one or more digits:
    grep -n '[0-9][0-9]*' r.txt
    5:However, this dress is about $ 3183 dollars.
    15:You are the best is mean you are the no. 1.
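To see which part of a line these patterns actually match, the -o option prints only the matched text. This is a sketch on made-up input; -o is not part of POSIX grep, so it is assumed here that a grep supporting it (such as GNU or BSD grep) is installed.

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-wise character ranges

# 'g.*g' matches as much as possible: from the first g to the last g on the line.
first_to_last=$(printf '%s\n' 'go! go! google' | grep -o 'g.*g')

# With only letters allowed in between, the match cannot cross '!' or spaces.
letters_only=$(printf '%s\n' 'go! go! google' | grep -o 'g[a-zA-Z]*g')

# '[0-9][0-9]*' matches each run of one or more digits; -o prints one run per line.
numbers=$(printf '%s\n' 'about $ 3183 dollars, no. 1' | grep -o '[0-9][0-9]*')

printf '%s\n' "$first_to_last" "$letters_only" "$numbers"
```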
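Example 5: limiting the repeat count with the interval {}
So far, . and * have been used to match zero to unlimited repetitions. What if the number of repetitions needs to be limited? That calls for the interval braces {}. In basic regular expressions, which is what grep uses without -E, the braces must be escaped with backslashes and written \{ and \}. To find strings with two consecutive o's:
    grep -n 'o\{2\}' r.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!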
To find a g followed by 2 to 5 o's and then another g:
    grep -n 'go\{2,5\}g' r.txt
    18:google is the best tools for search keyword.
Line 19 has six o's between the two g's, so it is not selected.
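The interval syntax can be checked on a few made-up words. In this sketch the last command deliberately omits the backslashes to show that, in a basic regular expression, unescaped braces are taken as literal characters.

```shell
#!/bin/sh
# Hypothetical words with 1, 2, 5 and 6 o's between the two g's.
words() { printf '%s\n' 'gog' 'goog' 'gooooog' 'goooooog'; }

# Two consecutive o's anywhere in the line.
two_o=$(words | grep -n 'o\{2\}')

# g, then 2 to 5 o's, then g: the word with six o's is rejected.
two_to_five=$(words | grep -n 'go\{2,5\}g')

# Without backslashes the pattern looks for the literal text o{2}, which never occurs.
# grep exits with status 1 when nothing matches, hence the '|| true'.
literal=$(words | grep -c 'o{2}' || true)

printf '%s\n' "$two_o" "$two_to_five" "$literal"
```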
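Summary of basic regular expressions
RE character    Meaning
^word           The search string (word) is at the beginning of the line
word$           The search string (word) is at the end of the line
.               Exactly one arbitrary character
\               Escape character: removes the special meaning of the character that follows
*               Repeats the preceding character zero or more times
[list]          Lists the characters to choose from, e.g. 'a[al]y' finds aay and aly
[n1-n2]         Gives a range of characters to choose from, e.g. '[0-9]' is any decimal digit
[^list]         Lists the characters or ranges to exclude, e.g. '[^A-Z]' excludes uppercase letters
\{n,m\}         n to m consecutive occurrences of the preceding RE character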
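Extended regular expressions
To use extended regular expressions, run grep with the -E option or use the egrep command directly.
RE character    Meaning
+               Repeats the preceding character one or more times
?               Zero or one occurrence of the preceding character
|               Finds any of several strings (or), e.g. egrep -n 'gd|good' r.txt
()              Groups a string, e.g. to find glad or good: egrep -n 'g(la|oo)d' r.txt
()+             Repeats the preceding group one or more times, e.g. to find "AxyzxyzxyzxyzC": echo 'AxyzxyzxyzxyzC' | egrep 'A(xyz)+C'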
It is worth stressing that the exclamation mark ! is not a special character in regular expressions.
That's all. I hope it helps.
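The extended operators can be exercised on a few made-up words as follows. This sketch uses grep -E throughout rather than egrep, since some recent GNU grep releases print a warning that egrep is obsolescent; the word lists are hypothetical.

```shell
#!/bin/sh
export LC_ALL=C   # assumption: byte-wise character handling

words() { printf '%s\n' 'gd' 'god' 'good' 'glad' 'bad'; }

# | : either alternative matches.
alt=$(words | grep -E -n 'gd|good')

# () : group the alternatives inside a longer pattern.
group=$(words | grep -E -n 'g(la|oo)d')

# + : one or more of the preceding character.
plus=$(words | grep -E -n 'go+d')

# ? : zero or one of the preceding character (anchored so that 'good' is rejected).
opt=$(words | grep -E -n '^go?d$')

# ()+ : one or more repetitions of the whole group.
rep=$(echo 'AxyzxyzxyzxyzC' | grep -E 'A(xyz)+C')

# ! is an ordinary character, so it matches itself.
bang=$(printf '%s\n' 'go! go!' | grep -E -c 'go!')

printf '%s\n' "$alt" "$group" "$plus" "$opt" "$rep" "$bang"
```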