Using Regular Expressions in Swift 4, with Worked Examples

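Regular expressions are a powerful pattern-matching tool available in C, Python, and many other languages, and a modern language like Swift naturally supports them too. By the end of this tutorial you should have a feel for how much power regular expressions put in your hands.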

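This tutorial first introduces the matching patterns available in Swift, with examples; it then covers NSRegularExpression, the class Apple provides for working with them; and it closes with a more complex example that ties everything together. Besides regular expressions, it also touches on error handling, closures, and reading and writing files. If you spot omissions or outright mistakes, corrections are welcome.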

Part One: Swift Regular Expressions

The idea behind regular expressions is simple: given a pattern (a String), check whether the String under test satisfies it. If it does, you can retrieve the part that matched.

For example, apple is a pattern. It matches Strings such as "apple tree" and "I love apples.", and in both cases the matched text is apple.

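Beyond literal text, regular expressions support special symbols that stand for unspecified values. For example, d.g matches dog, dig, dag, and so on, because . stands for any single character. This is what makes regular expressions so powerful.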
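The two patterns above can be tried out with a minimal sketch. It assumes Foundation is available and uses the String method range(of:options:) with the .regularExpression option; the helper name firstHit is hypothetical and only serves this illustration.

```swift
import Foundation

// Hypothetical helper: returns the first substring of `text` matched by `pattern`, or nil.
func firstHit(of pattern: String, in text: String) -> String? {
    // With .regularExpression, the search string is interpreted as a regex pattern.
    guard let range = text.range(of: pattern, options: .regularExpression) else {
        return nil
    }
    return String(text[range])
}

let treeHit = firstHit(of: "apple", in: "apple tree")      // "apple"
let loveHit = firstHit(of: "apple", in: "I love apples.")  // "apple"
let dogHit = firstHit(of: "d.g", in: "hot dog")            // "dog"
let noHit = firstHit(of: "d.g", in: "dg")                  // nil: . needs one character between d and g

print(treeHit ?? "nil", loveHit ?? "nil", dogHit ?? "nil", noHit ?? "nil")
```

This shortcut is convenient for a single lookup; finding every match is what NSRegularExpression is for.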

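Patterns follow their own set of rules. These rules are largely shared across languages, though each language may adjust them slightly. A pattern consists of ordinary characters (for example, the letters a through z) and special characters (called "metacharacters"). The table below lists the character expressions among the metacharacters available in Swift, taken from the official documentation.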

Character expression  Description  Note
\a Match a BELL, \u0007
\A Match at the beginning of the input. Differs from ^ in that \A will not match after a new line within the input. Note: always matches the very start of the input and is unaffected by the anchorsMatchLines option, unlike ^.
\b, outside of a [Set] Match if the current position is a word boundary. Boundaries occur at the transitions between word (\w) and non-word (\W) characters, with combining marks ignored. Note: combining marks do not create a word boundary.
\b, within a [Set] Match a BACKSPACE, \u0008. Note: the backspace character.
\B Match if the current position is not a word boundary.
\cX Match a control-X character
\d Match any character with the Unicode General Category of Nd (Number, Decimal Digit.) Note: matches digits, including the decimal digits of other scripts in Unicode.
\D Match any character that is not a decimal digit.
\e Match an ESCAPE, \u001B.
\E Terminates a \Q ... \E quoted sequence.
\f Match a FORM FEED, \u000C. Note: form feed (page break).
\G Match if the current position is at the end of the previous match.
\n Match a LINE FEED, \u000A. Note: line feed (newline).
\N{UNICODE CHARACTER NAME} Match the named character.
\p{UNICODE PROPERTY NAME} Match any character with the specified Unicode Property. Note: the full list of Unicode properties is defined by the Unicode standard.
\P{UNICODE PROPERTY NAME} Match any character not having the specified Unicode Property.
\Q Quotes all following characters until \E.
\r Match a CARRIAGE RETURN, \u000D. Note: carriage return.
\s Match a white space character. White space is defined as [\t\n\f\r\p{Z}]. Note: \p{Z} covers the Unicode line separator, paragraph separator, and space separator categories.
\S Match a non-white space character.
\t Match a HORIZONTAL TABULATION, \u0009. Note: horizontal tab.
\uhhhh Match the character with the hex value hhhh.
\Uhhhhhhhh Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff. Note: all eight hex digits (32 bits) must be supplied.
\w Match a word character. Word characters are [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].
\W Match a non-word character.
\x{hhhh} Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh Match the character with two digit hex value hh.
\X Match a Grapheme Cluster. Note: a grapheme cluster is one user-perceived character.
\Z Match if the current position is at the end of input, but before the final line terminator, if one exists.
\z Match if the current position is at the end of input.
\n (n is a number) Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ total number of capture groups in the pattern. Note: n is a number identifying which capture group to refer to, for example \1.
\0ooo Match an Octal character. ooo is from one to three octal digits. 0377 is the largest allowed Octal character. The leading zero is required; it distinguishes Octal constants from back references.
[pattern] Match any one character from the pattern. Note: square brackets match exactly one of the characters listed inside.
. Match any character. Note: matches line terminators only when the dotMatchesLineSeparators option is set.
^ Match at the beginning of a line.
$ Match at the end of a line.
\ Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . /
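The notes on ^, \A, and . mention the anchorsMatchLines and dotMatchesLineSeparators options. The sketch below shows their effect on a two-line input using NSRegularExpression.Options. The sample text and the helper name countMatches are assumptions made for this illustration. Note that inside a Swift string literal the backslash itself has to be escaped, so \w is written "\\w".

```swift
import Foundation

let sample = "first line\nsecond line"
let sampleRange = NSRange(sample.startIndex..<sample.endIndex, in: sample)

// Hypothetical helper: counts matches of `pattern` in `sample` under the given options.
func countMatches(_ pattern: String, _ options: NSRegularExpression.Options) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return 0
    }
    return regex.numberOfMatches(in: sample, options: [], range: sampleRange)
}

// ^ matches only at the start of the input by default.
let caretDefault = countMatches("^\\w+", [])
// With anchorsMatchLines, ^ also matches right after each line break.
let caretPerLine = countMatches("^\\w+", [.anchorsMatchLines])
// \A keeps matching only at the start of the input, whatever the option.
let inputStart = countMatches("\\A\\w+", [.anchorsMatchLines])
// . does not match the line feed unless dotMatchesLineSeparators is set.
let dotDefault = countMatches("line.second", [])
let dotAcrossLines = countMatches("line.second", [.dotMatchesLineSeparators])

print(caretDefault, caretPerLine, inputStart, dotDefault, dotAcrossLines) // 1 2 1 0 1
```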

The table below lists the operators among the metacharacters available in Swift.

Operator  Description  Note
| Alternation. A|B matches either A or B.
* Match 0 or more times. Match as many times as possible.
+ Match 1 or more times. Match as many times as possible.
? Match zero or one times. Prefer one.
{n} Match exactly n times.
{n,} Match at least n times. Match as many times as possible.
{n,m} Match between n and m times. Match as many times as possible, but not more than m.
*? Match 0 or more times. Match as few times as possible.
+? Match 1 or more times. Match as few times as possible.
?? Match zero or one times. Prefer zero.
{n}? Match exactly n times.
{n,}? Match at least n times, but no more than required for an overall pattern match.
{n,m}? Match between n and m times. Match as few times as possible, but not less than n.
*+ Match 0 or more times. Match as many times as possible when first encountered, do not retry with fewer even if overall match fails (Possessive Match).
++ Match 1 or more times. Possessive match.
?+ Match zero or one times. Possessive match.
{n}+ Match exactly n times.
{n,}+ Match at least n times. Possessive Match.
{n,m}+ Match between n and m times. Possessive Match.
(...) Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.
(?:...) Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses.
(?>...) Atomic-match parentheses. First match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the "(?>"
(?# ... ) Free-format comment (?# comment ).
(?= ... ) Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?! ... ) Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.
(?<= ... ) Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
(?<! ... ) Negative Look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
(?ismwx-ismwx:... ) Flag settings. Evaluate the parenthesized expression with the specified flags enabled or -disabled. The flags are defined in Flag Options.
(?ismwx-ismwx) Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match.The flags are defined in Flag Options.
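A few of these operators are easier to grasp side by side. The following is a minimal sketch, not a complete tour: it contrasts greedy + with lazy +?, reads a capturing group, and uses a look-ahead assertion. The helper firstMatch and the sample string are hypothetical and exist only for this example.

```swift
import Foundation

// Hypothetical helper: text of capture group `group` in the first match, or nil.
// Group 0 is the whole match.
func firstMatch(of pattern: String, in text: String, group: Int = 0) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return nil
    }
    let whole = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          let range = Range(match.range(at: group), in: text) else {
        return nil
    }
    return String(text[range])
}

let markup = "<b>bold</b> and <i>italic</i>"

// Greedy: + takes as much as possible, so the match runs to the last ">".
let greedy = firstMatch(of: "<.+>", in: markup)
// Lazy: +? takes as little as possible, so the match stops at the first ">".
let lazy = firstMatch(of: "<.+?>", in: markup)
// Capturing parentheses: group 1 holds only the tag name.
let tagName = firstMatch(of: "<(\\w+)>", in: markup, group: 1)
// Look-ahead: the word must be followed by </i>, which is not part of the match.
let beforeClose = firstMatch(of: "\\w+(?=</i>)", in: markup)

print(greedy ?? "nil")      // <b>bold</b> and <i>italic</i>
print(lazy ?? "nil")        // <b>
print(tagName ?? "nil")     // b
print(beforeClose ?? "nil") // italic
```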

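If you would rather not struggle through English documentation, the Runoob (菜鸟教程) regular expression tutorial is a good place to start, but to learn Swift regular expressions properly you will still need to consult the official documentation.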

Part Two: The NSRegularExpression Class

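An example explains it best. Take the following String and look for every word of the form f, one lowercase letter, then llow: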

import Foundation

let sentence = "I'd like to follow my fellow to the fallow to see a hallow harrow."

do {
    // [a-z] means this position can be any one letter from a to z
    let regex = try NSRegularExpression(pattern: "f[a-z]llow", options: [])

    // NSRange is measured in UTF-16 code units, so use utf16.count rather than count
    let range = NSRange(location: 0, length: sentence.utf16.count)

    // matches is an array of NSTextCheckingResult
    let matches = regex.matches(in: sentence, options: [], range: range)
    print("\(matches.count) matches.")

} catch {
    print(error.localizedDescription)
}

The output is:

3 matches.
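One detail about the range argument deserves care: NSRange counts UTF-16 code units, while a String's count property counts Characters. For plain ASCII text the two agree, but they diverge as soon as the text contains characters such as emoji. The sketch below uses a made-up string to show a match being missed when the range is built from count.

```swift
import Foundation

// Assumed sample text: the emoji is one Character but two UTF-16 code units.
let greeting = "👍 follow"
let characterCount = greeting.count   // 8
let utf16Count = greeting.utf16.count // 9

// try! is acceptable here only because the pattern is a fixed, valid literal.
let followRegex = try! NSRegularExpression(pattern: "follow", options: [])

// A range built from count stops one unit early and cuts off the final "w".
let shortRange = NSRange(location: 0, length: characterCount)
// A range built from utf16.count covers the whole string.
let fullRange = NSRange(location: 0, length: utf16Count)

let missed = followRegex.numberOfMatches(in: greeting, options: [], range: shortRange)
let found = followRegex.numberOfMatches(in: greeting, options: [], range: fullRange)

print(missed, found) // 0 1
```

Building the range with NSRange(text.startIndex..<text.endIndex, in: text) avoids the counting question altogether.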

How do you get the actual matched strings out of matches? Read the range property of each NSTextCheckingResult and apply that range back to the original sentence.

...
let matches = ...
print(...)
for (i, match) in matches.enumerated() {
    let substring = (sentence as NSString).substring(with: match.range)
    print("\(i) is " + substring + ".")
}
...

The output is:

3 matches.
0 is follow.
1 is fellow.
2 is fallow.
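Casting to NSString is one way to apply the range. As an alternative sketch, the NSRange can be converted into a Range<String.Index> with Range(_:in:), which keeps everything in native Swift String types. The array name words is an assumption made for this example.

```swift
import Foundation

let sentence = "I'd like to follow my fellow to the fallow to see a hallow harrow."
var words: [String] = []

do {
    let regex = try NSRegularExpression(pattern: "f[a-z]llow", options: [])
    let whole = NSRange(sentence.startIndex..<sentence.endIndex, in: sentence)

    for match in regex.matches(in: sentence, options: [], range: whole) {
        // Range(_:in:) returns nil when the NSRange does not map onto valid String indices.
        if let range = Range(match.range, in: sentence) {
            words.append(String(sentence[range]))
        }
    }
} catch {
    print(error.localizedDescription)
}

print(words) // ["follow", "fellow", "fallow"]
```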

You can also iterate over the matches with a closure:

// Handle each match directly as it is found
let range = NSRange(location: 0, length: sentence.utf16.count)
regex.enumerateMatches(in: sentence, options: [], range: range, using: { result, _, _ in
    guard let result = result else { return }
    let substring = (sentence as NSString).substring(with: result.range)
    print(substring)
})

The output is:

follow
fellow
fallow
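The closure receives three arguments, and the third is a pointer that lets you end the enumeration early. The sketch below assumes, purely for illustration, that only the first two matches are wanted; the array name collected is hypothetical.

```swift
import Foundation

let sentence = "I'd like to follow my fellow to the fallow to see a hallow harrow."
var collected: [String] = []

do {
    let regex = try NSRegularExpression(pattern: "f[a-z]llow", options: [])
    let whole = NSRange(location: 0, length: sentence.utf16.count)

    regex.enumerateMatches(in: sentence, options: [], range: whole) { result, _, stop in
        guard let result = result,
              let range = Range(result.range, in: sentence) else { return }
        collected.append(String(sentence[range]))

        // Assumption for this sketch: stop once two matches have been collected.
        if collected.count == 2 {
            stop.pointee = true
        }
    }
} catch {
    print(error.localizedDescription)
}

print(collected) // ["follow", "fellow"]
```

Stopping early avoids scanning the rest of a long input once you have what you need, which matches(in:options:range:) cannot do because it always collects every match.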