Understanding the ? question mark and $matches in PHP's preg_match()

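PHP's preg_match() function performs a regular expression match. When reading other people's code (such as the sample below), you may come across a question mark "?" in the pattern and a $matches argument. What do they mean, and what do they do?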

Sample code:

if (preg_match('/^((http|https):\/\/)?([^\/]+)/i', $category['url'], $matches)) {
     $match_url = $matches[0];
     $url = $match_url.'/';
}

Syntax of preg_match(): int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] ). It returns 1 if the pattern matches, 0 if it does not, and false on error. The parameters are:

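$pattern: The pattern to search for, as a string.
$subject: The input string.
$matches: If provided, it is filled with the search results. $matches[0] contains the text that matched the full pattern, $matches[1] contains the text that matched the first capturing subgroup, and so on.
$flags: Can be set to PREG_OFFSET_CAPTURE. If this flag is passed, the string offset (relative to the subject string) is returned along with every match. Note that this changes the array stored in $matches: each element becomes an array whose element 0 is the matched string and whose element 1 is the offset of that string in $subject.
$offset: Normally the search starts at the beginning of the subject string. The optional $offset parameter specifies the position (in bytes) at which to start the search.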
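
As a minimal sketch of the last two parameters, the snippet below passes PREG_OFFSET_CAPTURE together with a start offset. The subject string is a made-up example, and the pattern deliberately omits ^, because ^ still refers to the real start of the subject even when $offset is used.

```php
<?php
// Hypothetical subject string, chosen only for illustration.
$subject = 'see http://www.web315.net/catid.html';

// Start searching at byte 4 (the "h" of "http") and record byte offsets.
preg_match('/(http:\/\/)?([^\/]+)/i', $subject, $m, PREG_OFFSET_CAPTURE, 4);

// Each entry is now array(matched text, byte offset within $subject).
printf("%s at %d\n", $m[0][0], $m[0][1]); // http://www.web315.net at 4
printf("%s at %d\n", $m[1][0], $m[1][1]); // http:// at 4
printf("%s at %d\n", $m[2][0], $m[2][1]); // www.web315.net at 11
```

The offsets are counted from the start of $subject, not from the position given in $offset.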

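As noted in the parameter list, $matches is filled with the search results. In a regular expression, the question mark ? means that the preceding element (a single character or, as in (http:\/\/)?, a whole group) may appear at most once, that is, zero or one time. ^ matches the start of the input string, except inside a bracket expression: placed first inside [...], it negates the set, so the expression matches any character that is not listed.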
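
The two meanings can be checked with a short sketch. The input strings here are illustrative assumptions; the first pattern is the simplified one used later in this article.

```php
<?php
$pattern = '/^(http:\/\/)?([^\/]+)/i';

// The optional group (http:\/\/)? matches once here...
preg_match($pattern, 'http://www.web315.net/catid.html', $with);
// ...and zero times here, so $without[1] is an empty string.
preg_match($pattern, 'www.web315.net/catid.html', $without);

echo $with[0], "\n";    // http://www.web315.net
echo $with[1], "\n";    // http://
echo $without[0], "\n"; // www.web315.net
var_dump($without[1]);  // string(0) ""

// Inside brackets, a leading ^ negates the set: [^t]+ stops before the first t.
preg_match('/[^t]+/i', 'net', $negated);
echo $negated[0], "\n"; // ne
```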

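To make this concrete, the sample is modified slightly below and run against several different input strings to show the actual output. Note that the first pattern uses [^t] instead of [^\/], so its second subgroup stops before the first t rather than the first /. The code: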

<?php
    $cat_url = "http://www.web315.net/catid.html";
    $err_url = "www.web315.nethttp://m.web315.net/catid.html";
    $error_url = "http://http://www.web315http://m.web315.net/catid.html";
    if (preg_match('/^(http:\/\/)?([^t]+)/i', $cat_url, $matches)) $cat_url = $matches;
    if (preg_match('/^(http:\/\/)?([^\/]+)/i', $err_url, $matches)) $err_url = $matches;
    if (preg_match('/^(http:\/\/)?([^\/]+)/i', $error_url, $matches)) $error_url = $matches;
?>
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>TEST</title>
</head>
<body>
<div>
    $cat_url structure <?php var_dump($cat_url);?><br />
    $cat_url[0] output <?php echo $cat_url[0];?><br />
    $cat_url[1] output <?php echo $cat_url[1];?><br />
    $cat_url[2] output <?php echo $cat_url[2];?><br />
    $err_url structure <?php var_dump($err_url);?><br />
    $err_url[0] output <?php echo $err_url[0];?><br />
    $err_url[1] output <?php echo $err_url[1];?><br />
    $err_url[2] output <?php echo $err_url[2];?><br />
    $error_url structure <?php var_dump($error_url);?><br />
    $error_url[0] output <?php echo $error_url[0];?><br />
    $error_url[1] output <?php echo $error_url[1];?><br />
    $error_url[2] output <?php echo $error_url[2];?>
</div>
</body>
</html>

Requesting the PHP file in a browser produces the following output:

$cat_url structure array(3) { [0]=> string(20) "http://www.web315.ne" [1]=> string(7) "http://" [2]=> string(13) "www.web315.ne" }
$cat_url[0] output http://www.web315.ne
$cat_url[1] output http://
$cat_url[2] output www.web315.ne
$err_url structure array(3) { [0]=> string(19) "www.web315.nethttp:" [1]=> string(0) "" [2]=> string(19) "www.web315.nethttp:" }
$err_url[0] output www.web315.nethttp:
$err_url[1] output
$err_url[2] output www.web315.nethttp:
$error_url structure array(3) { [0]=> string(12) "http://http:" [1]=> string(7) "http://" [2]=> string(5) "http:" }
$error_url[0] output http://http:
$error_url[1] output http://
$error_url[2] output http:

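Taking the first case, $cat_url, as an example, my understanding is:

$matches[0] is the whole match: starting at the beginning of $cat_url, zero or one occurrence of http:// (any further http:// is not consumed by the first subgroup), followed by everything up to, but not including, the first t;
$matches[1] is the text matched by the first subgroup (http:// in this example); if the subgroup matched nothing, it is an empty string;
$matches[2] is the text matched by the second subgroup, [^t]+: it starts where the first subgroup ends and runs up to, but not including, the first t.

This makes the other results easy to understand. $err_url[1] is empty because the first subgroup matched nothing. $error_url[0] is http://http: because the leading http:// can be matched at most once; the second http:// is already being consumed by the second subgroup, which stops at the first / character.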
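
The behaviour described above can be wrapped in a small function. This is only a sketch: base_url() is a hypothetical helper name, not something from the original code, and it reuses the pattern from the first sample. The ?string return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper, not part of the original code: returns the optional
// scheme plus everything before the first "/", with a trailing slash added.
function base_url(string $url): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/^((http|https):\/\/)?([^\/]+)/i', $url, $matches) === 1) {
        return $matches[0] . '/';
    }
    return null; // nothing before the first "/", or an empty string
}

echo base_url('http://www.web315.net/catid.html'), "\n"; // http://www.web315.net/
echo base_url('www.web315.net/catid.html'), "\n";        // www.web315.net/
var_dump(base_url('/catid.html'));                       // NULL
```

The pattern does not validate URLs: a malformed input such as the $error_url string above still matches and yields http://http:/, so real code would likely need a stricter check.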

Original article: http://www.web315.net/doc/57.html