Playing with C 01 - strtok() and sizeof(str)

Today's toy is strtok().
Documentation first:
http://man7.org/linux/man-pages/man3/strtok.3.html
Some excerpts:

#include <string.h>
char *strtok(char *str, const char *delim);

The strtok() function breaks a string into a sequence of zero or more
nonempty tokens. On the first call to strtok(), the string to be
parsed should be specified in str. In each subsequent call that
should parse the same string, str must be NULL.

The delim argument specifies a set of bytes that delimit the tokens
in the parsed string. The caller may specify different strings in
delim in successive calls that parse the same string.

Each call to strtok() returns a pointer to a null-terminated string
containing the next token. This string does not include the
delimiting byte. If no more tokens are found, strtok() returns NULL.

A sequence of calls to strtok() that operate on the same string
maintains a pointer that determines the point from which to start
searching for the next token. The first call to strtok() sets this
pointer to point to the first byte of the string. The start of the
next token is determined by scanning forward for the next
nondelimiter byte in str. If such a byte is found, it is taken as
the start of the next token. If no such byte is found, then there
are no more tokens, and strtok() returns NULL. (A string that is
empty or that contains only delimiters will thus cause strtok() to
return NULL on the first call.)

The end of each token is found by scanning forward until either the
next delimiter byte is found or until the terminating null byte
('\0') is encountered. If a delimiter byte is found, it is
overwritten with a null byte to terminate the current token, and
strtok() saves a pointer to the following byte; that pointer will be
used as the starting point when searching for the next token. In
this case, strtok() returns a pointer to the start of the found
token.

From the above description, it follows that a sequence of two or more
contiguous delimiter bytes in the parsed string is considered to be a
single delimiter, and that delimiter bytes at the start or end of the
string are ignored. Put another way: the tokens returned by strtok()
are always nonempty strings. Thus, for example, given the string
"aaa;;bbb,", successive calls to strtok() that specify the delimiter
string ";," would return the strings "aaa" and "bbb", and then a null
pointer.

I wrote a test program:

#include <stdio.h>
#include <string.h>

int main() {
  char str[] = "abc,deffff,ghi,jkl";
  char *tok;
  tok = strtok(str, ",");
  printf("1st: %s\n", tok);
  tok = strtok(NULL, ",");
  printf("2nd: %s\n", tok);
  tok = strtok(NULL, ",");
  printf("3rd: %s\n", tok);
  tok = strtok(NULL, ",");
  printf("4th: %s\n", tok);
  printf("str: %s, length: %lu, size: %lu\n", str, strlen(str), sizeof(str));
  return 0;
}

The file is named strtokalt.c.

$ make && ./strtokalt
1st: abc
2nd: deffff
3rd: ghi
4th: jkl
str: abc, length: 3, size: 19

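First, strtok() returns a pointer to a C string, null terminator included.
On the first call, pass in the pointer to the string you want to tokenize.
To keep tokenizing that same string, the first argument of every later call must be NULL.
Otherwise things go badly: strtok() starts over from the beginning of whatever string you pass.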
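Now look at the output of my test program: the tokens all look fine, but when we go back to the original str, it contains only the first token. Its length is 3, while its size is still 19.
What does that tell us? The first "," in str has been overwritten with the null terminator '\0'.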
Let's check with gdb:
$ gdb ./strtoalt

Reading symbols from ./strtoalt...done.
(gdb) l 11
6         char *tok;
7         tok = strtok(str, ",");
8         printf("1st: %s\n", tok);
9         tok = strtok(NULL, ",");
10        printf("2nd: %s\n", tok);
11        tok = strtok(NULL, ",");
12        printf("3rd: %s\n", tok);
13        tok = strtok(NULL, ",");
14        printf("4th: %s\n", tok);
15        printf("str: %s, length: %lu, size: %lu\n", str, strlen(str), sizeof(str));
(gdb) b 7
Breakpoint 1 at 0x8bd: file strtoalt.c, line 7.
(gdb) b 9
Breakpoint 2 at 0x8ec: file strtoalt.c, line 9.
(gdb) b 11
Breakpoint 3 at 0x919: file strtoalt.c, line 11.
(gdb) b 13
Breakpoint 4 at 0x946: file strtoalt.c, line 13.
(gdb) b 16
Breakpoint 5 at 0x99f: file strtoalt.c, line 16.
(gdb) r
Starting program: /home/yuyue/Coding/Play/strtoalt 

Breakpoint 1, main () at strtoalt.c:7
7         tok = strtok(str, ",");
(gdb) p str
$1 = "abc,deffff,ghi,jkl"
(gdb) p sizeof(str)
$2 = 19
(gdb) x/19b str
0x7fffffffe200: 97      98      99      44      100     101     102     102
0x7fffffffe208: 102     102     44      103     104     105     44      106
0x7fffffffe210: 107     108     0
(gdb) p tok
$3 = 0x0
(gdb) c
Continuing.
1st: abc

Breakpoint 2, main () at strtoalt.c:9
9         tok = strtok(NULL, ",");
(gdb) p str
$4 = "abc\000deffff,ghi,jkl"
(gdb) x/19b str
0x7fffffffe200: 97      98      99      0       100     101     102     102
0x7fffffffe208: 102     102     44      103     104     105     44      106
0x7fffffffe210: 107     108     0
(gdb) p tok
$5 = 0x7fffffffe200 "abc"
(gdb) p sizeof(tok)
$6 = 8
(gdb) x/8b tok
0x7fffffffe200: 97      98      99      0       100     101     102     102
(gdb) c
Continuing.
2nd: deffff

Breakpoint 3, main () at strtoalt.c:11
11        tok = strtok(NULL, ",");
(gdb) p str
$7 = "abc\000deffff\000ghi,jkl"
(gdb) x/19b str
0x7fffffffe200: 97      98      99      0       100     101     102     102
0x7fffffffe208: 102     102     0       103     104     105     44      106
0x7fffffffe210: 107     108     0
(gdb) p tok
$8 = 0x7fffffffe204 "deffff"
(gdb) p sizeof(tok)
$9 = 8
(gdb) x/8b tok
0x7fffffffe204: 100     101     102     102     102     102     0       103
(gdb) c
Continuing.
3rd: ghi

Breakpoint 4, main () at strtoalt.c:13
13        tok = strtok(NULL, ",");
(gdb) p str
$10 = "abc\000deffff\000ghi\000jkl"
(gdb) x/19b str
0x7fffffffe200: 97      98      99      0       100     101     102     102
0x7fffffffe208: 102     102     0       103     104     105     0       106
0x7fffffffe210: 107     108     0
(gdb) p tok
$11 = 0x7fffffffe20b "ghi"
(gdb) x/8b tok
0x7fffffffe20b: 103     104     105     0       106     107     108     0
(gdb) c
Continuing.
4th: jkl
str: abc, length: 3, size: 19

Breakpoint 5, main () at strtoalt.c:16
16        return 0;
(gdb) p str
$12 = "abc\000deffff\000ghi\000jkl"
(gdb) x/19b str
0x7fffffffe200: 97      98      99      0       100     101     102     102
0x7fffffffe208: 102     102     0       103     104     105     0       106
0x7fffffffe210: 107     108     0
(gdb) p tok
$13 = 0x7fffffffe20f "jkl"
(gdb) x/8b tok
0x7fffffffe20f: 106     107     108     0       -1      -1      127     0
(gdb) c
Continuing.
[Inferior 1 (process 3138) exited normally]
(gdb) q

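As you can see, after the first call to strtok() the "," really has become '\0'. Each later call then turns the delimiter that ends its token into a null terminator as well.
In this gdb session nothing changed except the delimiters; every other character is intact. I originally suspected that calling strtok() from a for loop or a while loop could scramble the other characters of the string, but I never verified that, and the man page excerpt above says only delimiter bytes get overwritten, so a loop should make no difference.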
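Another interesting point: I never initialized char *tok; it simply receives whatever strtok() returns. gdb reports sizeof(tok) as 8, which was enough for these tokens, but what if the first token is 3 bytes and the second one is 11 bytes?
So I changed the program: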

#include <stdio.h>
#include <string.h>

int main() {
  char str[] = "abc,123456789ab,ghi,jkl";
  char *tok;
  tok = strtok(str, ",");
  printf("tok 1: %s\n", tok);
  for (int i = 0; i < 3; i++) {
    tok = strtok(NULL, ",");
    printf("tok %d: %s\n", i + 2, tok);
  }
  printf("str: %s, length: %lu, size: %lu\n", str, strlen(str), sizeof(str));
  return 0;
}

Output after compiling and running:

$ make && ./strtoalt 
cc -o strtoalt strtoalt.c -Wall -lm -pg -g
tok 1: abc
tok 2: 123456789ab
tok 3: ghi
tok 4: jkl
str: abc, length: 3, size: 24

Another quick look with gdb:
$ gdb ./strtoalt

Reading symbols from ./strtoalt...done.
(gdb) l 11
6         char *tok;
7         tok = strtok(str, ",");
8         printf("tok 1: %s\n", tok);
9         for (int i = 0; i < 3; i++) {
10          tok = strtok(NULL, ",");
11          printf("tok %d: %s\n", i + 2, tok);
12        }
13        printf("str: %s, length: %lu, size: %lu\n", str, strlen(str), sizeof(str));
14        return 0;
15      }
(gdb) b 8
Breakpoint 1 at 0x8d8: file strtoalt.c, line 8.
(gdb) b 11
Breakpoint 2 at 0x90e: file strtoalt.c, line 11.
(gdb) r
Starting program: /home/yuyue/Coding/CMPSC311/CMPSC311Play/strtoalt 

Breakpoint 1, main () at strtoalt.c:8
8         printf("tok 1: %s\n", tok);
(gdb) p str
$1 = "abc\000\061\062\063\064\065\066\067\070\071ab,ghi,jkl"
(gdb) p sizeof(str)
$2 = 24
(gdb) x/24b str
0x7fffffffe200: 97      98      99      0       49      50      51      52
0x7fffffffe208: 53      54      55      56      57      97      98      44
0x7fffffffe210: 103     104     105     44      106     107     108     0
(gdb) p tok
$3 = 0x7fffffffe200 "abc"
(gdb) p sizeof(tok)
$4 = 8
(gdb) x/8b tok
0x7fffffffe200: 97      98      99      0       49      50      51      52
(gdb) c
Continuing.
tok 1: abc

Breakpoint 2, main () at strtoalt.c:11
11          printf("tok %d: %s\n", i + 2, tok);
(gdb) p str
$5 = "abc\000\061\062\063\064\065\066\067\070\071ab\000ghi,jkl"
(gdb) x/24b str
0x7fffffffe200: 97      98      99      0       49      50      51      52
0x7fffffffe208: 53      54      55      56      57      97      98      0
0x7fffffffe210: 103     104     105     44      106     107     108     0
(gdb) p sizeof(tok)
$6 = 8
(gdb) p tok
$7 = 0x7fffffffe204 "123456789ab"
(gdb) x/8b tok
0x7fffffffe204: 49      50      51      52      53      54      55      56
(gdb) x/16b tok
0x7fffffffe204: 49      50      51      52      53      54      55      56
0x7fffffffe20c: 57      97      98      0       103     104     105     44
(gdb) quit
A debugging session is active.

        Inferior 1 [process 3190] will be killed.

Quit anyway? (y or n) y

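Fine, fine, you win!
First, in the p str output, $1 = "abc\000\061\062\063\064\065\066\067\070\071ab,ghi,jkl", the digits are all printed as octal escapes of their ASCII codes. Presumably gdb does this because a literal digit right after \000 would read as part of the escape.
Second, sizeof says char *tok is 8 bytes no matter what. The token is clearly longer than 8 bytes, and the size is still reported as 8. Is that gdb's fault or strtok()'s?
Let's printf it from inside the program and see.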

#include <stdio.h>
#include <string.h>

int main() {
  char str[] = "abc,123456789ab,ghi,jkl";
  char *tok;
  tok = strtok(str, ",");
  printf("tok 1: %s, length %lu, size: %lu\n", tok, strlen(tok), sizeof(tok));
  for (int i = 0; i < 3; i++) {
    tok = strtok(NULL, ",");
    printf("tok %d: %s, length %lu, size: %lu\n", i + 2, tok, strlen(tok), sizeof(tok));
  }
  printf("str: %s, length: %lu, size: %lu\n", str, strlen(str), sizeof(str));
  return 0;
}

Output:

$ make && ./strtoalt 
cc -o strtoalt strtoalt.c -Wall -lm -pg -g
tok 1: abc, length 3, size: 8
tok 2: 123456789ab, length 11, size: 8
tok 3: ghi, length 3, size: 8
tok 4: jkl, length 3, size: 8
str: abc, length: 3, size: 24

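OK, so gdb is not to blame; it must be strtok().
On second thought, maybe it is not strtok() either. Maybe it is me.
Why? strtok() returns a pointer to a token, and my tok is declared as char *. C fixes the type of tok at its declaration, so its size is fixed too, whatever it ends up pointing to.
Let's try giving the second token a brand-new variable and see whether that one is still 8 bytes.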

#include <stdio.h>
#include <string.h>

int main() {
  char str[] = "abc,123456789ab,ghi,jkl";
  char *tok, *tok2;
  tok = strtok(str, ",");
  printf("tok 1: %s, length %lu, size: %lu\n", tok, strlen(tok), sizeof(tok));
  for (int i = 0; i < 3; i++) {
    tok2 = strtok(NULL, ",");
    printf("tok %d: %s, length %lu, size: %lu\n", i + 2, tok2, strlen(tok2), sizeof(tok2));
  }
  printf("str: %s, length: %lu, size: %lu\n", str, strlen(str), sizeof(str));
  return 0;
}

$ make && ./strtoalt 
cc -o strtoalt strtoalt.c -Wall -lm -pg -g
tok 1: abc, length 3, size: 8
tok 2: 123456789ab, length 11, size: 8
tok 3: ghi, length 3, size: 8
tok 4: jkl, length 3, size: 8
str: abc, length: 3, size: 24

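Fine, I give up. I was not being unfair to it: even with a new variable it still insists that my 123456789ab has size 8. Lesson for next time: for a token returned by strtok(), never trust sizeof(); trust strlen().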
Everything above ran on Ubuntu 18.04. Let me see what my macOS 10.14.6 does.

» ./strtoalt
tok 1: abc, length 3, size: 8
tok 2: 123456789ab, length 11, size: 8
tok 3: ghi, length 3, size: 8
tok 4: jkl, length 3, size: 8
str: abc, length: 3, size: 24

Yep, same.
No rush, I also have an OpenBSD 6.5 machine to try.

$ gmake && ./strtokalt
cc -o strtokalt strtokalt.c -Wall -lm -pg -g
tok 1: abc, length 3, size: 8
tok 2: 123456789ab, length 11, size: 8
tok 3: ghi, length 3, size: 8
tok 4: jkl, length 3, size: 8
str: abc, length: 3, size: 24

Yep, same.


----------


Wait a moment: is anything declared like char *tok; always 8 bytes?
Let me write some code to test that:

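#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main()
{
    char *str;
    /* 10 chars are enough: 9 letters plus the null terminator.
     * sizeof(*str) is sizeof(char); sizeof(str) would be the pointer
     * size and would allocate 80 bytes instead of 10. */
    str = malloc(10 * sizeof(*str));
    if (str == NULL) {
        return 1;
    }
    for (int i = 0; i < 9; i++) {
        str[i] = 'a' + i;
    }
    str[9] = '\0';
    printf("str: %s length: %lu size: %lu\n", str, strlen(str), sizeof(str));
    free(str);
    str = NULL;
    return 0;
}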

$ make && ./charsizetest
cc -o charsizetest charsizetest.c -Wall -lm -pg -g
str: abcdefghi length: 9 size: 8

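Ah, just as I thought!
So the problem was my understanding of C. char *str and char str[] are used in much the same way, but they are different types. For char *str, sizeof gives the size of the pointer itself, which is 8 bytes on all three systems I tried, no matter how much data it points to. For char str[], sizeof gives the size of the array as initialized, null terminator included.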
