10. Character Sets

Related link:
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Strings/Articles/CharacterSets.html#//apple_ref/doc/uid/20000146-BAJBJHCG

An NSCharacterSet object represents a set of Unicode characters. NSString and NSScanner objects use NSCharacterSet objects to group characters together for searching operations, so that they can find any of a particular set of characters during a search.

Character Set Basics

A character set object represents a set of Unicode characters. Character sets are represented by instances of a class cluster. The cluster’s two public classes, NSCharacterSet and NSMutableCharacterSet, declare the programmatic interface for immutable and mutable character sets, respectively. An immutable character set is defined when it is created and subsequently cannot be changed. A mutable character set can be changed after it’s created.

A character set object doesn’t perform any tasks; it simply holds a set of character values to limit operations on strings. The NSString and NSScanner classes define methods that take NSCharacterSet objects as arguments to find any of several characters. For example, this code excerpt finds the range of the first uppercase letter in myString:

NSString *myString = @"some text in an NSString...";
NSCharacterSet *characterSet = [NSCharacterSet uppercaseLetterCharacterSet];
NSRange letterRange = [myString rangeOfCharacterFromSet:characterSet];

After this fragment executes, letterRange.location is the index of the first “N” in “NSString” (16 in this string). If the first letter of the string were “S”, then letterRange.location would be 0. If the string contained no uppercase letter at all, letterRange.location would be NSNotFound.

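As a minimal sketch of how a character set limits a string operation, the following wraps the search in two small helpers. The names FirstUppercaseIndex and LeadingDigits are hypothetical and not part of Foundation; the first returns NSNotFound when nothing matches, and the second shows the same idea with NSScanner.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: index of the first uppercase letter in `string`,
// or NSNotFound when the string contains no uppercase letter.
static NSUInteger FirstUppercaseIndex(NSString *string) {
    NSCharacterSet *uppercase = [NSCharacterSet uppercaseLetterCharacterSet];
    NSRange range = [string rangeOfCharacterFromSet:uppercase];
    return range.location;
}

// Hypothetical helper: the leading run of decimal digits in `string`,
// or nil when the string does not start with a digit.
// Note that NSScanner skips leading whitespace by default.
static NSString *LeadingDigits(NSString *string) {
    NSScanner *scanner = [NSScanner scannerWithString:string];
    NSString *digits = nil;
    BOOL found = [scanner scanCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet]
                                     intoString:&digits];
    return found ? digits : nil;
}
```

Checking the result against NSNotFound before using the range is usually what distinguishes a safe call site from one that indexes past the end of the string.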

Creating Character Sets

NSCharacterSet defines class methods that return commonly used character sets, such as letters (uppercase or lowercase), decimal digits, whitespace, and so on. These “standard” character sets are always immutable, even if created by sending a message to NSMutableCharacterSet. See Standard Character Sets and Unicode Definitions for more information on standard character sets.

You can use a standard character set as a starting point for building a custom set by making a mutable copy of it and changing that. (You can also start from scratch by creating a mutable character set with alloc and init and adding characters to it.) For example, this fragment creates a character set containing letters, digits, and basic punctuation:

NSMutableCharacterSet *workingSet = [[NSCharacterSet alphanumericCharacterSet] mutableCopy];
[workingSet addCharactersInString:@";:,."];
NSCharacterSet *finalCharacterSet = [workingSet copy];
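
The parenthetical "start from scratch" route might look like the sketch below. MakeIdentifierSet is a hypothetical name chosen for illustration, and the choice of characters (ASCII letters, underscore, decimal digits) is an assumption, not something the guide prescribes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a set from an empty mutable set
// instead of from a mutable copy of a standard set.
static NSCharacterSet *MakeIdentifierSet(void) {
    NSMutableCharacterSet *working = [[NSMutableCharacterSet alloc] init];
    [working addCharactersInRange:NSMakeRange('a', 26)];   // a-z
    [working addCharactersInRange:NSMakeRange('A', 26)];   // A-Z
    [working addCharactersInString:@"_"];
    // decimalDigitCharacterSet also contains non-ASCII decimal digits.
    [working formUnionWithCharacterSet:[NSCharacterSet decimalDigitCharacterSet]];
    // Hand out an immutable copy; the mutable working set is discarded.
    return [working copy];
}
```

Membership can then be tested with characterIsMember:, for example `[MakeIdentifierSet() characterIsMember:'_']`.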

To define a custom character set using Unicode code points, use code similar to the following fragment (which creates a character set including the form feed and line separator characters):

// U+000C is the form feed character; U+2028 is the line separator.
unichar chars[] = {0x000C, 0x2028};
NSString *string = [[NSString alloc] initWithCharacters:chars
                            length:sizeof(chars) / sizeof(unichar)];
NSCharacterSet *characterSet = [NSCharacterSet characterSetWithCharactersInString:string];
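
When the code points are contiguous, adding ranges may be simpler than building an intermediate string. The sketch below is a variation rather than the guide's method: MakeBreakSet is a hypothetical name, and it deliberately also includes U+2029 (paragraph separator) to show a two-element range.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: defines a set directly from code point ranges.
static NSCharacterSet *MakeBreakSet(void) {
    NSMutableCharacterSet *working = [[NSMutableCharacterSet alloc] init];
    [working addCharactersInRange:NSMakeRange(0x000C, 1)]; // form feed
    [working addCharactersInRange:NSMakeRange(0x2028, 2)]; // line and paragraph separators
    return [working copy];
}

// Hypothetical helper: splits `text` wherever a character from the set occurs.
static NSArray<NSString *> *SplitOnBreaks(NSString *text) {
    return [text componentsSeparatedByCharactersInSet:MakeBreakSet()];
}
```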

Performance considerations

Because character sets often participate in performance-critical code, you should be aware of the aspects of their use that can affect the performance of your application. Mutable character sets are generally much more expensive than immutable character sets. They consume more memory and are costly to invert (an operation often performed in scanning a string). Because of this, you should follow these guidelines:

  • Create as few mutable character sets as possible.
  • Cache character sets (in a global dictionary, perhaps) instead of continually recreating them.
  • When creating a custom set that doesn’t need to change after creation, make an immutable copy of the final character set for actual use, and dispose of the working mutable character set. Alternatively, create a character set file as described in Creating a character set file and store it in your application’s main bundle.
  • Similarly, avoid archiving character set objects; store them in character set files instead. Archiving can result in a character set being duplicated in different archive files, resulting in wasted disk space and duplicates in memory for each separate archive read.
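
One way to follow the first three guidelines together is to build the custom set once and hand out the same immutable instance afterwards. This is a sketch under assumptions: it uses a function-level static guarded by dispatch_once (available on Apple platforms) rather than the global dictionary the text mentions, and SharedPunctuatedAlphanumericSet is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical accessor: creates the custom set on first use and caches it.
static NSCharacterSet *SharedPunctuatedAlphanumericSet(void) {
    static NSCharacterSet *cached = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        // The mutable working set exists only inside this block.
        NSMutableCharacterSet *working =
            [[NSCharacterSet alphanumericCharacterSet] mutableCopy];
        [working addCharactersInString:@";:,."];
        // Only the immutable copy is kept for later calls.
        cached = [working copy];
    });
    return cached;
}
```

Every caller receives the same object, so the mutable set is created at most once per process.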

Creating a character set file

If your application frequently uses a custom character set, you should save its definition in a resource file and load that instead of explicitly adding individual characters each time you need to create the set. You can save a character set by getting its bitmap representation (an NSData object) and saving that object to a file:

NSData *charSetRep = [finalCharacterSet bitmapRepresentation];
NSURL *dataURL = <#URL for character set#>;
NSError *error;
BOOL result = [charSetRep writeToURL:dataURL options:NSDataWritingAtomic error:&error];

By convention, character set filenames use the extension .bitmap. If you intend for others to use your character set files, you should follow this convention. To read a character set file with a .bitmap extension, simply use the characterSetWithContentsOfFile: method.

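
Putting the save and load steps together, a round trip might look like the following sketch. The function name SaveAndReloadCharacterSet and the file name example.bitmap in the temporary directory are assumptions made for illustration; a real application would more likely ship the file in its main bundle.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: writes the bitmap representation of `set` to a
// .bitmap file and reads it back. Returns nil (and fills `error`) if the
// write fails.
static NSCharacterSet *SaveAndReloadCharacterSet(NSCharacterSet *set, NSError **error) {
    // Assumed location, chosen only so the example is self-contained.
    NSString *path = [NSTemporaryDirectory()
        stringByAppendingPathComponent:@"example.bitmap"];
    NSURL *url = [NSURL fileURLWithPath:path];

    NSData *bitmap = [set bitmapRepresentation];
    if (![bitmap writeToURL:url options:NSDataWritingAtomic error:error]) {
        return nil;
    }
    return [NSCharacterSet characterSetWithContentsOfFile:path];
}
```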

Standard Character Sets and Unicode Definitions

The standard character sets, such as that returned by letterCharacterSet, are formally defined in terms of the normative and informative categories established by the Unicode standard, such as Uppercase Letter, Combining Mark, and so on. The formal definition of a standard character set is in most cases given as one or more of the categories defined in the standard. For example, the set returned by lowercaseLetterCharacterSet includes all characters in the normative category Lowercase Letter, while the set returned by letterCharacterSet includes the characters in all of the Letter categories.

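
The difference between the two sets can be checked directly with characterIsMember:. In this sketch the helper names IsLowercaseLetter and IsLetter are hypothetical; U+4E2D is used as an example of a character in a Letter category that has no case.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `c` is in the Lowercase Letter category.
static BOOL IsLowercaseLetter(unichar c) {
    return [[NSCharacterSet lowercaseLetterCharacterSet] characterIsMember:c];
}

// Hypothetical helper: YES if `c` is in the set returned by letterCharacterSet.
static BOOL IsLetter(unichar c) {
    return [[NSCharacterSet letterCharacterSet] characterIsMember:c];
}
```

Here `IsLetter('A')` and `IsLetter(0x4E2D)` are both YES, while `IsLowercaseLetter` is YES only for characters such as 'a'.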

Note that the definitions of the categories themselves may change with new versions of the Unicode standard. You can download the files that define category membership from http://www.unicode.org/.
