字符串查找通常有四种方式,暴力查找,KMP查找,BoyerMoore查找以及RabinKarp算法查找,查找最简单的莫过于暴力查找,如果内容是“FlyElephant”,需要查找的内容是“Elephant”,上面的四种方式会给出不同的方案~
暴力查找
最简单的暴力查找:
+(BOOL)simpleMatchPattern:(NSString *)text pattern:(NSString *)pattern{
NSInteger cnlen=text.length;
NSInteger ptlen=pattern.length;
for (NSInteger i=0; i<=cnlen-ptlen; i++) {
NSInteger j;
for (j=0; j<ptlen; j++) {
unichar tChar=[text characterAtIndex:i+j];
unichar pChar=[pattern characterAtIndex:j];
if (tChar!=pChar) {
break;
}
}
if (j==ptlen) {
return YES;
}
}
return NO;
}
遍历内容的长度,获取索引m,然后索引m...m+匹配长度,逐一比较,这个算法可以稍微改进一下:
+(NSInteger)simpleBackMatchPattern:(NSString *)text pattern:(NSString *)pattern{
NSInteger i,j;
NSInteger cnlen=text.length;
NSInteger ptlen=pattern.length;
for (i=0,j=0; i<cnlen && j<ptlen; i++) {
unichar tChar=[text characterAtIndex:i];
unichar pChar=[pattern characterAtIndex:j];
if (tChar==pChar) {
j++;
}else{
i=i-j;
j=0;
}
}
if (j==ptlen) {
return i-ptlen;
}else{
return 0;
}
}
KMP算法
KMP是knuth,Morris和Pratt发明的算法,暴力查找的每次只是循环是固定的加1,假设内容中只有A,B两种你字符,假设查找的字符串是BAAAABAAB,模式字符串为BAAB,我们发现前两个匹配,第三个不匹配,暴力中,我们的循环会从索引1开始继续比较,但是匹配我们知道其他的全是A,所以我们应该从第二个字母B开始匹配,减少回退的成本,实现代码:
static const NSInteger LetterCount=256;
@interface KMPSearch()
@property (strong,nonatomic) NSString *pattern;
@property (strong,nonatomic) NSMutableArray *dfa;
@end
@implementation KMPSearch
-(instancetype)initWithPattern:(NSString *)pattern{
self=[super init];
if (self) {
self.pattern=pattern;
[self setupData];
}
return self;
}
-(NSInteger)search:(NSString *)text{
NSInteger i,j;
NSInteger plen=self.pattern.length;
NSInteger tlen=text.length;
for (i=0,j=0; i<tlen & j<plen; i++) {
unichar uc=[text characterAtIndex:i];
NSArray *arr=self.dfa[uc];
j=[arr[j] integerValue];
}
if (j==plen) {
return i-plen;
}else{
return -1;
}
}
-(void)setupData{
NSInteger plen=self.pattern.length;
self.dfa=[[NSMutableArray alloc]initWithCapacity:LetterCount];
for (NSInteger i=0; i<LetterCount; i++) {
NSMutableArray *arr=[[NSMutableArray alloc]init];
for (NSInteger m=0; m<plen; m++) {
[arr addObject:@(0)];
}
[self.dfa addObject:arr];
}
NSInteger index=[self.pattern characterAtIndex:0];
self.dfa[index][0]=@(1);
for (NSInteger X=0,j=1; j<plen;j++) {
for (NSInteger c=0; c<LetterCount; c++) {
self.dfa[c][j]=self.dfa[c][X];
}
NSInteger uc=[self.pattern characterAtIndex:j];
self.dfa[uc][j]=@(j+1);
X=[self.dfa[uc][X] integerValue];
}
}
@end
BoyerMooer算法
之前的暴力查找和KMP都是从左向右扫描文本对比实现的,但是BoyerMoore是从右向组走扫描实现的,具体实现:
static const NSInteger LetterCount=256;
@interface BoyerMoore()
@property (strong,nonatomic) NSString *pattern;
@property (strong,nonatomic) NSMutableArray *rightArr;
@end
@implementation BoyerMoore
-(instancetype)initWithPattern:(NSString *)pattern{
self=[super init];
if (self) {
self.pattern=pattern;
[self setup];
}
return self;
}
-(NSInteger)search:(NSString *)text{
NSInteger tlen=text.length;
NSInteger plen=self.pattern.length;
NSInteger skip;
for (NSInteger i=0; i<=tlen-plen; i+=skip) {
skip=0;
for (NSInteger j=plen-1;j>=0;j=j-1) {
if ([self.pattern characterAtIndex:j]!=[text characterAtIndex:(i+j)]) {
NSInteger index=[text characterAtIndex:(i+j)];
skip=j-[self.rightArr[index] integerValue];
break;
}
}
if (skip==0) {
return i;//匹配成功
}
}
return -1;//匹配失败
}
-(void)setup{
self.rightArr=[[NSMutableArray alloc]init];
for (NSInteger i=0; i<LetterCount; i++) {
[self.rightArr addObject:@(-1)];
}
for (NSInteger j=0; j<self.pattern.length; j++) {
NSInteger index=[self.pattern characterAtIndex:j];
self.rightArr[index]=@(j);
}
}
@end
Rabin-Karp指纹字符串查找
Rabin和Karp这种算法和前三种走了完全的不同的方向,基于散列的查找,也就是每一个字符串都有一个对应的Hash值,如果Hash值相同的话,那么匹配成功,基本效率保持在O(n).
static const NSInteger Q=997;
static const NSInteger LetterCount=10;
@interface RabinKarp()
@property (strong,nonatomic) NSString *pattern;
@property (assign,nonatomic) NSInteger pCount;
@property (assign,nonatomic) NSInteger RM;
@property (assign,nonatomic) NSInteger pathHash;
@end
@implementation RabinKarp
-(instancetype)initWithPattern:(NSString *)pattern{
self=[super init];
if (self) {
self.pattern=pattern;
self.pCount=self.pattern.length;
[self setup];
}
return self;
}
-(NSInteger)search:(NSString *)text{
NSInteger tlen=text.length;
NSInteger txtHash=[self hash:text length:self.pCount];
//开头匹配成功
if (self.pathHash==txtHash) {
return 0;
}
for (NSInteger j=self.pCount; j<tlen; j++) {
//逐一匹配
NSInteger chValue=[[text substringWithRange:NSMakeRange(j-self.pCount, 1)] integerValue];
txtHash=(txtHash+Q-self.RM*chValue%Q)%Q;
NSInteger num=[[text substringWithRange:NSMakeRange(j, 1)] integerValue];
txtHash=(txtHash*LetterCount+num)%Q;
if (self.pathHash==txtHash) {
return j-self.pCount+1;
}
}
return -1;
}
-(void)setup{
self.RM=1;
for (NSInteger i=1; i<=self.pCount-1; i++) {
self.RM=(LetterCount*self.RM)%Q;
}
self.pathHash=[self hash:self.pattern length:self.pCount];
}
-(NSInteger)hash:(NSString *)key length:(NSInteger)len{
NSInteger hash=0;
for (NSInteger j=0; j<len; j++) {
NSInteger chValue=[[key substringWithRange:NSMakeRange(j, 1)] integerValue];
hash=(LetterCount*hash+chValue)%Q;
}
return hash;
}
-(NSInteger)hashStr:(NSString *)key length:(NSInteger)len{
NSInteger hash=0;
for (NSInteger j=0; j<len; j++) {
NSInteger chValue=[key characterAtIndex:j];
hash=(LetterCount*hash+chValue)%Q;
}
return hash;
}
@end
测试代码:
//1.暴力查找
NSString *text=@"She is a girl";
NSString *pattern=@"girl";
if ([SearchMatch simpleMatchPattern:text pattern:pattern]) {
NSLog(@"%@ is contains %@",text,pattern);
}else{
NSLog(@"%@ is not contains %@",text,pattern);
}
//2.简单回退
text=@"FlyElephant";
pattern=@"Elem";
NSInteger matchIndex=[SearchMatch simpleBackMatchPattern:text pattern:pattern];
if (matchIndex>0) {
NSLog(@"%@ contains %@ ,index %ld",text,pattern,matchIndex);
}else{
NSLog(@"%@ is not contains %@",text,pattern);
}
//3.KMP查找
NSString *kmpText=@"BCBAABACAABABACAA";
NSString *kmpPattern=@"ABABAC";
kmpText=@"AAACCBBDEF";
kmpPattern=@"A";
KMPSearch *kmp=[[KMPSearch alloc]initWithPattern:kmpPattern];
NSInteger result=[kmp search:kmpText];
if (result>=0) {
NSLog(@"%@ contains %@ ,Match Begin index:%ld",kmpText,kmpPattern,result);
}else{
NSLog(@"%@ not contains %@ ",kmpText,kmpPattern);
}
//4.BoyerMoore查找
NSString *bmText=@"FlyElephant";
NSString *bmPattern=@"Fly";
BoyerMoore *bm=[[BoyerMoore alloc]initWithPattern:bmPattern];
NSInteger bmResult=[bm search:bmText];
if (bmResult>=0) {
NSLog(@"%@ contains %@,Begin Index:%ld",bmText,bmPattern,bmResult);
}else{
NSLog(@"%@ not contains %@",bmText,bmPattern);
}
//5.RabinKarp 查找
NSString *rmText=@"3141592653589793";
NSString *rmPattern=@"26535";
RabinKarp *rm=[[RabinKarp alloc]initWithPattern:rmPattern];
NSInteger rmResult=[rm search:rmText];
if (rmResult>=0) {
NSLog(@"%@ contains %@,Begin Index:%ld",rmText,rmPattern,rmResult);
}else{
NSLog(@"%@ not contains %@",rmText,rmPattern);
}