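I need to find all the keywords in a large NSString (for parsing source code). My current implementation is too slow, and I'm not sure how to improve it.
I'm using NSRegularExpression because I assumed it is better optimized than anything I could write myself, but it is slower than I expected. Does anyone know a faster way to do this?
The target string will contain non-ASCII (UTF-8) characters, but the keywords themselves will always be plain alphanumeric ASCII. I imagine this could be used to optimize something?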
```objc
#import <Foundation/Foundation.h>

@interface MyClass : NSObject
@end

@implementation MyClass

// I'm storing the regular expression in a static variable, since it never changes and I need to re-use it often
static NSRegularExpression *keywordsExpression;

+ (void)initialize
{
    // The runtime already initializes the superclass; this guard keeps the setup
    // from running again if a subclass without its own +initialize triggers it.
    if (self != [MyClass class]) return;

    NSArray *keywords = [NSArray arrayWithObjects:@"accumsan", @"adipiscing", @"aliquam", @"aliquet",
        @"amet", @"ante", @"arcu", @"at", @"commodo", @"congue", @"consectetur", @"consequat",
        @"convallis", @"cras", @"curabitur", @"cursus", @"dapibus", @"diam", @"dolor", @"dui",
        @"elit", @"enim", @"erat", @"eros", @"est", @"et", @"eu", @"felis", @"fermentum",
        @"gravida", @"iaculis", @"id", @"imperdiet", @"integer", @"ipsum", @"lacinia",
        @"lectus", @"leo", nil];
    NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b", [keywords componentsJoinedByString:@"|"]];
    // \b(accumsan|adipiscing|aliquam|…)\b
    keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                   options:NSRegularExpressionCaseInsensitive
                                                                     error:NULL];
}

// This method will be called in quick succession; I need it to be able to run tens
// of thousands of times per second. The target string is big (50KB or so), but the
// search range is short, rarely more than 30 characters.
- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
    return [keywordsExpression rangeOfFirstMatchInString:string options:0 range:range];
}

@end
```
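On the question of whether the ASCII-only keywords can be exploited: one possibility is to skip the regex for the short per-call ranges and scan the characters directly, tokenizing into words and looking each whole word up in a sorted keyword table. The following is a minimal sketch in plain C (which compiles unchanged in an Objective-C `.m` file), not a tested drop-in replacement. It assumes the caller copies the search range into a `unichar` buffer with `-[NSString getCharacters:range:]`; the names `unit_t`, `is_word_unit`, `kKeywords` and `find_first_keyword` are hypothetical. It approximates `\b` by treating ASCII letters, digits, `_` and (an assumption) every non-ASCII code unit as word characters, and it treats the ends of the buffer as word boundaries, as `NSRegularExpression` does by default without `NSMatchingWithTransparentBounds`. Because it only folds ASCII case, it may differ from the case-insensitive regex on unusual input.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alias with the same representation as Foundation's unichar (a UTF-16 code unit). */
typedef unsigned short unit_t;

/* The question's keywords, lowercase and sorted so that bsearch() can be used. */
static const char *const kKeywords[] = {
    "accumsan", "adipiscing", "aliquam", "aliquet", "amet", "ante", "arcu",
    "at", "commodo", "congue", "consectetur", "consequat", "convallis",
    "cras", "curabitur", "cursus", "dapibus", "diam", "dolor", "dui",
    "elit", "enim", "erat", "eros", "est", "et", "eu", "felis", "fermentum",
    "gravida", "iaculis", "id", "imperdiet", "integer", "ipsum", "lacinia",
    "lectus", "leo"
};

/* Approximation of \w: ASCII letters, digits, '_' and (assumption) any non-ASCII unit. */
static int is_word_unit(unit_t c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') || c == '_' || c >= 0x80;
}

static int cmp_keyword(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Scans units[0, len) word by word. Returns the offset of the first whole word that is a
   keyword (ASCII case-insensitive) and stores its length in *out_len, or returns -1. */
static long find_first_keyword(const unit_t *units, size_t len, size_t *out_len) {
    size_t i = 0;
    while (i < len) {
        if (!is_word_unit(units[i])) {
            i++;
            continue;
        }
        size_t start = i;
        char token[16];    /* room for the longest keyword, "consectetur", plus '\0' */
        size_t n = 0;
        int candidate = 1; /* stays 1 only while the word is short, plain ASCII */
        for (; i < len && is_word_unit(units[i]); i++) {
            unit_t c = units[i];
            if (c >= 0x80 || n + 1 >= sizeof token) {
                candidate = 0;
            } else if (candidate) {
                token[n++] = (char)((c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
            }
        }
        if (candidate) {
            token[n] = '\0';
            if (bsearch(token, kKeywords, sizeof kKeywords / sizeof kKeywords[0],
                        sizeof kKeywords[0], cmp_keyword) != NULL) {
                *out_len = i - start;
                return (long)start;
            }
        }
    }
    return -1;
}
```

The returned offset is relative to the buffer, so it would be added to `range.location` to get an `NSRange` in the original string. Whether this actually beats `NSRegularExpression` on 30-character ranges is an assumption that would need measuring.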
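Edit: Based on @CodeBrickie's answer, I've updated my code to run the regular expression once over the whole string and save the matches in a cached NSIndexSet. Each time the method is called, it then searches the NSIndexSet for the keyword range instead of searching the string. The result is about an order of magnitude faster: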
```objc
@implementation MyClass

static NSRegularExpression *keywordsExpression;
static NSIndexSet *keywordIndexes = nil;

+ (void)initialize
{
    if (self != [MyClass class]) return;

    NSArray *keywords = [NSArray arrayWithObjects:@"accumsan", /* … same keywords as in the first listing … */ nil];
    NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b", [keywords componentsJoinedByString:@"|"]];
    // \b(accumsan|adipiscing|aliquam|…)\b
    keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                   options:NSRegularExpressionCaseInsensitive
                                                                     error:NULL];
}

// Run the regex once over the whole string and remember every character index covered by a match.
- (void)prepareToFindKeywordsInString:(NSString *)string
{
    NSMutableIndexSet *keywordIndexesMutable = [NSMutableIndexSet indexSet];
    [keywordsExpression enumerateMatchesInString:string
                                         options:0
                                           range:NSMakeRange(0, string.length)
                                      usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        [keywordIndexesMutable addIndexesInRange:match.range];
    }];
    keywordIndexes = [keywordIndexesMutable copy];
}

- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
    // … (startingAt and foundCharacterSetRange are defined in code not shown in this listing)
    NSUInteger foundKeywordMax = (foundCharacterSetRange.location == NSNotFound) ? string.length : foundCharacterSetRange.location;

    // Return the first run of consecutive keyword indexes in [startingAt, foundKeywordMax).
    NSRange foundKeywordRange = NSMakeRange(NSNotFound, 0);
    for (NSUInteger index = startingAt; index < foundKeywordMax; index++) {
        if ([keywordIndexes containsIndex:index]) {
            if (foundKeywordRange.location == NSNotFound) {
                foundKeywordRange.location = index;
                foundKeywordRange.length = 1;
            } else {
                foundKeywordRange.length++;
            }
        } else {
            if (foundKeywordRange.location != NSNotFound) {
                break;
            }
        }
    }
    return foundKeywordRange;
}

@end
```
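The loop above tests `keywordIndexes` one index at a time. Since a single regex pass yields matches in ascending, non-overlapping order, a possible refinement is to keep the match ranges themselves in a sorted array and binary-search it, so each lookup costs O(log n) in the number of matches rather than one `containsIndex:` call per character. This is a hedged sketch in plain C, with a hypothetical `match_range` struct standing in for `NSRange` and a hypothetical `find_keyword_in_window` function; the window `[lo, hi)` plays the role of `startingAt` and `foundKeywordMax`, and the result is clipped to the window the same way the loop clips it.

```c
#include <stddef.h>

/* One cached regex match, [start, start + length); a stand-in for NSRange. */
typedef struct {
    size_t start;
    size_t length;
} match_range;

/* Finds the first cached match overlapping [lo, hi) and stores the overlap in *out.
   `matches` must be sorted by start and non-overlapping, which holds for the results
   of one left-to-right regex pass. Returns 1 if found, 0 otherwise. */
static int find_keyword_in_window(const match_range *matches, size_t count,
                                  size_t lo, size_t hi, match_range *out) {
    if (lo >= hi) {
        return 0;
    }
    /* Lower bound: first match whose end is greater than lo. The ends increase
       because the matches are sorted and disjoint, so the predicate is monotone. */
    size_t left = 0, right = count;
    while (left < right) {
        size_t mid = left + (right - left) / 2;
        if (matches[mid].start + matches[mid].length <= lo) {
            left = mid + 1;
        } else {
            right = mid;
        }
    }
    if (left == count || matches[left].start >= hi) {
        return 0;
    }
    /* Clip the match to the window, like the index-by-index loop does. */
    size_t start = matches[left].start > lo ? matches[left].start : lo;
    size_t end = matches[left].start + matches[left].length;
    if (end > hi) {
        end = hi;
    }
    out->start = start;
    out->length = end - start;
    return 1;
}
```

The array could be filled in `-prepareToFindKeywordsInString:` by appending each `match.range` from the same `-enumerateMatchesInString:options:range:usingBlock:` pass. A Foundation-only variant of the same idea is to call `-[NSIndexSet indexGreaterThanOrEqualToIndex:]` on the cached set to jump straight to the next keyword character instead of testing each index.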
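This seems to work well, and the performance is now in the range I need. I'll wait a while to see if there are more suggestions before accepting this.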