objective-c – Efficiently finding the first of many keywords in a large NSString

I need to find all the keywords in a large NSString (for parsing source code). My current implementation is too slow, but I'm not sure how to improve it.

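I'm using NSRegularExpression, on the assumption that it is better optimized than anything I could write myself, but performance is slower than I expected. Does anyone know a faster way to do this?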

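The target string will contain UTF-8 characters, but the keywords themselves will always be plain alphanumeric ASCII. I suspect that could be used to optimize something?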
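One way the ASCII-only keywords could be exploited, sketched here as an alternative to the regex and not as anything proposed in this thread: split the search range into runs of word characters and look each run up in an `NSSet` of lowercase keywords. `FindFirstKeyword` and `IsWordUnit` are hypothetical names, and the word-character test only approximates the regex `\b` (every non-ASCII UTF-16 unit is treated as a word character, so a keyword directly attached to an accented letter is not reported). Whether this actually beats `NSRegularExpression` on 30-character ranges would need to be measured.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: approximates the regex \b notion of a "word character".
// ASCII letters, digits and '_' count, and so does every non-ASCII UTF-16 unit,
// so "amet" followed directly by "é" is treated as part of a longer word.
static BOOL IsWordUnit(unichar c) {
    if (c >= 0x80) return YES;
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') || c == '_';
}

// Hypothetical alternative to the regex: returns the first whole word inside
// `range` whose lowercase form is in `keywords` (a set of lowercase ASCII
// alphanumeric words), or {NSNotFound, 0}. Characters outside `range` are
// never examined, so the range edges act as word boundaries.
static NSRange FindFirstKeyword(NSString *string, NSRange range, NSSet *keywords) {
    NSUInteger i = range.location;
    NSUInteger end = NSMaxRange(range);
    while (i < end) {
        // Skip separators between words.
        while (i < end && !IsWordUnit([string characterAtIndex:i])) i++;
        NSUInteger start = i;
        BOOL pureASCII = YES;
        // Consume one run of word characters.
        while (i < end) {
            unichar c = [string characterAtIndex:i];
            if (!IsWordUnit(c)) break;
            if (c >= 0x80) pureASCII = NO;
            i++;
        }
        // Keywords are ASCII-only, so runs containing other characters cannot match.
        if (i > start && pureASCII) {
            NSRange wordRange = NSMakeRange(start, i - start);
            NSString *word = [[string substringWithRange:wordRange] lowercaseString];
            if ([keywords containsObject:word]) return wordRange;
        }
    }
    return NSMakeRange(NSNotFound, 0);
}
```

The keyword set would be built once (for example in `+initialize`). Each call then allocates only a small lowercase string per candidate word; if profiling shows the per-character `characterAtIndex:` calls matter, copying the range into a local buffer with `getCharacters:range:` is the usual next step.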

@implementation MyClass

// I'm storing the regular expression in a static variable, since it never changes and I need to reuse it often
static NSRegularExpression *keywordsExpression;

+ (void)initialize
{
  [super initialize];

  NSArray *keywords = [NSArray arrayWithObjects:@"accumsan",@"adipiscing",@"aliquam",@"aliquet",@"amet",@"ante",@"arcu",@"at",@"commodo",@"congue",@"consectetur",@"consequat",@"convallis",@"cras",@"curabitur",@"cursus",@"dapibus",@"diam",@"dolor",@"dui",@"elit",@"enim",@"erat",@"eros",@"est",@"et",@"eu",@"felis",@"fermentum",@"gravida",@"iaculis",@"id",@"imperdiet",@"integer",@"ipsum",@"lacinia",@"lectus",@"leo",nil];

  NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b",[keywords componentsJoinedByString:@"|"]]; // \b(accumsan|adipiscing|aliquam|…)\b
  keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:NULL];
}

// This method will be called in quick succession; I need it to be able to run tens
// of thousands of times per second. The target string is big (50KB or so), but the
// search range is short, rarely more than 30 characters.
- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
  return [keywordsExpression rangeOfFirstMatchInString:string options:0 range:range];
}

@end

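Edit: Based on @CodeBrickie's answer, I've updated my code to run the regular expression search once over the whole string and store the matches in a cached NSIndexSet. Each time the method is called, it now searches the NSIndexSet for the keyword range instead of searching the string. The result is roughly an order of magnitude faster: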

@implementation MyClass

static NSRegularExpression *keywordsExpression;
static NSIndexSet *keywordIndexes = nil;

+ (void)initialize
{
  [super initialize];

  NSArray *keywords = [NSArray arrayWithObjects:@"accumsan", /* … same keywords as above … */ nil];

  NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b",[keywords componentsJoinedByString:@"|"]]; // \b(accumsan|adipiscing|aliquam|…)\b
  keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:NULL];
}

- (void)prepareToFindKeywordsInString:(NSString *)string
{
  NSMutableIndexSet *keywordIndexesMutable = [[NSIndexSet indexSet] mutableCopy];
  [keywordsExpression enumerateMatchesInString:string options:0 range:NSMakeRange(0,string.length) usingBlock:^(NSTextCheckingResult *match,NSMatchingFlags flags,BOOL *stop){
    [keywordIndexesMutable addIndexesInRange:match.range];
  }];

  keywordIndexes = [keywordIndexesMutable copy];
}

- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
  // Scan only the requested range for the first contiguous run of keyword indexes.
  NSUInteger foundKeywordMax = NSMaxRange(range);
  NSRange foundKeywordRange = NSMakeRange(NSNotFound,0);
  for (NSUInteger index = range.location; index < foundKeywordMax; index++) {
    if ([keywordIndexes containsIndex:index]) {
      if (foundKeywordRange.location == NSNotFound) {
        foundKeywordRange.location = index;
        foundKeywordRange.length = 1;
      } else {
        foundKeywordRange.length++;
      }
    } else {
      if (foundKeywordRange.location != NSNotFound) {
        break;
      }
    }
  }

  return foundKeywordRange;
}

@end

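This seems to work well, and performance is now in the range I need. I'd like to wait a while to see whether there are more suggestions before accepting it.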
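The per-index `containsIndex:` loop in `findNextKeyword:inRange:` can be shortened, because `NSIndexSet` already answers "first index at or after i" through `indexGreaterThanOrEqualToIndex:`, which returns `NSNotFound` when there is none. A minimal sketch, written as a standalone function with the hypothetical name `NextKeywordRange` and assuming `keywordIndexes` was built by `prepareToFindKeywordsInString:`, that returns the same result as the loop (the first contiguous run of keyword indexes, clipped to `range`):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper with the same result as the index-by-index loop: the
// first contiguous run of indexes from `keywordIndexes` inside `range`, or
// {NSNotFound, 0} if there is none.
static NSRange NextKeywordRange(NSIndexSet *keywordIndexes, NSRange range) {
    NSUInteger end = NSMaxRange(range);
    // Jump straight to the first keyword character at or after range.location.
    NSUInteger first = [keywordIndexes indexGreaterThanOrEqualToIndex:range.location];
    if (first == NSNotFound || first >= end) {
        return NSMakeRange(NSNotFound, 0);
    }
    // Extend over the rest of that run, clipped to the end of `range`.
    NSUInteger last = first;
    while (last + 1 < end && [keywordIndexes containsIndex:last + 1]) {
        last++;
    }
    return NSMakeRange(first, last - first + 1);
}
```

Treating one contiguous run as one keyword stays valid here: every keyword is alphanumeric and `\b` cannot match between two word characters, so two regex matches never produce touching index runs.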

Solution

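Since you need both the keywords and their ranges, I would use enumerateMatchesInString:options:range:usingBlock: and implement a block that adds each keyword as a key, with its range as the value, to an NSMutableDictionary.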

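That way you only need to make the call once for the whole string, and afterwards the dictionary holds every keyword together with its range.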
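A minimal sketch of that suggestion, assuming the case-insensitive `keywordsExpression` from the question. The function name `KeywordRangesInString` is hypothetical, and one detail deliberately differs from the answer as written: each keyword maps to an array of `NSValue`-wrapped ranges, since a keyword that occurs several times would otherwise overwrite its earlier range in the dictionary.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one regex pass over the whole string, collecting every
// match range grouped by keyword (lowercased, since the pattern is
// case-insensitive). Values are NSMutableArrays of NSValue-wrapped NSRanges.
static NSDictionary *KeywordRangesInString(NSRegularExpression *expression, NSString *string) {
    NSMutableDictionary *result = [NSMutableDictionary dictionary];
    [expression enumerateMatchesInString:string
                                 options:0
                                   range:NSMakeRange(0, string.length)
                              usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        NSString *keyword = [[string substringWithRange:match.range] lowercaseString];
        NSMutableArray *ranges = result[keyword];
        if (ranges == nil) {
            // First occurrence of this keyword: start a new list of ranges.
            ranges = [NSMutableArray array];
            result[keyword] = ranges;
        }
        [ranges addObject:[NSValue valueWithRange:match.range]];
    }];
    return result;
}
```

For the specific lookup in the question (the first keyword inside a short range), a single position-ordered structure such as the asker's `NSIndexSet` is easier to search than a per-keyword dictionary; the dictionary form is more convenient when you need to ask where a given keyword occurs.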
