[Translation] NSHipster: NSRegularExpression

Some people, when confronted with a problem, think "I know, I'll use NSRegularExpression." Now they have three problems.

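Regular expressions play a controversial role in the programming world. Some find them impenetrably incomprehensible and thick with symbols, more akin to a practical joke than part of a reasonable code base.
Others rely on their brevity and power, and wonder how anyone could get along without such a versatile tool in their arsenal.
Happily, there is one thing we can all agree on: in NSRegularExpression, Cocoa has the most long-winded and byzantine regular expression interface you're ever likely to come across.
Don't believe it? Let's extract the links from this HTML snippet, first in Ruby: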

htmlSource = "Questions? Corrections? <a href=\"https://twitter.com/NSHipster\">@NSHipster</a> or <a href=\"https://github.com/NSHipster/articles\">on GitHub</a>."

linkRegex = /<a\s+[^>]*href="([^"]*)"[^>]*>/i
links = htmlSource.scan(linkRegex)
puts(links)
# https://twitter.com/NSHipster
# https://github.com/NSHipster/articles

Two or three lines, depending on how you count. Not bad.
Now let's try the same thing in Swift with NSRegularExpression:

import Foundation

let htmlSource = "Questions? Corrections? <a href=\"https://twitter.com/NSHipster\">@NSHipster</a> or <a href=\"https://github.com/NSHipster/articles\">on GitHub</a>."

// Capture group 1 holds the value of each <a> tag's href attribute
let linkRegexPattern = "<a\\s+[^>]*href=\"([^\"]*)\"[^>]*>"

let linkRegex = try! NSRegularExpression(pattern: linkRegexPattern, options: .caseInsensitive)
// NSRegularExpression works in UTF-16 offsets, so the search range uses utf16.count
let matches = linkRegex.matches(in: htmlSource, options: [], range: NSMakeRange(0, htmlSource.utf16.count))
let links = matches.map { result -> String in
    let hrefRange = result.range(at: 1)
    // Turn the NSRange's integer UTF-16 offsets back into String indices
    let start = String.Index(utf16Offset: hrefRange.location, in: htmlSource)
    let end = String.Index(utf16Offset: hrefRange.location + hrefRange.length, in: htmlSource)
    return String(htmlSource[start..<end])
}
print(links)
// ["https://twitter.com/NSHipster", "https://github.com/NSHipster/articles"]

Enough said.

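This article won't go into the ins and outs of regular expressions themselves; you may need to learn about wildcards, backreferences, lookaheads and the rest elsewhere.
Instead, read on (along with Apple's documentation for NSRegularExpression, NSTextCheckingResult and their related types) to learn how regular expressions work in Swift.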

NSString Methods

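The simplest way to use regular expressions in Cocoa is to skip NSRegularExpression altogether.
NSString's range(of:...) method switches into regular expression mode when given the .regularExpression option,
so a lightweight search can be written like this: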

Swift

import Foundation

let source = "For NSSet and NSDictionary,the breaking..."

// Matches anything that looks like a Cocoa type:
// UIButton, NSCharacterSet, NSURLSession, etc.
let typePattern = "[A-Z]{3,}[A-Za-z0-9]+"

if let typeRange = source.range(of: typePattern,options: .regularExpression) {
    print("First type: \(source[typeRange])")
    //  NSSet
}

Objective-C

NSString *source = @"For NSSet and NSDictionary,the breaking...";

// Matches anything that looks like a Cocoa type:
// UIButton, NSCharacterSet, NSURLSession, etc.
NSString *typePattern = @"[A-Z]{3,}[A-Za-z0-9]+";
NSRange typeRange = [source rangeOfString:typePattern
                                  options:NSRegularExpressionSearch];

if (typeRange.location != NSNotFound) {
    NSLog(@"First type: %@",[source substringWithRange:typeRange]);
    // NSSet
}
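`range(of:options:)` only reports the first match. Here is a minimal sketch of collecting every match without reaching for NSRegularExpression. It uses a hypothetical helper, `allMatches(of:in:)`, which is not part of Foundation. The helper restarts the search after each hit through the `range:` parameter of `range(of:options:range:)`:

```swift
import Foundation

/// Hypothetical helper (not a Foundation API): returns every
/// non-overlapping match of `pattern` in `text`, using only
/// String.range(of:options:range:) with the .regularExpression option.
func allMatches(of pattern: String, in text: String) -> [String] {
    var results: [String] = []
    var searchStart = text.startIndex
    while let match = text.range(of: pattern,
                                 options: .regularExpression,
                                 range: searchStart..<text.endIndex) {
        results.append(String(text[match]))
        // Stop once a match reaches the end of the string
        if match.upperBound == text.endIndex { break }
        // Resume after this match; step one character past an empty
        // match so the loop always makes progress
        searchStart = match.isEmpty ? text.index(after: match.upperBound)
                                    : match.upperBound
    }
    return results
}

let source = "For NSSet and NSDictionary,the breaking..."
print(allMatches(of: "[A-Z]{3,}[A-Za-z0-9]+", in: source))
// ["NSSet", "NSDictionary"]
```

The loop works with `String.Index` values throughout, so no NSRange conversion is needed. Once you need capture groups or other match details, NSRegularExpression is the better fit.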

The same option works with replacingOccurrences(of:with:...), which makes replacement just as easy.
Here is how we can wrap every type name in the text in Markdown-style backticks:

Swift

let markedUpSource = source.replacingOccurrences(of: typePattern, with: "`$0`", options: .regularExpression)
print(markedUpSource)
// "For `NSSet` and `NSDictionary`,the breaking..."

Objective-C

NSString *markedUpSource = 
    [source stringByReplacingOccurrencesOfString:typePattern 
                                      withString:@"`$0`"
                                         options:NSRegularExpressionSearch
                                           range:NSMakeRange(0,source.length)];
NSLog(@"%@",markedUpSource);
// "For `NSSet` and `NSDictionary`,the breaking..."
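For comparison, here is a sketch of the same backtick replacement written against NSRegularExpression directly, using `stringByReplacingMatches(in:options:range:withTemplate:)`. The `$0` template is interpreted the same way. The difference is that the search range must be passed as an NSRange measured in UTF-16 code units:

```swift
import Foundation

let source = "For NSSet and NSDictionary,the breaking..."
let typeRegex = try! NSRegularExpression(pattern: "[A-Z]{3,}[A-Za-z0-9]+")

// $0 in the template stands for the entire match
let markedUp = typeRegex.stringByReplacingMatches(
    in: source,
    options: [],
    range: NSRange(location: 0, length: source.utf16.count),
    withTemplate: "`$0`")
print(markedUp)
// For `NSSet` and `NSDictionary`,the breaking...
```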

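With regular expression search enabled, the replacement template can even refer to capture groups,
which makes for a quick-and-dirty Pig Latin transformation: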

Swift

let ourcesay = source.replacingOccurrences(
    of: "([bcdfghjklmnpqrstvwxyz]*)([a-z]+)",
    with: "$2$1ay",
    options: [.regularExpression, .caseInsensitive])
print(ourcesay)
// "orFay etNSSay anday ictionaryNSDay,ethay eakingbray..."

Objective-C

NSString *ourcesay = 
    [source stringByReplacingOccurrencesOfString:@"([bcdfghjklmnpqrstvwxyz]*)([a-z]+)"
                                      withString:@"$2$1ay"
                                         options:NSRegularExpressionSearch | NSCaseInsensitiveSearch
                                           range:NSMakeRange(0,source.length)];                                                                
NSLog(@"%@",ourcesay);
// "orFay etNSSay anday ictionaryNSDay,ethay eakingbray..."
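Capture-group references are just as handy for reordering structured text. Here is a minimal sketch that rewrites ISO-style `yyyy-MM-dd` dates as `dd/MM/yyyy` using `$1`, `$2` and `$3`. The log string is made up purely for illustration:

```swift
import Foundation

let log = "Released 2016-09-21, revised 2017-01-05."
// Groups: $1 = year, $2 = month, $3 = day
let reordered = log.replacingOccurrences(
    of: "([0-9]{4})-([0-9]{2})-([0-9]{2})",
    with: "$3/$2/$1",
    options: .regularExpression)
print(reordered)
// Released 21/09/2016, revised 05/01/2017.
```

Because `$` is special in templates, a literal dollar sign has to be escaped. `NSRegularExpression.escapedTemplate(for:)` does that escaping for arbitrary text.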

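These two methods are enough for many situations that call for regular expressions. For more complex cases we need NSRegularExpression itself.
Before we get there, though, there is a complication to sort out on the Swift side.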

NSRange and Swift

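Swift offers a more comprehensive, if more complicated, interface to a string's characters and substrings than Foundation's NSString does.
The standard library's String provides four different views of its data,
so you can access a string's contents as Characters, Unicode scalar values, UTF-8 code units, or UTF-16 code units.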
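To see how the views differ, here is a small sketch. The string's "é" is written as `e` followed by a combining acute accent (U+0301), and the string ends with a dog-face emoji (U+1F436) from outside the Basic Multilingual Plane. Each view reports a different count:

```swift
// "café 🐶", with the accent stored as a separate combining scalar
let word = "cafe\u{301} \u{1F436}"

print(word.count)                 // 6 Characters: e + U+0301 form one Character
print(word.unicodeScalars.count)  // 7 Unicode scalars
print(word.utf8.count)            // 11 UTF-8 code units
print(word.utf16.count)           // 8 UTF-16 code units: the emoji is a surrogate pair
```

NSString's `length`, and therefore every NSRange, counts UTF-16 code units. That makes the UTF-16 view the one that matters when working with NSRegularExpression.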
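What does this have to do with NSRegularExpression?
Many NSRegularExpression methods take or return NSRange values, and so do the NSTextCheckingResult instances that hold match data.
An NSRange is simply two integers, location and length,
but String cannot be subscripted with integers: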

let range = NSRange(location: 4, length: 5)

// None of these compile:
source[range]
source.characters[range]
source.substring(with: range)
source.substring(with: Range(range)!)

Confused? Exasperated?

Don't give up!

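Things are not as disconnected as they look.
Swift's String is explicitly designed to interoperate with Foundation's NSString APIs, and NSRange counts in UTF-16 code units, the same units as String's utf16 view.
That means an integer offset from an NSRange can be turned directly into a String index:

let start = String.Index(utf16Offset: range.location, in: source)
let end = String.Index(utf16Offset: range.location + range.length, in: source)
let substring = String(source[start..<end])
// substring is now "NSSet"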
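The conversion above goes through UTF-16 offsets rather than character counts. The reason becomes visible as soon as the text contains a character outside the Basic Multilingual Plane. Here is a sketch with an emoji placed in front of one of the type names:

```swift
import Foundation

let text = "\u{1F436} NSSet"  // the dog emoji occupies two UTF-16 code units
let regex = try! NSRegularExpression(pattern: "[A-Z]{3,}[A-Za-z0-9]+")
let match = regex.firstMatch(in: text, options: [],
                             range: NSRange(location: 0, length: text.utf16.count))!

print(match.range.location)  // 3: the offset is counted in UTF-16 code units

// Wrong: treating 3 as a Character offset starts one character too late
let wrongStart = text.index(text.startIndex, offsetBy: match.range.location)
print(text[wrongStart...])   // SSet

// Right: interpret the offsets as UTF-16 positions
let start = String.Index(utf16Offset: match.range.location, in: text)
let end = String.Index(utf16Offset: match.range.location + match.range.length, in: text)
print(text[start..<end])     // NSSet
```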

With that in mind, here is a String extension that makes moving between Swift ranges and Objective-C's NSRange a little easier:

extension String {
    /// An `NSRange` that represents the full range of the string.
    var nsrange: NSRange {
        return NSRange(location: 0, length: utf16.count)
    }

    /// Returns a substring with the given `NSRange`,
    /// or `nil` if the range can't be converted.
    func substring(with nsrange: NSRange) -> String? {
        guard let range = Range(nsrange, in: self) else { return nil }
        return String(self[range])
    }

    /// Returns a range equivalent to the given `NSRange`,
    /// or `nil` if the range can't be converted.
    func range(from nsrange: NSRange) -> Range<Index>? {
        return Range(nsrange, in: self)
    }
}
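Since Swift 4, Foundation has provided `NSRange(_:in:)` and `Range(_:in:)` initializers that convert between NSRange and `Range<String.Index>`. Here is a sketch that reruns the link extraction from the beginning of the article with them, with no manual index arithmetic:

```swift
import Foundation

let htmlSource = "Questions? Corrections? <a href=\"https://twitter.com/NSHipster\">@NSHipster</a> or <a href=\"https://github.com/NSHipster/articles\">on GitHub</a>."
let linkRegex = try! NSRegularExpression(pattern: "<a\\s+[^>]*href=\"([^\"]*)\"[^>]*>",
                                         options: .caseInsensitive)

// NSRange(_:in:) builds the UTF-16-based search range from a Swift range
let searchRange = NSRange(htmlSource.startIndex..., in: htmlSource)

// Range(_:in:) returns nil when an NSRange can't be mapped onto the string,
// so compactMap drops any such result
let links = linkRegex.matches(in: htmlSource, options: [], range: searchRange)
    .compactMap { Range($0.range(at: 1), in: htmlSource) }
    .map { String(htmlSource[$0]) }

print(links)
// ["https://twitter.com/NSHipster", "https://github.com/NSHipster/articles"]
```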

In the next section we'll put these helpers to work and see what NSRegularExpression itself can do.

NSRegularExpression & NSTextCheckingResult
