循环:
var pattern = _dict[key]; string before; do { before = pattern; foreach (var pair in _dict) if (key != pair.Key) pattern = pattern.Replace(string.Concat("{",pair.Key,"}"),string.Concat("(",pair.Value,")")); } while (pattern != before); return pattern;
它只是在一堆键上重复查找和替换.字典只是< string,string>.
我可以看到2个改进.
>每次我们执行pattern.Replace它再次从字符串的开头搜索.如果它碰到第一个{,它只会查看匹配的键列表(可能使用二进制搜索),然后替换适当的一个,那会更好.
>模式!=位之前我是如何检查在迭代期间是否有任何替换.如果pattern.Replace函数返回了实际发生的实际替换次数,我不需要这个.
但是……我真的不想写一个很讨厌的东西来做这一切.这必须是一个相当普遍的情况?有没有现成的解决方案?
全班
感谢Elian Ebbing和ChrisWue.
class FlexDict : IEnumerable<KeyValuePair<string,string>> { private Dictionary<string,string> _dict = new Dictionary<string,string>(); private static readonly Regex _re = new Regex(@"{([_a-z][_a-z0-9-]*)}",RegexOptions.Compiled | RegexOptions.IgnoreCase); public void Add(string key,string pattern) { _dict[key] = pattern; } public string Expand(string pattern) { pattern = _re.Replace(pattern,match => { string key = match.Groups[1].Value; if (_dict.ContainsKey(key)) return "(" + Expand(_dict[key]) + ")"; return match.Value; }); return pattern; } public string this[string key] { get { return Expand(_dict[key]); } } public IEnumerator<KeyValuePair<string,string>> GetEnumerator() { foreach (var p in _dict) yield return new KeyValuePair<string,string>(p.Key,this[p.Key]); } IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); } }
示例用法
class Program { static void Main(string[] args) { var flex = new FlexDict { {"h",@"[0-9a-f]"},{"nonascii",@"[\200-\377]"},{"unicode",@"\\{h}{1,6}(\r\n|[ \t\r\n\f])?"},{"escape",@"{unicode}|\\[^\r\n\f0-9a-f]"},{"nmstart",@"[_a-z]|{nonascii}|{escape}"},{"nmchar",@"[_a-z0-9-]|{nonascii}|{escape}"},{"string1",@"""([^\n\r\f\\""]|\\{nl}|{escape})*"""},{"string2",@"'([^\n\r\f\\']|\\{nl}|{escape})*'"},{"badstring1",@"""([^\n\r\f\\""]|\\{nl}|{escape})*\\?"},{"badstring2",@"'([^\n\r\f\\']|\\{nl}|{escape})*\\?"},{"badcomment1",@"/\*[^*]*\*+([^/*][^*]*\*+)*"},{"badcomment2",@"/\*[^*]*(\*+[^/*][^*]*)*"},{"baduri1",@"url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}"},{"baduri2",@"url\({w}{string}{w}"},{"baduri3",@"url\({w}{badstring}"},{"comment",@"/\*[^*]*\*+([^/*][^*]*\*+)*/"},{"ident",@"-?{nmstart}{nmchar}*"},{"name",@"{nmchar}+"},{"num",@"[0-9]+|[0-9]*\.[0-9]+"},{"string",@"{string1}|{string2}"},{"badstring",@"{badstring1}|{badstring2}"},{"badcomment",@"{badcomment1}|{badcomment2}"},{"baduri",@"{baduri1}|{baduri2}|{baduri3}"},{"url",@"([!#$%&*-~]|{nonascii}|{escape})*"},{"s",@"[ \t\r\n\f]+"},{"w",@"{s}?"},{"nl",@"\n|\r\n|\r|\f"},{"A",@"a|\\0{0,4}(41|61)(\r\n|[ \t\r\n\f])?"},{"C",@"c|\\0{0,4}(43|63)(\r\n|[ \t\r\n\f])?"},{"D",@"d|\\0{0,4}(44|64)(\r\n|[ \t\r\n\f])?"},{"E",@"e|\\0{0,4}(45|65)(\r\n|[ \t\r\n\f])?"},{"G",@"g|\\0{0,4}(47|67)(\r\n|[ \t\r\n\f])?|\\g"},{"H",@"h|\\0{0,4}(48|68)(\r\n|[ \t\r\n\f])?|\\h"},{"I",@"i|\\0{0,4}(49|69)(\r\n|[ \t\r\n\f])?|\\i"},{"K",@"k|\\0{0,4}(4b|6b)(\r\n|[ \t\r\n\f])?|\\k"},{"L",@"l|\\0{0,4}(4c|6c)(\r\n|[ \t\r\n\f])?|\\l"},{"M",@"m|\\0{0,4}(4d|6d)(\r\n|[ \t\r\n\f])?|\\m"},{"N",@"n|\\0{0,4}(4e|6e)(\r\n|[ \t\r\n\f])?|\\n"},{"O",@"o|\\0{0,4}(4f|6f)(\r\n|[ \t\r\n\f])?|\\o"},{"P",@"p|\\0{0,4}(50|70)(\r\n|[ \t\r\n\f])?|\\p"},{"R",@"r|\\0{0,4}(52|72)(\r\n|[ \t\r\n\f])?|\\r"},{"S",@"s|\\0{0,4}(53|73)(\r\n|[ \t\r\n\f])?|\\s"},{"T",@"t|\\0{0,4}(54|74)(\r\n|[ \t\r\n\f])?|\\t"},{"U",@"u|\\0{0,4}(55|75)(\r\n|[ \t\r\n\f])?|\\u"},{"X",@"x|\\0{0,4}(58|78)(\r\n|[ \t\r\n\f])?|\\x"},{"Z",@"z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z"},{"CDO",@"<!--"},{"CDC",@"-->"},{"INCLUDES",@"~="},{"DASHMATCH",@"\|="},{"STRING",@"{string}"},{"BAD_STRING",@"{badstring}"},{"IDENT",@"{ident}"},{"HASH",@"#{name}"},{"IMPORT_SYM",@"@{I}{M}{P}{O}{R}{T}"},{"PAGE_SYM",@"@{P}{A}{G}{E}"},{"MEDIA_SYM",@"@{M}{E}{D}{I}{A}"},{"CHARSET_SYM",@"@charset\b"},{"IMPORTANT_SYM",@"!({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}"},{"EMS",@"{num}{E}{M}"},{"EXS",@"{num}{E}{X}"},{"LENGTH",@"{num}({P}{X}|{C}{M}|{M}{M}|{I}{N}|{P}{T}|{P}{C})"},{"ANGLE",@"{num}({D}{E}{G}|{R}{A}{D}|{G}{R}{A}{D})"},{"TIME",@"{num}({M}{S}|{S})"},{"PERCENTAGE",@"{num}%"},{"NUMBER",@"{num}"},{"URI",@"{U}{R}{L}\({w}{string}{w}\)|{U}{R}{L}\({w}{url}{w}\)"},{"BAD_URI",@"{baduri}"},{"FUNCTION",@"{ident}\("},}; var testStrings = new[] { @"""str""",@"'str'","5","5.","5.0","a","alpha","url(hello)","url(\"hello\")","url(\"blah)",@"\g",@"/*comment*/",@"/**/",@"<!--",@"-->",@"~=","|=",@"#hash","@import","@page","@media","@charset","!/*iehack*/important"}; foreach (var pair in flex) { Console.WriteLine("{0}\n\t{1}\n",pair.Value); } var sw = Stopwatch.StartNew(); foreach (var str in testStrings) { Console.WriteLine("{0} matches: ",str); foreach (var pair in flex) { if (Regex.IsMatch(str,"^(" + pair.Value + ")$",RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture)) Console.WriteLine(" {0}",pair.Key); } } Console.WriteLine("\nRan in {0} ms",sw.ElapsedMilliseconds); Console.ReadLine(); } }
目的
用于构建可能相互扩展的复杂正则表达式.也就是说,我正在尝试实施the css spec.
解决方法
我认为如果你使用正则表达式查找{foo}的任何出现会更快,然后使用
MatchEvaluator替换{foo},如果foo恰好是字典中的键.
var pattern = _dict[key]; bool isChanged = false; do { isChanged = false; pattern = Regex.Replace(pattern,"{([^}]+)}",match => { string matchKey = match.Groups[1].Value; if (matchKey != key && _dict.ContainsKey(matchKey)) { isChanged = true; return "(" + _dict[matchKey] + ")"; } return match.Value; }); } while (isChanged);
我可以问你为什么需要do / while循环吗?字典中键的值是否可以再次包含必须替换的{占位符}?你能确定你不会陷入无限循环,其中键“A”包含“Blahblah {B}”而键“B”包含“Blahblah {A}”吗?
编辑:进一步的改进将是:
>使用预编译的正则表达式.
>使用递归而不是循环(参见ChrisWue’s注释).
>使用_dict.TryGetValue(),如Guffa’s代码中所示.
你最终会得到一个O(n)算法,其中n是输出的大小,所以你不能做得比这更好.