Greedy / Non-Greedy Pattern Matching and Optional Suffixes in Lua

In Lua, I am trying to pattern match and capture:
+384 Critical Strike (Reforged from Parry Chance)

as

(+384) (Critical Strike)

where the suffix (Reforged from %s) is optional.

Long version

I am trying to match a string in Lua using patterns (i.e. strfind).

Note: In Lua they don't call them regular expressions, they call them patterns, because they're not regular.

Example strings:

+384 Critical Strike
+1128 Hit

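This breaks down into the two parts I want to capture:

> the number, with a leading plus or minus sign; in this case +384
> the string; in this case Critical Strike.

I can capture these with a fairly simple pattern. This pattern works in Lua: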

local text = "+384 Critical Strike";
local pattern = "([%+%-]%d+) (.+)";
local _,_,value,stat = strfind(text,pattern);

> value = +384
> stat = Critical Strike
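
The `strfind` call above is the Lua 4 spelling (some host applications that embed Lua also keep it as a global alias). In stock Lua 5 the same function is `string.find`. A minimal sketch of the same capture, assuming a plain Lua 5.x interpreter:

```lua
-- Assumption: Lua 5.x, where strfind is spelled string.find.
local text = "+384 Critical Strike"
local pattern = "([%+%-]%d+) (.+)"

-- string.find returns the start and end indices first, then the captures.
local first, last, value, stat = string.find(text, pattern)

print(first, last)  --> 1   20
print(value)        --> +384
print(stat)         --> Critical Strike
```

Note that the sign is part of the first capture, because `[%+%-]` sits inside the parentheses.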

The tricky part

Now I need to extend the pattern to include an optional suffix:

+384 Critical Strike (Reforged from Parry Chance)

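Which breaks down into the number (+384), the string (Critical Strike), and the optional suffix ((Reforged from Parry Chance)).

Note: I don't particularly care about the optional trailing suffix; that is, I have no requirement to capture it, although capturing it would be handy.

This is where I start running into greedy-capture problems. Right away, the pattern I already have does something I don't want:

> pattern = ([%+%-]%d+) (.+)
> value = +384
> stat = Critical Strike (Reforged from Parry Chance)

But let's try to include the suffix in the pattern: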

pattern = "([%+%-]%d+) (.+)( %(Reforged from .+%))?"

I am using the ? operator to indicate 0 or 1 occurrences of the suffix. Except that this matches nothing.
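
A likely explanation, going by the reference manual: Lua's `?`, `*`, `+` and `-` apply only to a single character class, never to a parenthesized capture. After the closing `)` the `?` is therefore read as a literal question mark that the subject string must contain. A small sketch (Lua 5.x assumed) that illustrates this:

```lua
local pattern = "([%+%-]%d+) (.+)( %(Reforged from .+%))?"
local plain = "+384 Critical Strike (Reforged from Parry Chance)"

-- The subject has no literal '?', so the whole match fails.
local miss = string.match(plain, pattern)
print(miss)    --> nil

-- Append a literal '?' and the very same pattern matches.
local value, stat, suffix = string.match(plain .. "?", pattern)
print(value)   --> +384
print(stat)    --> Critical Strike
print(suffix)  --> " (Reforged from Parry Chance)" (with the leading space)
```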

I blindly tried changing the parentheses ( to square brackets [:

pattern = "([%+%-]%d+) (.+)[ %(Reforged from .+%)]?"

But now the match is greedy again:

> value = +384
> stat = Critical Strike (Reforged from Parry Chance)
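
The square brackets do not group anything. `[ %(Reforged from .+%)]` is a character set: one item that matches a single character out of space, `(`, `)`, `.`, `+` and the letters that occur in "Reforged from". With `?` it matches at most one such character, so the greedy `.+` is free to swallow the rest of the line. A short illustration, assuming Lua 5.x:

```lua
-- The bracket expression anchored to a whole one-character subject.
local set = "^[ %(Reforged from .+%)]$"

print(string.match("R", set))         --> R
print(string.match("(", set))         --> (
print(string.match("x", set))         --> nil (x is not in the set)
print(string.match("Reforged", set))  --> nil (a set matches one character only)

-- So in the full pattern the greedy (.+) takes everything after the number.
local text = "+384 Critical Strike (Reforged from Parry Chance)"
local value, stat = string.match(text, "([%+%-]%d+) (.+)[ %(Reforged from .+%)]?")
print(stat)  --> Critical Strike (Reforged from Parry Chance)
```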

Based on the Lua pattern reference:

  • x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
  • .: (a dot) represents all characters.
  • %a: represents all letters.
  • %c: represents all control characters.
  • %d: represents all digits.
  • %l: represents all lowercase letters.
  • %p: represents all punctuation characters.
  • %s: represents all space characters.
  • %u: represents all uppercase letters.
  • %w: represents all alphanumeric characters.
  • %x: represents all hexadecimal digits.
  • %z: represents the character with representation 0.
  • %x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any punctuation character (even the non magic) can be preceded by a ‘%’ when used to represent itself in a pattern.
  • [set]: represents the class which is the union of all characters in set. A range of characters can be specified by separating the end characters of the range with a '-'. All classes %x described above can also be used as components in set. All other characters in set represent themselves. For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character.
    The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
  • [^set]: represents the complement of set, where set is interpreted as above.

For all classes represented by single letters (%a, %c, etc.), the corresponding uppercase letter represents the complement of the class. For instance, %S represents all non-space characters.

The definitions of letter, space, and other character groups depend on the current locale. In particular, the class [a-z] may not be equivalent to %l.
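
A few of the classes quoted above, tried out in a short sketch. It assumes Lua 5.x and the default "C" locale; the sample strings are made up for illustration.

```lua
-- [%w_] : alphanumeric characters plus the underscore
local ident = string.match("foo_bar1 baz", "[%w_]+")
print(ident)                     --> foo_bar1

-- %S is the complement of %s (any non-space character)
local word = string.match("   hi there", "%S+")
print(word)                      --> hi

-- %. is a literal dot; a bare . matches any character
print(string.find("a.b", "%."))  --> 2   2
print(string.find("a.b", "."))   --> 1   1

-- [^set] : everything up to, but not including, the first '('
local head = string.match("Critical Strike (Reforged)", "[^%(]+")
print("[" .. head .. "]")        --> [Critical Strike ]
```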

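And the magic modifiers:

> *, which matches 0 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
> +, which matches 1 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
> -, which also matches 0 or more repetitions of characters in the class. Unlike '*', these repetition items will always match the shortest possible sequence;
> ?, which matches 0 or 1 occurrence of a character in the class;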
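
The difference between the greedy and the lazy modifiers is easiest to see side by side. A minimal sketch, assuming Lua 5.x; the sample strings are invented for the illustration.

```lua
local s = "(one) (two)"

-- Greedy: .* and .+ run to the LAST ')' in the subject.
local greedy_star = string.match(s, "%((.*)%)")
local greedy_plus = string.match(s, "%((.+)%)")
print(greedy_star)  --> one) (two
print(greedy_plus)  --> one) (two

-- Lazy: .- stops at the FIRST ')' that lets the rest of the pattern match.
local lazy = string.match(s, "%((.-)%)")
print(lazy)         --> one

-- ? makes a single character class optional.
print(string.match("colour", "colou?r"))  --> colour
print(string.match("color", "colou?r"))   --> color
```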

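I noticed there is a greedy * and a non-greedy - modifier. Since my middle string matcher:

(%d) (%s) (%s)

seems to be absorbing text all the way to the end, perhaps I should try to make it non-greedy by changing the * to a -: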

oldPattern = "([%+%-]%d+) (.*)[ %(Reforged from .+%)]?"
newPattern = "([%+%-]%d+) (.-)[ %(Reforged from .+%)]?"

Except that now it fails to match:

> value = +384
> stat = nil
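
The lazy `-` stops expanding as soon as the rest of the pattern can match, and here the rest is a single optional item that is satisfied by matching nothing. In stock Lua 5.x the capture then comes back as an empty string (the `nil` reported above may reflect the author's environment). Following `.-` with something mandatory, or anchoring the pattern with `$`, forces it to expand. A sketch under those assumptions:

```lua
local text = "+384 Critical Strike (Reforged from Parry Chance)"

-- Nothing after (.-) is mandatory, so it settles for the empty string.
local value, stat = string.match(text, "([%+%-]%d+) (.-)[ %(Reforged from .+%)]?")
print(value, "[" .. stat .. "]")  --> +384    []

-- A mandatory " (" after the lazy capture makes it expand up to the suffix.
local v2, s2 = string.match(text, "([%+%-]%d+) (.-) %(")
print(s2)  --> Critical Strike

-- Without a suffix, a $ anchor makes it expand to the end of the string.
local v3, s3 = string.match("+384 Critical Strike", "([%+%-]%d+) (.-)$")
print(s3)  --> Critical Strike
```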

Instead of having the middle group capture "any" character (i.e. .), I tried a set containing everything except (:

pattern = "([%+%-]%d+) ([^%(]*)( %(Reforged from .+%))?"

From there, the wheels came off the wagon:

local pattern = "([%+%-]%d+) ([^%(]*)( %(Reforged from .+%))?"
local pattern = "([%+%-]%d+) ((^%()*)( %(Reforged from .+%))?"
local pattern = "([%+%-]%d+) (%a )+)[ %(Reforged from .+%)]?"

I thought I was close with:

local pattern = "([%+%-]%d+) ([%a ]+)[ %(Reforged from .+%)]?"

which captures

- value = "+384"
- stat = "Critical Strike "  (notice the trailing space)
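
The trailing space is there because `[%a ]+` accepts spaces and only stops at the `(`. One low-tech way to live with that, shown here as a sketch rather than as the fix for the pattern itself, is to trim the capture afterwards with `string.gsub` (Lua 5.x assumed):

```lua
local text = "+384 Critical Strike (Reforged from Parry Chance)"
local value, stat = string.match(text, "([%+%-]%d+) ([%a ]+)[ %(Reforged from .+%)]?")
print("[" .. stat .. "]")  --> [Critical Strike ]

-- Strip trailing whitespace. The extra parentheses keep only gsub's first
-- return value (the string) and drop the substitution count.
local trimmed = (string.gsub(stat, "%s+$", ""))
print("[" .. trimmed .. "]")  --> [Critical Strike]
```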

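So this is where I bang my head on the pillow and go to sleep. I can't believe I have spent four hours on this regular expression... er, pattern.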

@NicolBolas: The set of all possible strings, defined in a pseudo regular-expression language, is:

+%d %s (Reforged from %s)

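where

> + represents a Plus Sign (+) or a Minus Sign (-)
> %d represents any Latin digit character (e.g. 0..9)
> %s represents any Latin uppercase or lowercase letter, or an embedded space (e.g. A-Za-z)
> the remaining characters are literals.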

If I had to write a regular expression that obviously tries to do what I want:

[\+\-]\d+ [\w\s]+( \(Reforged from [\w\s]+\))?

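But in case I have not explained it well enough, I can give you a nearly complete list of all the values I am likely to encounter in the wild:

> +123 Parry : positive number, single word
> +123 Critical Strike : positive number, two words
> -123 Parry : negative number, single word
> -123 Critical Strike : negative number, two words
> +123 Parry (Reforged from Dodge) : positive number, single word, optional suffix present with a single word
> +123 Critical Strike (Reforged from Dodge) : positive number, two words, optional suffix present with a single word
> -123 Parry (Reforged from Hit Chance) : negative number, single word, optional suffix present with two words
> -123 Critical Strike (Reforged from Hit Chance) : negative number, two words, optional suffix present with two words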

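There are also bonus strings that, it seems obvious, the pattern would match as well:

> +1234 Critical Strike Chance : four-digit number, three words
> +12345 Mount and run speed increase : five-digit number, five words
> +123456 Mount and run speed increase : six-digit number, five words
> -1 MoUnT aNd RuN sPeEd InCrEaSe : one-digit number, five words
> -1 HiT (Reforged from CrItIcAl StRiKe ChAnCe) : negative one-digit number, one word, optional suffix present with three words

The ideal pattern should match these bonus entries too, but that is not a requirement.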
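
Since Lua patterns cannot make a whole group optional, one common workaround is to match in two passes: first a pattern that requires the suffix, then a fallback without it. This is a different technique from the single-pattern attempts above, and the helper name `parse_stat` is hypothetical. A minimal sketch, assuming Lua 5.x and un-localized numbers:

```lua
-- Hypothetical helper, not from the original post.
-- Returns value, stat and (if present) the stat named in the suffix.
local function parse_stat(text)
  -- Pass 1: the suffix is mandatory. The lazy (.-) stops at " (Reforged from ".
  local value, stat, source =
    string.match(text, "^([%+%-]%d+) (.-) %(Reforged from (.+)%)$")
  if value then
    return value, stat, source
  end
  -- Pass 2: no suffix, so the rest of the line is the stat name.
  value, stat = string.match(text, "^([%+%-]%d+) (.+)$")
  return value, stat, nil
end

print(parse_stat("+123 Parry"))
--> +123    Parry   nil
print(parse_stat("-123 Critical Strike (Reforged from Hit Chance)"))
--> -123    Critical Strike Hit Chance
print(parse_stat("-1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)"))
--> -1      HiT     CrItIcAl StRiKe ChAnCe
```

Trying the stricter pattern first matters: the fallback alone would fold the suffix into the stat name again.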

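Localization

In reality, all the "numbers" I am trying to parse will be localized, e.g.:

> 123,456 in English (en-US)
> 123.456 in German (de-DE)
> 123'456 in French (fr-CA)
> 123 456 in Estonian (et-EE)
> 1,23,456 in Assamese (as-IN)

Any answer must not attempt to account for these localization issues. You do not know the locale a number will be presented in, which is why number localization has been removed from the question. You must strictly assume that numbers contain a plus sign, a hyphen-minus, and the Latin digits 0 through 9. I already know how to parse localized numbers. This question is about matching an optional suffix with a greedy pattern parser.

Edit: You really should not try to handle localized numbers. Attempting to handle them to some degree without knowing the locale is wrong. For one thing, I did not include every possible number localization. For another, I do not know what localizations may exist in the future.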

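Answer

Hmm, I don't have Lua 4 installed, but this pattern works under Lua 5. I would expect it to work in Lua 4 as well.

Update 1: Since additional requirements were specified (localization), I have adjusted the pattern and the tests to reflect them.

Update 2: Updated the pattern and tests to handle an additional class of text containing a number, as mentioned by @IanBoyd in the comments. Added an explanation of the string pattern.

Update 3: Added a variation for the case where the localized number is handled separately, as mentioned in the last update to the question.

Try: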

"(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

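or (making no attempt to validate the number-localization tokens), simply take anything that is not a letter, with a digit as the sentinel at the end of the number: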

"(([%+%-][^%a]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

Neither of the above patterns handles numbers in scientific notation (e.g. 1.23e+10).
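
To see what the four captures of the first pattern look like, here is a small sketch that runs it through `string.match` on three inputs (Lua 5.x assumed; the variable names are chosen for the illustration). The outer parentheses make the first capture the whole matched text.

```lua
local pattern = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

-- With the suffix: whole match, number, stat name, bracketed suffix.
local whole, num, stat, suffix =
  string.match("+384 Critical Strike (Reforged from Parry Chance)", pattern)
print(num)     --> +384
print(stat)    --> Critical Strike
print(suffix)  --> (Reforged from Parry Chance)

-- Without the suffix the last capture is an empty string, not nil.
local w2, n2, s2, x2 = string.match("+123 Parry", pattern)
print(n2, s2, "[" .. x2 .. "]")  --> +123    Parry   []

-- A separator inside the number is absorbed by the lazy [',%.%d%s]- part.
local w3, n3, s3 = string.match("+123'456 Critical Strike", pattern)
print(n3, s3)  --> +123'456        Critical Strike
```

Note that the stat sub-pattern needs at least three characters (a letter, one non-bracket character, a letter), so a one- or two-letter stat name would not match.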

Lua 5 test (edited to clean up, as the tests were getting cluttered):

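function test(tab, pattern)
  for i, v in ipairs(tab) do
    -- f1 = whole match, f2 = number, f3 = stat name, f4 = optional suffix
    local f1, f2, f3, f4 = v:match(pattern)
    -- tostring() keeps string.format from raising an error when a string fails to match (nil)
    print(string.format("Test{%d} - Whole:{%s}\nFirst:{%s}\nSecond:{%s}\nThird:{%s}\n",
      i, tostring(f1), tostring(f2), tostring(f3), tostring(f4)))
  end
end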

 local pattern = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
 local testing = {"+123 Parry","+123 Critical Strike","-123 Parry","-123 Critical Strike","+123 Parry (Reforged from Dodge)","+123 Critical Strike (Reforged from Dodge)","-123 Parry (Reforged from Hit Chance)","-123 Critical Strike (Reforged from Hit Chance)","+122384    Critical    Strike      (Reforged from parry chance)","+384 Critical Strike ","+384Critical Strike (Reforged from parry chance)","+1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)","+12345 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)","+123456 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)","-1 MoUnT aNd RuN sPeEd InCrEaSe (Reforged from CrItIcAl StRiKe ChAnCe)","-1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)","+123,456 +1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)","+123.456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)","+123'456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)","+123 456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)","+1,456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)","+9 mana every 5 sec","-9 mana every 20 min (Does not occurr in data but gets captured if there)"}
 test(testing,pattern)

Here is a breakdown of the pattern:

local explainPattern =  
   "(" -- start whole string capture
   ..
   --[[
   capture localized number with sign - 
   take at first as few digits and separators as you can 
   ensuring the capture ends with at least 1 digit
   (the last digit is our sentinel enforcing the boundary)]]
   "([%+%-][',%.%d%s]-[%d]+)" 
   ..
   --[[
   gobble as much space as you can]]
   "%s*"
   ..
   --[[
   capture starting with letters, followed by anything which is not a bracket,
   ending with at least 1 letter]]
   "([%a]+[^%(^%)]+[%a]+)"
   ..
   --[[
   gobble as much space as you can]]
   "%s*"
   ..
   --[[
   capture an optional bracket
   followed by 0 or more letters and spaces
   ending with an optional bracket]]
   "(%(?[%a%s]*%)?)"
   .. 
   ")" -- end whole string capture
