我使用的只是(),|,空格和字母字符的简单语言.
给出一个如下的正则表达式:
给出一个如下的正则表达式:
(hello|goodbye) (world(s|)|)
我将如何生成以下数据?
hello worlds hello world hello goodbye worlds goodbye world goodbye
我不太确定我是否需要首先构建树,或者是否可以递归地完成.我坚持使用什么数据结构,以及如何在我去时生成字符串.我必须保留一堆标记,并索引到部分构建的字符串中以连接更多的数据吗?我不知道如何最好地解决这个问题.我是否需要首先阅读整个表达,并以某种方式重新排列?
函数签名将以下列方式:
std::vector<std::string> Generate(std::string const&){ //... }
你建议我做什么?
编辑:
让我澄清一下结果应该是有限的.在我的具体例子中,只有6个字符串对于表达式将是真实的.我不知道我的术语在这里是否正确,但我正在寻找的是表达式的完美匹配 – 而不是包含与子字符串匹配的任何字符串.
解决方法
有些跟随
Kieveli的建议,我已经提出了一个工作的解决方案.虽然以前没有提到过,对我来说,重要的是可以得出可能产生多少结果的计数.我在github上发现了一个叫做“
exrex”的python脚本.令人尴尬的是,我没有意识到它有能力也算.尽管如此,我使用我简化的正则表达式语言在C中实现了最好的效果.如果对我的解决方案感兴趣,请继续阅读.
从面向对象的立场来看,我写了一个扫描仪来获取正则表达式(string),并将其转换为令牌列表(字符串向量).然后将令牌列表发送到生成n-ary树的解析器.所有这些都被打包在一个“表达式生成器”类中,它可以采用表达式并保持解析树以及生成的计数.
扫描仪很重要,因为它标记了空字符串的情况,您可以在我的问题中看到,显示为“|)”.扫描也创建了[word] [operation] [word] [operation] … [word]的模式.
例如,扫描:“(hello | goodbye)(world(s |)|)”
将创建:[] [(] [hello] [|] [再见] []] [] [(] [世界] [(] [s] [|] [] []] [] [|] [] [ )] []
解析树是节点的向量.节点包含节点向量的向量.
橙色单元格表示“或”,而绘制连接的其他框代表“和”.以下是我的代码
节点头
#pragma once #include <string> #include <vector> class Function_Expression_Node{ public: Function_Expression_Node(std::string const& value_in = "",bool const& more_in = false); std::string value; bool more; std::vector<std::vector<Function_Expression_Node>> children; };
节点源
#include "function_expression_node.hpp" Function_Expression_Node::Function_Expression_Node(std::string const& value_in,bool const& more_in) : value(value_in),more(more_in) {}
扫描仪标题
#pragma once #include <vector> #include <string> class Function_Expression_Scanner{ public: Function_Expression_Scanner() = delete; public: static std::vector<std::string> Scan(std::string const& expression); };
扫描仪来源
#include "function_expression_scanner.hpp" std::vector<std::string> Function_Expression_Scanner::Scan(std::string const& expression){ std::vector<std::string> tokens; std::string temp; for (auto const& it: expression){ if (it == '('){ tokens.push_back(temp); tokens.push_back("("); temp.clear(); } else if (it == '|'){ tokens.push_back(temp); tokens.push_back("|"); temp.clear(); } else if (it == ')'){ tokens.push_back(temp); tokens.push_back(")"); temp.clear(); } else if (isalpha(it) || it == ' '){ temp+=it; } } tokens.push_back(temp); return tokens; }
解析器标题
#pragma once #include <string> #include <vector> #include "function_expression_node.hpp" class Function_Expression_Parser{ Function_Expression_Parser() = delete; //get parse tree public: static std::vector<std::vector<Function_Expression_Node>> Parse(std::vector<std::string> const& tokens,unsigned int & amount); private: static std::vector<std::vector<Function_Expression_Node>> Build_Parse_Tree(std::vector<std::string>::const_iterator & it,std::vector<std::string>::const_iterator const& end,unsigned int & amount); private: static Function_Expression_Node Recursive_Build(std::vector<std::string>::const_iterator & it,int & total); //<- recursive //utility private: static bool Is_Word(std::string const& it); };
解析源
#include "function_expression_parser.hpp" bool Function_Expression_Parser::Is_Word(std::string const& it){ return (it != "(" && it != "|" && it != ")"); } Function_Expression_Node Function_Expression_Parser::Recursive_Build(std::vector<std::string>::const_iterator & it,int & total){ Function_Expression_Node sub_root("",true); //<- contains the full root std::vector<Function_Expression_Node> root; const auto begin = it; //calculate the amount std::vector<std::vector<int>> multiplies; std::vector<int> adds; int sub_amount = 1; while(*it != ")"){ //when we see a "WORD",add it. if(Is_Word(*it)){ root.push_back(Function_Expression_Node(*it)); } //when we see a "(",build the subtree,else if (*it == "("){ ++it; root.push_back(Recursive_Build(it,sub_amount)); //adds.push_back(sub_amount); //sub_amount = 1; } //else we see an "OR" and we do the split else{ sub_root.children.push_back(root); root.clear(); //store the sub amount adds.push_back(sub_amount); sub_amount = 1; } ++it; } //add the last bit,if there is any if (!root.empty()){ sub_root.children.push_back(root); //store the sub amount adds.push_back(sub_amount); } if (!adds.empty()){ multiplies.push_back(adds); } //calculate sub total int or_count = 0; for (auto const& it: multiplies){ for (auto const& it2: it){ or_count+=it2; } if (or_count > 0){ total*=or_count; } or_count = 0; } /* std::cout << "---SUB FUNCTION---\n"; for (auto it: multiplies){for (auto it2: it){std::cout << "{" << it2 << "} ";}std::cout << "\n";}std::cout << "--\n"; std::cout << total << std::endl << '\n'; */ return sub_root; } std::vector<std::vector<Function_Expression_Node>> Function_Expression_Parser::Build_Parse_Tree(std::vector<std::string>::const_iterator & it,unsigned int & amount){ std::vector<std::vector<Function_Expression_Node>> full_root; std::vector<Function_Expression_Node> root; const auto begin = it; //calculate the amount std::vector<int> adds; int sub_amount = 1; int total = 0; while (it != end){ //when we see a "WORD",sub_amount)); } //else we see an "OR" and we do the split else{ full_root.push_back(root); root.clear(); //store the sub amount adds.push_back(sub_amount); sub_amount = 1; } ++it; } //add the last bit,if there is any if (!root.empty()){ full_root.push_back(root); //store the sub amount adds.push_back(sub_amount); sub_amount = 1; } //calculate sub total for (auto const& it: adds){ total+=it; } /* std::cout << "---ROOT FUNCTION---\n"; for (auto it: adds){std::cout << "[" << it << "] ";}std::cout << '\n'; std::cout << total << std::endl << '\n'; */ amount = total; return full_root; } std::vector<std::vector<Function_Expression_Node>> Function_Expression_Parser::Parse(std::vector<std::string> const& tokens,unsigned int & amount){ auto it = tokens.cbegin(); auto end = tokens.cend(); auto parse_tree = Build_Parse_Tree(it,end,amount); return parse_tree; }
发电机头
#pragma once #include "function_expression_node.hpp" class Function_Expression_Generator{ //constructors public: Function_Expression_Generator(std::string const& expression); public: Function_Expression_Generator(); //transformer void Set_New_Expression(std::string const& expression); //observers public: unsigned int Get_Count(); //public: unsigned int Get_One_Word_Name_Count(); public: std::vector<std::string> Get_Generations(); private: std::vector<std::string> Generate(std::vector<std::vector<Function_Expression_Node>> const& parse_tree); private: std::vector<std::string> Sub_Generate(std::vector<Function_Expression_Node> const& nodes); private: std::vector<std::vector<Function_Expression_Node>> m_parse_tree; unsigned int amount; };
发电机源
#include "function_expression_generator.hpp" #include "function_expression_scanner.hpp" #include "function_expression_parser.hpp" //constructors Function_Expression_Generator::Function_Expression_Generator(std::string const& expression){ auto tokens = Function_Expression_Scanner::Scan(expression); m_parse_tree = Function_Expression_Parser::Parse(tokens,amount); } Function_Expression_Generator::Function_Expression_Generator(){} //transformer void Function_Expression_Generator::Set_New_Expression(std::string const& expression){ auto tokens = Function_Expression_Scanner::Scan(expression); m_parse_tree = Function_Expression_Parser::Parse(tokens,amount); } //observers unsigned int Function_Expression_Generator::Get_Count(){ return amount; } std::vector<std::string> Function_Expression_Generator::Get_Generations(){ return Generate(m_parse_tree); } std::vector<std::string> Function_Expression_Generator::Generate(std::vector<std::vector<Function_Expression_Node>> const& parse_tree){ std::vector<std::string> results; std::vector<std::string> more; for (auto it: parse_tree){ more = Sub_Generate(it); results.insert(results.end(),more.begin(),more.end()); } return results; } std::vector<std::string> Function_Expression_Generator::Sub_Generate(std::vector<Function_Expression_Node> const& nodes){ std::vector<std::string> results; std::vector<std::string> more; std::vector<std::string> new_results; results.push_back(""); for (auto it: nodes){ if (!it.more){ for (auto & result: results){ result+=it.value; } } else{ more = Generate(it.children); for (auto m: more){ for (auto r: results){ new_results.push_back(r+m); } } more.clear(); results = new_results; new_results.clear(); } } return results; }