In Arabic, a letter such as "ا" (Alef) has several forms/variants:
(ا, أ, إ, آ)
The same is true of the letter ي, which may also be written ى.
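What I want to do is generate every possible variation of a word that contains several أ and ي letters.
For example, the word "أين" should yield all of these variants (most of which are incorrect): أين, إين, اين, آين, أىن, اىن, آىن, and so on.
Why? I am building a small text-correction system that can handle syntax errors and replace wrong words with the correct ones.
I have been trying to do this in the cleanest way possible, but I ended up with 8 for/foreach loops just to handle the letter "أ".
There must be a better, cleaner way to do this! Any ideas?
Here is my code so far: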
$alefVariations = ['ا','إ','أ','آ'];
$word = 'أيامنا';

// Break into letters
$wordLetters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
$wordAlefLettersIndexes = [];

// Get the أ letters
for ($letterIndex = 0; $letterIndex < count($wordLetters); $letterIndex++) {
    if (in_array($wordLetters[$letterIndex], $alefVariations)) {
        $wordAlefLettersIndexes[] = $letterIndex;
    }
}

$eachLetterVariations = [];
foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
    foreach ($alefVariations as $alefVariation) {
        $wordCopy = $wordLetters;
        $wordCopy[$alefLettersIndex] = $alefVariation;
        $eachLetterVariations[$alefLettersIndex][] = $wordCopy;
    }
}

$variations = [];
foreach ($wordAlefLettersIndexes as $alefLettersIndex) {
    $alefWordVariations = $eachLetterVariations[$alefLettersIndex];
    foreach ($wordAlefLettersIndexes as $alefLettersIndex_inner) {
        if ($alefLettersIndex == $alefLettersIndex_inner) continue;
        foreach ($alefWordVariations as $alefWordVariation) {
            foreach ($alefVariations as $alefVariation) {
                $alefWordVariationCopy = $alefWordVariation;
                $alefWordVariationCopy[$alefLettersIndex_inner] = $alefVariation;
                $variations[] = $alefWordVariationCopy;
            }
        }
    }
}

$finalList = [];
foreach ($variations as $variation) {
    $finalList[] = implode('', $variation);
}
return array_unique($finalList);
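I don't think this is the right way to do autocorrection, but here is a general solution to the problem you asked about. It uses recursion and is written in JavaScript (I don't know PHP).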
function solve(word, sameLetters, customIndices = []) {
    var splitLetters = word.split('')
        .map((char, index) => {
            // check if the current letter is within any variation
            if (customIndices.length == 0 || customIndices.includes(index)) {
                var variations = sameLetters.find(arr => arr.includes(char));
                if (variations != undefined) return variations;
            }
            return [char];
        });
    // up to this point splitLetters will be like this
    // [["ا","إ","أ","آ"],["ي","ى"],["ا"],["م"],["ن"],["ا"]]
    var res = [];
    recurse(splitLetters, 0, '', res); // this function will generate all the combinations
    return res;
}

function recurse(letters, index, cur, res) {
    if (index == letters.length) {
        // every position has been filled, so cur is one complete variant
        res.push(cur);
    } else {
        for (var letter of letters[index]) {
            recurse(letters, index + 1, cur + letter, res);
        }
    }
}

var sameLetters = [ // represents the variations that you want to enumerate
    ['ا','إ','أ','آ'],
    ['ي','ى']
];
var word = 'أيامنا';
var customIndices = [0, 1]; // will make variations to the letters in these indices only. leave it empty for all indices
var ans = solve(word, sameLetters, customIndices);
console.log(ans);
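If you would rather avoid explicit recursion, the same enumeration can be written as an iterative Cartesian product. The following is only a minimal sketch, not a drop-in part of the answer above: `expandVariants` and `groups` are hypothetical names introduced here, and the letter groups are assumed to be the four Alef forms and two Ya forms mentioned in the question.

```javascript
// Hypothetical helper: expand a word into every combination of letter variants.
// `groups` is assumed to be an array of arrays of interchangeable letters.
function expandVariants(word, groups) {
  // Array.from iterates by code point, so each Arabic letter is one element.
  const choices = Array.from(word).map(
    (char) => groups.find((group) => group.includes(char)) || [char]
  );
  // Build the Cartesian product one position at a time:
  // every prefix collected so far is extended with every option for the next letter.
  return choices.reduce(
    (prefixes, options) =>
      prefixes.flatMap((prefix) => options.map((letter) => prefix + letter)),
    ['']
  );
}

// Assumed variant groups, taken from the letters listed in the question.
const groups = [['ا', 'إ', 'أ', 'آ'], ['ي', 'ى']];
const variants = expandVariants('أين', groups);
console.log(variants.length); // 4 Alef forms * 2 Ya forms = 8
```

The number of variants is the product of the group sizes at each matching position, so it grows quickly for longer words. That is worth keeping in mind before generating variants for every word in a text.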