In a comment to an answer to this question, it was suggested that PHP cannot reverse Unicode strings:
As for Unicode, it works in PHP because most apps process it as binary. Yes, PHP is 8-bit clean. Try the equivalent of this in PHP:
perl -Mutf8 -e 'print scalar reverse("ほげほげ")'
You will get garbage, not "げほげほ". – jrockway
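PHP's built-in strrev() reverses bytes rather than characters, which is the failure jrockway describes. A minimal sketch, assuming the mbstring extension is available (it is used only to inspect the result); the string is written with \u{...} escapes so the example does not depend on the source file's encoding:

```php
<?php
// strrev() reverses bytes, not characters.
$text = "\u{307B}\u{3052}\u{307B}\u{3052}"; // "ほげほげ": 4 characters, 12 bytes in UTF-8
$reversed = strrev($text);

var_dump(strlen($text));                         // int(12)
var_dump(mb_strlen($text, 'UTF-8'));             // int(4)
// Every 3-byte sequence is now backwards, so the result is not valid UTF-8.
var_dump(mb_check_encoding($reversed, 'UTF-8')); // bool(false)
```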
Unfortunately, PHP's Unicode support is, to put it accurately, "lacking" at best right now. This will hopefully change drastically with PHP6.
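The PHP MultiByte functions do provide the basic functionality needed to handle Unicode, but they are inconsistent and lack many functions. One of the missing ones is a function to reverse a string.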
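Of course I wanted to reverse this text, for no other reason than to find out whether it was possible. So I wrote a function to accomplish this enormously complex task of reversing Unicode text, and you can relax a bit longer until PHP6.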
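The mb_strrev() function itself is not included in this copy of the answer, although the test code calls it. The following is a hypothetical sketch that matches those calls (mb_strrev($text) and mb_strrev($text, $enc)), assuming only the mbstring extension; it is not the author's original implementation.

```php
<?php
// Hypothetical mb_strrev(): a sketch matching how the test code calls it,
// not the original answer's code. Requires the mbstring extension.
function mb_strrev(string $text, ?string $encoding = null): string
{
    // Fall back to the internal encoding, as the other mb_* functions do.
    $encoding = $encoding ?? mb_internal_encoding();
    $output = '';
    // Walk from the last character to the first, one code point at a time.
    for ($i = mb_strlen($text, $encoding) - 1; $i >= 0; $i--) {
        $output .= mb_substr($text, $i, 1, $encoding);
    }
    return $output;
}

echo mb_strrev("\u{307B}\u{3052}\u{307B}\u{3052}", 'UTF-8'), "\n"; // げほげほ
```

Each mb_substr() call has to locate its offset inside a variable-width encoding, so this loop is roughly quadratic in the string length; on PHP 7.4+ implode('', array_reverse(mb_str_split($text, 1, $encoding))) does the same work in one pass. Either way the reversal is per code point, so a combining mark gets separated from its base letter.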
Test code:
$enc = 'UTF-8';
$text = "ほげほげ";
$defaultEnc = mb_internal_encoding();
echo "Showing results with encoding $defaultEnc.\n\n";
$revNormal = strrev($text);
$revInt = mb_strrev($text);
$revEnc = mb_strrev($text, $enc);
echo "Original text is: $text .\n";
echo "Normal strrev output: " . $revNormal . ".\n";
echo "mb_strrev without encoding output: $revInt.\n";
echo "mb_strrev with encoding $enc output: $revEnc.\n";
if (mb_internal_encoding($enc)) {
    echo "\nSetting internal encoding to $enc from $defaultEnc.\n\n";
    $revNormal = strrev($text);
    $revInt = mb_strrev($text);
    $revEnc = mb_strrev($text, $enc);
    echo "Original text is: $text .\n";
    echo "Normal strrev output: " . $revNormal . ".\n";
    echo "mb_strrev without encoding output: $revInt.\n";
    echo "mb_strrev with encoding $enc output: $revEnc.\n";
} else {
    echo "\nCould not set internal encoding to $enc!\n";
}
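The grapheme functions handle UTF-8 strings more correctly than the mbstring and PCRE functions. mbstring, and PCRE when splitting with //u, work on individual code points, so they can separate a base letter from its combining mark. You can see the difference by running the following code.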
function str_to_array($string)
{
    $length = grapheme_strlen($string);
    $ret = [];
    for ($i = 0; $i < $length; $i += 1) {
        $ret[] = grapheme_substr($string, $i, 1);
    }
    return $ret;
}

function str_to_array2($string)
{
    $length = mb_strlen($string, "UTF-8");
    $ret = [];
    for ($i = 0; $i < $length; $i += 1) {
        // Take one character at offset $i.
        $ret[] = mb_substr($string, $i, 1, "UTF-8");
    }
    return $ret;
}

function str_to_array3($string)
{
    return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}

function utf8_strrev($string)
{
    return implode(array_reverse(str_to_array($string)));
}

function utf8_strrev2($string)
{
    return implode(array_reverse(str_to_array2($string)));
}

function utf8_strrev3($string)
{
    return implode(array_reverse(str_to_array3($string)));
}

// http://www.php.net/manual/en/function.grapheme-strlen.php
$string = "a\xCC\x8A"  // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
        . "o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6)

var_dump(array_map(function ($elem) {
    return strtoupper(bin2hex($elem));
}, [
    'should be' => "o\xCC\x88" . "a\xCC\x8A",
    'grapheme' => utf8_strrev($string),
    'mbstring' => utf8_strrev2($string),
    'pcre' => utf8_strrev3($string)
]));
Here is the result:
array(4) {
  ["should be"]=>
  string(12) "6FCC8861CC8A"
  ["grapheme"]=>
  string(12) "6FCC8861CC8A"
  ["mbstring"]=>
  string(12) "CC886FCC8A61"
  ["pcre"]=>
  string(12) "CC886FCC8A61"
}
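The PCRE row splits with //u, which matches single code points. PCRE also provides the \X escape, which matches one extended grapheme cluster, so a sketch like the following should reverse the test string per grapheme without the intl extension. It assumes PHP 7.3+ with its bundled PCRE2; utf8_strrev_pcre_x is a hypothetical name chosen for this example.

```php
<?php
// Hypothetical helper: reverse a UTF-8 string by grapheme cluster using
// PCRE's \X (extended grapheme cluster) instead of //u (code point).
function utf8_strrev_pcre_x(string $string): string
{
    // Each \X match is one user-perceived character, e.g. "a" + U+030A.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return implode('', array_reverse($matches[0]));
}

// "å" and "ö", each written as a base letter plus a combining mark.
$string = "a\xCC\x8A" . "o\xCC\x88";
echo strtoupper(bin2hex(utf8_strrev_pcre_x($string))), "\n"; // 6FCC8861CC8A
```

The output matches the "should be" row above, because the combining ring and diaeresis stay attached to their base letters.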
Since PHP 5.5 (intl 3.0), IntlBreakIterator can also be used:
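function utf8_strrev($str)
{
    // Breaks on code point boundaries, like mbstring; use
    // IntlBreakIterator::createCharacterInstance() for grapheme boundaries.
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($str);
    $ret = '';
    $pos = 0;
    $prev = 0;
    // Iterating yields the boundary offsets (in bytes).
    foreach ($it as $pos) {
        // Prepend the piece between the previous boundary and this one.
        $ret = substr($str, $prev, $pos - $prev) . $ret;
        $prev = $pos;
    }
    return $ret;
}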