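Given the Unicode decimal or hexadecimal number of a character that I want to output from a CLI PHP script, how can PHP generate it? The chr() function doesn't seem to produce the right output. Here is my test script, using the section sign character U+00A7 (A7 in hexadecimal, 167 in decimal, which should be represented in UTF-8 as C2 A7) as a test: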
    <?php
    echo "Section sign: ".chr(167)."\n"; // Using chr function
    echo "Section sign: ".chr(0xA7)."\n";
    echo "Section sign: ".pack("c",0xA7)."\n"; // Using pack function?
    echo "Section sign: §\n"; // Copy and paste of the symbol into source code
The output I get (over an SSH session to the server) is:
    Section sign: ?
    Section sign: ?
    Section sign: ?
    Section sign: §
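So this shows that the terminal font I'm using includes the section sign and that the SSH connection is transmitting it correctly, but chr() is not constructing it properly from the code number.
If all I have is the code number and copy/paste is not an option, what choices do I have?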
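Apart from the mb_ functions and iconv, PHP has no knowledge of Unicode: chr() just emits the single byte you give it (here 0xA7), which is not valid UTF-8 on its own. You have to UTF-8-encode the character yourself.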
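For comparison, here is a minimal sketch of what the two exceptions mentioned above can look like. It assumes the mbstring or iconv extension is loaded and that it accepts the encoding name UTF-32BE; the helper name codepointViaExtension is made up for this illustration. The last line uses the "\u{...}" string escape, which exists only from PHP 7.0 onward and only helps when the code point is known at the time the source is written.

```php
<?php
// Hypothetical helper: build the UTF-8 bytes for a code point by letting
// an extension convert from UTF-32BE. Returns null if no extension is loaded.
function codepointViaExtension($codepoint)
{
    // pack('N', ...) yields the code point as a big-endian 32-bit integer,
    // which is exactly one UTF-32BE code unit.
    $utf32 = pack('N', $codepoint);

    if (function_exists('mb_convert_encoding')) {
        $out = mb_convert_encoding($utf32, 'UTF-8', 'UTF-32BE');
        return is_string($out) ? $out : null;
    }
    if (function_exists('iconv')) {
        $out = iconv('UTF-32BE', 'UTF-8', $utf32);
        return is_string($out) ? $out : null;
    }
    return null;
}

$viaExtension = codepointViaExtension(0xA7);
echo "Section sign: ", ($viaExtension !== null ? $viaExtension : "(no extension loaded)"), "\n";

// PHP 7.0+ only: Unicode code point escape inside a double-quoted string.
echo "Section sign: \u{A7}\n";
```

If neither extension is available, the helper returns null and the manual encoding is the only route left.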
To encode it by hand, Wikipedia has an excellent overview of how UTF-8 is structured. Here's a quick, dirty, and untested function based on that article:
    function codepointToUtf8($codepoint)
    {
        if ($codepoint <= 0x7F) // U+0000-U+007F - 1 byte
            return chr($codepoint);
        if ($codepoint <= 0x7FF) // U+0080-U+07FF - 2 bytes
            return chr(0xC0 | ($codepoint >> 6))
                 . chr(0x80 | ($codepoint & 0x3F));
        if ($codepoint <= 0xFFFF) // U+0800-U+FFFF - 3 bytes
            return chr(0xE0 | ($codepoint >> 12))
                 . chr(0x80 | (($codepoint >> 6) & 0x3F))
                 . chr(0x80 | ($codepoint & 0x3F));
        // U+010000-U+10FFFF - 4 bytes
        return chr(0xF0 | ($codepoint >> 18))
             . chr(0x80 | (($codepoint >> 12) & 0x3F))
             . chr(0x80 | (($codepoint >> 6) & 0x3F))
             . chr(0x80 | ($codepoint & 0x3F));
    }
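To see why the two-byte case gives C2 A7 for the section sign, the arithmetic can be worked through on its own. This is only an illustrative sketch of the bit layout described in the Wikipedia article; the variable names are made up here, and bin2hex() is used so the check does not depend on the terminal's encoding.

```php
<?php
$codepoint = 0xA7; // U+00A7, binary 1010 0111

// Two-byte form: 110xxxxx 10xxxxxx
$lead  = 0xC0 | ($codepoint >> 6);   // top bits:    0xC0 | 0x02 = 0xC2
$trail = 0x80 | ($codepoint & 0x3F); // low 6 bits:  0x80 | 0x27 = 0xA7

$encoded = chr($lead) . chr($trail);

echo bin2hex($encoded), "\n";  // c2a7 (valid UTF-8 for U+00A7)
echo bin2hex(chr(167)), "\n";  // a7   (a lone byte, which is what the question's script sent)
```

The second line of output is the byte the original test script was emitting, which a UTF-8 terminal cannot decode and so shows as "?".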