Encoding issue with Chinese characters

I ran into an encoding issue when text generated by PHP (server side) was printed out via JavaScript (client side).

The problem

The Chinese characters were fine when output directly from PHP, but became garbled after being URL-encoded in PHP and decoded in JavaScript:
printf('document.write(unescape("%s"));', rawurlencode($data)); 
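The likely cause is that rawurlencode() percent-encodes the UTF-8 bytes of the string, while unescape() turns each %XX back into a separate Latin-1 character instead of reassembling the bytes. A minimal sketch of the effect, assuming UTF-8 input; encodeURIComponent() is used here only as a stand-in for the PHP side, and the sample string is made up for illustration:

```javascript
// Stand-in for PHP's rawurlencode(): percent-encoded UTF-8 bytes.
// (Assumption: for Chinese text both functions produce the same output.)
const original = "中文";
const encoded = encodeURIComponent(original); // "%E4%B8%AD%E6%96%87"

// unescape() maps each %XX to ONE character, so the three bytes of
// each Chinese character come back as three unrelated characters.
const garbled = unescape(encoded);

// decodeURIComponent() reassembles the UTF-8 bytes correctly.
const restored = decodeURIComponent(encoded);

console.log(garbled.length);        // 6 (two characters became six)
console.log(restored === original); // true
```

Swapping unescape() for decodeURIComponent() is therefore another way around the problem, although this post goes with numeric entities instead.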




Solution

After Googling for a while, I think the best solution is to convert the non-ASCII characters to Unicode numeric entities. Example:
Numeric Code    HTML Entity Code    Result
256             &#256;              Ā
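To make the idea concrete, here is a rough JavaScript sketch of the conversion. The helper name toNumericEntities is hypothetical, and the choice to convert everything at or above U+0080 is an assumption made for illustration:

```javascript
// Hypothetical helper: replace every non-ASCII character with &#N;,
// where N is the character's decimal Unicode code point.
function toNumericEntities(text) {
  let out = "";
  // for...of iterates by code point, so characters beyond U+FFFF
  // are not split into surrogate halves.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint >= 0x80 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("Ā"));       // &#256;
console.log(toNumericEntities("中文 ok")); // &#20013;&#25991; ok
```

The result is plain ASCII, so it passes through the encode and decode round trip unchanged, and the browser turns the entities back into characters when document.write() parses them as HTML.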

Solution for PHP

For PHP there's a simple solution: mb_encode_numericentity(). Luckily, a convmap that converts everything outside ASCII (and so leaves HTML markup characters untouched) is given in a comment on the PHP manual page, as pointed out in http://stackoverflow.com/a/3116893.

function convertToNumericEntities($string) {
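    // convmap: start code point, end code point, offset, mask.
    // 0x80 to 0x10ffff covers everything outside ASCII.
    $convmap = array(0x80, 0x10ffff, 0, 0xffffff);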
    return mb_encode_numericentity($string, $convmap, "UTF-8");
}


Solution for TinyMCE

If you are using TinyMCE, this can be solved by setting entity_encoding to "numeric".

tinyMCE.init({
        ...
        entity_encoding : "numeric"
});

More info: http://www.tinymce.com/wiki.php/Configuration:entity_encoding




References:
http://stackoverflow.com/a/3116893
http://www.php.net/manual/en/function.mb-encode-numericentity.php#29839
http://www.tinymce.com/forum/viewtopic.php?pid=63319
http://stackoverflow.com/questions/4691477/php-converting-unicode-strings-to-ansi-strings
