ICONV_OK
ICONV_OK = 0
No bugs detected in iconv.
A UTF-8 specific character encoder that handles cleaning and transforming.
cleanUTF8(string $str, bool $force_php = false) : string
Cleans a UTF-8 string for well-formedness and SGML validity.
It parses the input as UTF-8 and returns a valid UTF-8 string with non-SGML code points removed.
Specifically, it permits: \x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF} (source: https://www.w3.org/TR/REC-xml/#NT-Char).
Arguably this function should be modernized to the HTML5 set of allowed characters (https://www.w3.org/TR/html5/syntax.html#preprocessing-the-input-stream), which simultaneously expands and restricts the set of allowed characters.
string | $str | The string to clean |
bool | $force_php |
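As a rough illustration of the rule above, here is a minimal pure-PHP sketch. It decodes the input byte by byte, drops malformed, truncated, and overlong sequences, and keeps only code points in the permitted set. `sketch_clean_utf8()` and `sketch_is_allowed_cp()` are hypothetical helpers, not HTML Purifier's implementation. Dropping only the lead byte of a truncated sequence and then resynchronizing is also an assumption of this sketch.

```php
<?php
// Hypothetical helpers for illustration; not HTML Purifier's actual code.

function sketch_is_allowed_cp(int $cp): bool
{
    // The permitted set listed above (note that U+007F..U+009F are excluded).
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0x7E)
        || ($cp >= 0xA0 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

function sketch_clean_utf8(string $str): string
{
    // Fast path: valid UTF-8 made only of permitted code points is returned as-is.
    // With the /u modifier, preg_match() returns false on malformed UTF-8.
    $class = '\x{9}\x{A}\x{D}\x{20}-\x{7E}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}';
    if (preg_match('/^[' . $class . ']*$/Du', $str) === 1) {
        return $str;
    }

    $out = '';
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $b = ord($str[$i]);
        // Sequence length and payload bits taken from the lead byte.
        if ($b < 0x80) {
            $n = 1; $cp = $b;
        } elseif ($b >= 0xC2 && $b <= 0xDF) {
            $n = 2; $cp = $b & 0x1F;
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            $n = 3; $cp = $b & 0x0F;
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            $n = 4; $cp = $b & 0x07;
        } else {
            $i++; // stray continuation byte or invalid lead byte: drop it
            continue;
        }

        // Collect the continuation bytes (each must look like 10xxxxxx).
        $ok = $i + $n <= $len;
        for ($k = 1; $ok && $k < $n; $k++) {
            $c = ord($str[$i + $k]);
            if (($c & 0xC0) !== 0x80) {
                $ok = false;
            } else {
                $cp = ($cp << 6) | ($c & 0x3F);
            }
        }
        if (!$ok) {
            $i++; // truncated sequence: drop the lead byte and resync
            continue;
        }

        // Reject overlong forms; surrogates and values above U+10FFFF
        // fall outside the permitted set and are rejected there.
        $min = [1 => 0x0, 2 => 0x80, 3 => 0x800, 4 => 0x10000][$n];
        if ($cp >= $min && sketch_is_allowed_cp($cp)) {
            $out .= substr($str, $i, $n);
        }
        $i += $n;
    }
    return $out;
}
```

The regular-expression fast path mirrors the observation that most input is already clean, so the byte-level loop only runs when something actually needs removing.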
testIconvTruncateBug() : int
glibc iconv has a known bug where it does not handle the magic //IGNORE stanza correctly. In particular, rather than ignoring characters, it returns EILSEQ after consuming some number of characters and expects you to restart iconv as if it were an E2BIG. Old versions of PHP did not respect the errno and returned the fragment, so iconv appeared to truncate output mysteriously. We can work around this by manually chopping the input into segments of about 8000 characters, as long as PHP ignores the error code. If PHP starts paying attention to the error code, iconv becomes unusable.
Error code indicating severity of bug.
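The detection and the chunking workaround described above can be sketched in PHP. Everything below is hypothetical. The `SKETCH_ICONV_*` constants (only `ICONV_OK = 0` is documented here), the helper names, and the probe input (one non-ASCII character followed by 9000 ASCII bytes) are assumptions. The converter is passed in as a callable so the logic can be exercised without iconv. The chunker counts bytes but backs off at each cut so that no UTF-8 sequence is split across two conversion calls.

```php
<?php
// Hypothetical sketch; constant names and values beyond ICONV_OK = 0 are assumed.
const SKETCH_ICONV_OK = 0;
const SKETCH_ICONV_TRUNCATES = 1;
const SKETCH_ICONV_UNUSABLE = 2;

// Probe: convert a long string containing one unconvertible character with an
// //IGNORE-style converter and check whether the ASCII tail survived intact.
function sketch_probe_iconv(callable $convert): int
{
    $input = "\xCE\xB1" . str_repeat('a', 9000); // U+03B1 (Greek alpha) + 9000 'a'
    $r = $convert($input);
    if ($r === false) {
        return SKETCH_ICONV_UNUSABLE; // the error was reported instead of ignored
    }
    return strlen($r) < 9000 ? SKETCH_ICONV_TRUNCATES : SKETCH_ICONV_OK;
}

// Split UTF-8 text into chunks of at most $max bytes without cutting a code point.
function sketch_utf8_chunks(string $text, int $max = 8000): array
{
    if ($max < 4) {
        throw new InvalidArgumentException('max must be at least 4 bytes');
    }
    $chunks = [];
    $len = strlen($text);
    $i = 0;
    while ($len - $i > $max) {
        $cut = $i + $max;
        // Back off while the byte at the cut is a continuation byte (10xxxxxx).
        while ($cut > $i && (ord($text[$cut]) & 0xC0) === 0x80) {
            $cut--;
        }
        if ($cut === $i) {
            $cut = $i + $max; // malformed run of continuation bytes: cut anyway
        }
        $chunks[] = substr($text, $i, $cut - $i);
        $i = $cut;
    }
    if ($i < $len) {
        $chunks[] = substr($text, $i);
    }
    return $chunks;
}

// Run the converter on each chunk separately and join the results.
function sketch_chunked_convert(string $text, callable $convert, int $max = 8000): string
{
    $out = '';
    foreach (sketch_utf8_chunks($text, $max) as $chunk) {
        $out .= $convert($chunk);
    }
    return $out;
}
```

With the iconv extension loaded, the converter could be `fn ($s) => @iconv('UTF-8', 'ASCII//IGNORE', $s)`. Whether `//IGNORE` is honored, and whether a failure surfaces as `false`, depends on the iconv implementation and PHP version, which is exactly what the probe is meant to detect.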
testEncodingSupportsASCII(string $encoding, bool $bypass = false) : array
This expensive function tests whether or not a given character encoding supports ASCII. 7/8-bit encodings like Shift_JIS will fail this test and require special processing. Variable-width encodings should never fail.
string | $encoding | Encoding name to test, as per iconv format |
bool | $bypass | Whether or not to bypass the precompiled arrays. |
Array mapping UTF-8 characters to their corresponding ASCII characters, which can be used to "undo" any overzealous iconv action.
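To make the return value concrete, the sketch below builds such a map and then applies it. It decodes each printable ASCII byte (0x20 to 0x7E) as if it were in the target encoding and records every byte whose UTF-8 decoding is not the same ASCII character. It then uses `strtr()` to undo those substitutions. This is a simplified, hypothetical illustration rather than the library's algorithm. `sketch_ascii_map()` and `sketch_undo_ascii()` are invented names, and the decoder is injected as a callable. The test double mimics iconv builds that decode Shift_JIS byte 0x5C as U+00A5 YEN SIGN and 0x7E as U+203E OVERLINE, which is an assumption about specific implementations.

```php
<?php
// Hypothetical sketch, not HTML Purifier's algorithm.

// Decode each printable ASCII byte as the target encoding and record every byte
// that does not come back as itself: [UTF-8 decoding => original ASCII char].
function sketch_ascii_map(callable $decode): array
{
    $map = [];
    for ($i = 0x20; $i <= 0x7E; $i++) {
        $c = chr($i);
        $u = $decode($c);
        // Skip failures and dropped bytes; only real substitutions are recorded.
        if (is_string($u) && $u !== '' && $u !== $c) {
            $map[$u] = $c;
        }
    }
    return $map;
}

// Undo the recorded substitutions on text that has already been converted to UTF-8.
function sketch_undo_ascii(string $utf8, array $map): string
{
    return $map === [] ? $utf8 : strtr($utf8, $map);
}
```

With iconv available, the decoder might be `fn ($b) => @iconv('SHIFT_JIS', 'UTF-8//IGNORE', $b)`, but the resulting map depends on how that particular iconv build defines Shift_JIS. An encoding that decodes every ASCII byte to itself yields an empty map, so `sketch_undo_ascii()` leaves the text unchanged.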