
Migrating from PHP 5.3 to PHP 5.4 or 5.5 - Watch Out For a Dangerous iconv Bug

18 Feb 2015 | CPOL | 2 min read
If you are migrating from PHP 5.3 to PHP 5.4 or 5.5, then watch out for a dangerous iconv bug

If you are using iconv to filter invalid characters out of strings and you migrate to PHP 5.4 or 5.5, you may run into the nasty bug that bit me.

Currently, I convert all my web data from UTF-8 to ISO-8859-1 (otherwise known as ISO Latin-1) so it can be inserted into PDF reports built with the fantastic FPDF library.

The code looks something like this:

$clean = @iconv("UTF-8", "ISO-8859-1//IGNORE//TRANSLIT", $text);

I used the error suppression operator (@) here so that no errors are output to the screen when an invalid character needs to be stripped. If the error is displayed, it interrupts the creation of the PDF file and the user does not get a file. Not so great, right?

Here is the error that is returned when the error is not suppressed:

Notice: iconv(): Detected an illegal character in input string in {...}
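Rather than hiding this notice with @, it can also be captured so that it never reaches the screen but is still available for logging. The following is only a sketch: iconv_collect() is a hypothetical helper name, not part of PHP or FPDF, and it relies on the standard set_error_handler() and restore_error_handler() functions.

```php
<?php
// Hypothetical helper: run iconv() and collect any notices or warnings
// it raises instead of letting PHP print them.
function iconv_collect($from, $to, $text, &$messages)
{
    $messages = array();
    set_error_handler(function ($errno, $errstr) use (&$messages) {
        $messages[] = $errstr; // keep the message for later logging
        return true;           // tell PHP the error has been handled
    });
    $result = iconv($from, $to, $text);
    restore_error_handler();   // put the previous handler back
    return $result;
}

$clean = iconv_collect("UTF-8", "ISO-8859-1//TRANSLIT", "Plain ASCII text", $messages);
foreach ($messages as $message) {
    error_log("iconv: " . $message);
}
```

The return value still has to be checked, since capturing the message does not change what iconv() returns.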

I had been using the //IGNORE directive to direct the function to ignore characters that have errors in them. I also use //TRANSLIT so that if a character doesn't match exactly to the specific character set, the closest approximation is used.

$text = "Equipment List – Projéct [2014-Nov-28]";
$clean = @iconv("UTF-8", "ISO-8859-1//IGNORE//TRANSLIT", $text);
echo $clean, "\n";

It may be hard to see, but the first dash '–' is an en dash, which is a different and longer character than the '-' hyphen (or minus symbol) used in the date. Also, I placed an accent on the 'é' in project just for good measure. That doesn't have a representation in the ISO-8859-1 character set according to iconv.
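The difference between the two dashes is easier to see in bytes than on screen. This small sketch dumps the UTF-8 encoding of each character; the variable names are only for illustration, and it assumes the source file itself is saved as UTF-8.

```php
<?php
// Assumes this source file is saved as UTF-8.
$long  = "–"; // the long dash from "List – Proj"
$short = "-"; // the short dash from "2014-Nov-28"
$e     = "é"; // the accented letter from "Projéct"

echo bin2hex($long), "\n";  // e28093 (U+2013, three bytes)
echo bin2hex($short), "\n"; // 2d (U+002D, one byte)
echo bin2hex($e), "\n";     // c3a9 (U+00E9, two bytes)
```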

In my PHP version 5.3.29 (with iconv library version 2.17), I get the output:

Equipment List - Proj?ct [2014-Nov-28] 

However, in PHP version 5.5.19 (also with iconv library version 2.17) with the same code, I get no output at all.
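To check how a particular PHP build behaves, it may help to try each directive combination side by side. This is a rough sketch: try_suffixes() is a hypothetical helper, and the results depend on the PHP version and the underlying iconv library, so no expected output is shown.

```php
<?php
// Assumes this source file is saved as UTF-8.
// Hypothetical helper: convert the same text with each directive combination.
function try_suffixes($text)
{
    $results = array();
    $suffixes = array("", "//TRANSLIT", "//IGNORE", "//IGNORE//TRANSLIT");
    foreach ($suffixes as $suffix) {
        // @ keeps notices off the screen; a failed conversion comes back as false
        $results[$suffix] = @iconv("UTF-8", "ISO-8859-1" . $suffix, $text);
    }
    return $results;
}

foreach (try_suffixes("Equipment List – Projéct [2014-Nov-28]") as $suffix => $out) {
    $label = ($suffix === "") ? "(none)" : $suffix;
    // Print hex so that false and an empty string are easy to tell apart
    echo str_pad($label, 22), ($out === false ? "false" : bin2hex($out)), "\n";
}
```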

Temporary Solution

So how do I correct that behavior? Well, I'm not quite sure what's going on in the code for iconv, but I found that if I remove the //IGNORE directive and just leave //TRANSLIT, then I am OK.

$text = "Equipment List – Projéct [2014-Nov-28]";
$clean = @iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);
echo $clean, "\n";

Output:

Equipment List - Proj?ct [2014-Nov-28]

That's what I wanted, so we should be good for now, at least until I can examine the iconv source code from GNU libc and see what is going on.

Update

After comparing the source code of the PHP iconv implementation in PHP 5.3 and PHP 5.4, I think the culprit lies in how PHP calls the iconv library, not in the iconv library itself. The line that sets the PHP return value after the error check handles failures differently in the later version (line 2390 of PHP5.4+.iconv.c):

if (err == PHP_ICONV_ERR_SUCCESS && out_buffer != NULL) {
    RETVAL_STRINGL(out_buffer, out_len, 0);
} else {
    if (out_buffer != NULL) {
        efree(out_buffer);
    }
    RETURN_FALSE;
}

In the original 5.3 version, it just returned what was found (line 2330 of PHP5.3.iconv.c):

if (out_buffer != NULL) {
    RETVAL_STRINGL(out_buffer, out_len, 0);
} else {
    RETURN_FALSE;
}

It looks like the extra check in the later version (err == PHP_ICONV_ERR_SUCCESS) causes any conversion error to return FALSE, which prints as an empty string ''.
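Since a failed conversion comes back as FALSE rather than a partial string, one defensive option is to test the return value with a strict comparison before using it. The sketch below is only an illustration: to_latin1() and its $fallback parameter are hypothetical names, not part of PHP or FPDF.

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1 and make a
// failed conversion visible instead of silently printing nothing.
function to_latin1($text, $fallback = "")
{
    $clean = @iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);
    if ($clean === false) {
        // Strict check: false and "" both print as nothing, but only
        // false means the conversion failed.
        error_log("iconv() failed for input (hex): " . bin2hex($text));
        return $fallback;
    }
    return $clean;
}

echo to_latin1("Equipment List [2014-Nov-28]", "[unreadable text]"), "\n";
```

Passing a visible placeholder as the fallback makes a failed conversion show up in the PDF instead of leaving a blank field.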

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Chief Technology Officer WorxForUs
United States
I am a programmer who posts rambling on about java, Android, PHP, or whatever I am motivated to type on my charcoal colored Kinesis Freestyle2 keyboard. Please send +1's, shared links, warm thoughts of encouragement, or emasculating flames of internet fury to my blog. Thanks for reading!

righthandedmonkey.com
