Please see: PHP: Character Encoding - Manual.
This is not as simple as it may seem, yet at the same time it does not create any problems. What do you mean by "PHP"? It is not a standardized language. Everything of this kind rests on a PHP authority such as the PHP: Hypertext Preprocessor site. If you also download PHP from the same source, you can be certain about how it behaves, but what prevents any other party from providing an alternative implementation of PHP?
Here is the thing: it is not really important. Unicode is Unicode; it is not UTF-8, UTF-16LE or UTF-32. Unicode is abstracted from encodings and from any computer representation of data. It simply defines a mapping between characters, understood as pure cultural entities, and code points, understood as abstract integers in the mathematical sense. How those numbers are represented in computer memory, variables, and network or file streams is defined by the UTFs. Now, you have to understand that correct programming should not rely on knowledge of how Unicode is represented in the memory accessed through program variables, members, or objects. The text data can come from different sources. The program may or may not rely on some
metadata that comes with the data. For example, an XML encoding is declared in the XML prolog, and an HTML encoding is declared with the HTTP-EQUIV "content-type" meta declaration (I always repeat that using HTTP-EQUIV is critically important even if the encoding is set as the HTTP server's default; think about this: what happens if the page is saved to a file?). This data goes into PHP data, which you should process using the appropriate PHP functions, without knowing how the code points are encoded. For example, if you search for a substring in a string, all that matters is that the source string and the substring are represented in the same encoding.
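To make this more concrete, here is a minimal sketch, not a definitive recipe. It assumes PHP 7.2 or later with the mbstring and dom extensions loaded; the sample XML and all variable names are made up for illustration. The first part shows one code point serialized by three different UTFs. The second part lets the XML parser use the encoding declared in the prolog and then searches the resulting text with a needle in the same encoding.

```php
<?php
// Sketch only: assumes PHP 7.2+ with the mbstring and dom extensions loaded.

// Part 1: one code point, several encodings.
// The escape keeps the example independent of how this source file is saved;
// PHP stores the resulting string as UTF-8 bytes.
$char = "\u{00E9}"; // LATIN SMALL LETTER E WITH ACUTE

// The code point is a plain integer, independent of any encoding.
$codePoint = mb_ord($char, 'UTF-8');
printf("code point: U+%04X\n", $codePoint);

// The same code point as serialized by three different UTFs.
$encoded = [];
foreach (['UTF-8', 'UTF-16LE', 'UTF-32BE'] as $encoding) {
    $encoded[$encoding] = bin2hex(mb_convert_encoding($char, $encoding, 'UTF-8'));
    printf("%-8s -> %s\n", $encoding, $encoded[$encoding]);
}

// Part 2: metadata travels with the data.
// Hypothetical input: an XML document whose prolog declares ISO-8859-1.
// The bytes \xEF and \xE9 are "i with diaeresis" and "e with acute" there.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
     . "<word>na\xEFve caf\xE9</word>";

$doc = new DOMDocument();
$doc->loadXML($xml);

// The parser picked up the declared encoding from the prolog...
$declared = $doc->encoding;

// ...and DOM hands the text back as UTF-8, whatever the source encoding was.
$text = $doc->documentElement->textContent;

// Needle and haystack are in the same encoding (UTF-8), so the search works
// without the program ever looking at individual bytes.
$needle = "caf\u{00E9}";
$position = mb_strpos($text, $needle, 0, 'UTF-8'); // offset in characters

// The same characters in a different encoding do not match.
$mismatched = mb_convert_encoding($needle, 'UTF-16LE', 'UTF-8');
$found = strpos($text, $mismatched);

var_dump($declared, $position, $found);
```

Here `mb_strpos` reports the match at character offset 6, while the UTF-16LE needle finds nothing in the UTF-8 text, even though it denotes the same four characters.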
—SA