Hi,
I have a question about single-byte and multi-byte characters. I have seen somewhere how to check whether a character is one, two, or three bytes long, but I didn't understand it.
Let b be the first byte of the character to check:
For a single-byte character: (b & 0x80) == 0x00
For a double-byte character: (b & 0xE0) == 0xC0
For a triple-byte character: (b & 0xF0) == 0xE0

Can anyone please explain the logic behind these checks?

Thanks in advance.
See the UTF-8 encoding table at Wikipedia. According to the table, the first byte of a single-byte character has its most significant bit clear (0). You can test that condition by ANDing the byte with 0x80 (10000000 in binary) and comparing the result with 0x00.
Similarly, the first byte of every two-byte sequence starts with the 110 marker, which you can test with (b & 0xE0) == 0xC0 (that is, b & 11100000b == 11000000b).
And so on: three-byte sequences start with 1110 and are tested with (b & 0xF0) == 0xE0.
 
What you can do is use
C#
int noOfBytes = sizeof(char); // always 2: a C# char is a UTF-16 code unit


Note that sizeof takes a type, not a variable, and it tells you how many bytes a char occupies in memory, not how many bytes the character needs in a variable-width encoding such as UTF-8.

You can find more information here
http://en.wikipedia.org/wiki/Character_encoding

And here
http://en.wikipedia.org/wiki/UTF-16
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


