The purpose of this program is to let me use my fancy 'å', 'ä' and 'ö' letters :).
I've read a lot about the subject but can't find any examples (in C) of how to use and include UTF-8 strings (let's focus this question on UTF-8 and take UTF-16 later).
The best I've found is to use the setlocale() function.
But how does one simply use that function?

int _tmain(int argc, _TCHAR* argv[])
{
  setlocale( LC_ALL, "sv" );
  wchar_t utf8_test[] = {'å', 'ö', 'ä'};

  return 0;
}

This was my first instinct on how to test it...
Would you be surprised to hear that it compiled and ran, but the values in the utf8_test array were incorrect?
Updated 12-Dec-15 9:15am

1 solution

On Windows, the wchar_t type is a 16-bit character that holds UTF-16 code units. You also need to use the L prefix on all your character and string literals so the compiler generates the correct characters. So your code should look like:
#include <locale.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
  setlocale( LC_ALL, "sv" ); // no L here: setlocale takes a narrow (char) string
  wchar_t utf16_test[] = {L'å', L'ö', L'ä'}; // wide character literals

  return 0;
}
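
To check what the L prefix actually stores, here is a minimal sketch, assuming a C99 compiler. The helper name fill_swedish_letters is made up for illustration and is not part of the answer above. It spells the letters as universal character names (\u00E5 and so on), so the result does not depend on the encoding the source file was saved in:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread): copies the three Swedish
   letters into out and returns how many were written.
   \u00E5 = å, \u00F6 = ö, \u00E4 = ä (Unicode code points). */
size_t fill_swedish_letters(wchar_t *out, size_t capacity)
{
    static const wchar_t letters[] = { L'\u00E5', L'\u00F6', L'\u00E4' };
    const size_t count = sizeof letters / sizeof letters[0];

    assert(out != NULL && capacity >= count);
    for (size_t i = 0; i < count; ++i)
        out[i] = letters[i];   /* each element is one wide character */
    return count;
}
```

Each element should then hold the letter's Unicode code point (0xE5, 0xF6, 0xE4), which is easy to confirm in a debugger. All three letters fit in a single 16-bit unit, so no surrogate pairs are involved.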
 
Comments
JONLIN 13-Dec-15 9:11am    
Eureka, it worked :D
But what does the 'L' specifier specify? As far as I know an L stands for "long int"...
Also how do I know this is UTF-16LE and not UTF-16BE or even UTF-8?
And yes the wchar_t has 16 bits but the first 8 specify the character and the latter specify which encoding system it uses. How could I access the second half of the bits if I want to do conversions?
The setlocale function sets the character encoding for a set number of files, and "sv" (should) stand for "Sweden". It seems very illogical to use UTF-16 for the language, as the only special characters are really å, ä and ö, and they exist in UTF-8. UTF-8 has better performance, so why did it set it to UTF-16?

It feels like that code raised more questions than answers ;)
Richard MacCutchan 13-Dec-15 9:38am    
"But what does the 'L' specifier specify?"
I explained that in my answer above.
"Also how do I know this is UTF-16LE and not UTF-16BE or even UTF-8?"
It is Unicode, which is the Windows standard. In practice that means UTF-16 stored little-endian (UTF-16LE).
"And yes the wchar_t has 16 bits but the first 8 specify the character and the latter specify which encoding system it uses."
No, wchar_t is a Unicode character: all 16 bits hold character data, and none of them identify the encoding.

And lastly, the Windows designers decided to make Unicode the default because it can handle most of the common character sets. If you want to use UTF-8 then you need to use the conversion functions, see https://msdn.microsoft.com/en-gb/library/windows/desktop/dd374130(v=vs.85).aspx.
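
For what the conversion to UTF-8 does at the byte level, here is a rough, portable sketch. It is not the Windows API: on Windows the usual route is WideCharToMultiByte with CP_UTF8. The function name utf8_encode is hypothetical, and the input is assumed to be a full Unicode code point, so a UTF-16 surrogate pair would have to be combined into one code point first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (hand-rolled, not a library call): encodes one
   Unicode code point as UTF-8. Writes up to 4 bytes to out and returns
   the byte count, or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                        /* ASCII: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* lone surrogate: invalid */
        return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, å (U+00E5) becomes the two bytes 0xC3 0xA5, which is why a UTF-8 string cannot be stored one letter per char.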
JONLIN 13-Dec-15 11:23am    
Thanks, but I'm sorry, because no, you didn't really say what the 'L' specifier means; rather, you said "You also need to use the L prefix on all your character and string literals so the compiler generates the correct characters". Personally I take that as "you'll get weird results without the 'L' prefix", and I do get weird results without it, but I wasn't (and still am not) sure whether the prefix means 'long int'. So the question still stands: what does the 'L' prefix stand for?
But I can't complain too much, and I thank you, as the rest of the answers make a lot of sense :)
Aaaaand last but not least: since you said Unicode is the Windows standard, I tested the code without the setlocale function and it worked perfectly fine, so why is it useful?
Richard MacCutchan 13-Dec-15 11:38am    
My apologies, my comments were not clear.

The L prefix on a character or string literal tells the compiler to generate wide characters (wchar_t, which is 16-bit Unicode on Windows) for that literal, as opposed to 8-bit narrow characters, which is the C/C++ default. The prefix does not mean long int; think of it rather as "long char", i.e. a wide character.
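
One way to see the difference is to compare the storage of the same text written with and without the prefix. This is only a sketch with made-up helper names; the size of wchar_t is implementation-defined (16 bits with the Microsoft compiler, commonly 32 bits with GCC or Clang on Linux), so the code expresses the wide size in terms of sizeof(wchar_t) instead of a fixed number:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helpers: bytes occupied by the same three letters as a
   narrow and as a wide string literal, terminating null included. */
size_t narrow_literal_bytes(void)
{
    return sizeof "abc";          /* array of 4 char */
}

size_t wide_literal_bytes(void)
{
    /* Array of 4 wchar_t: the L prefix changes the element type. */
    assert(sizeof L"abc" == 4 * sizeof(wchar_t));
    return sizeof L"abc";
}
```

So the prefix changes the element type of the literal, not its length in characters.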

As to the standard, I mean that Unicode is the standard for the wchar_t (16-bit character) type.

And finally, setlocale does not change how the characters in your literals are encoded, but it does affect locale-dependent behavior such as number, date and time formatting: see https://msdn.microsoft.com/en-us/library/x99tb11d.aspx.
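
A small sketch of that effect, using only standard C. The helper name format_in_locale is hypothetical. Locale names are platform-dependent: "sv" is the name used in the question, and something like "sv_SE.UTF-8" is a guess at what a Linux system might accept, so the function reports whether the requested locale was available at all:

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: formats 3.5 under the named locale into out.
   Returns 1 if setlocale accepted the name, 0 if it rejected it (in
   which case the previous locale stays in effect). */
int format_in_locale(const char *locale_name, char *out, size_t out_size)
{
    assert(locale_name != NULL && out != NULL && out_size > 0);

    /* LC_NUMERIC controls the decimal separator used by printf. */
    int available = setlocale(LC_NUMERIC, locale_name) != NULL;
    snprintf(out, out_size, "%.1f", 3.5);
    return available;
}
```

In the default "C" locale the result uses a decimal point; where a Swedish locale is installed, the separator would be expected to be a comma. The wide character literals in the array are the same either way, which fits the observation that the test worked without setlocale.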

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


