|
|
This article suggests that all this "=D0=95=D1=80" stuff is what UTF-8-encoded text looks like. It's not. For anyone reading this article, if you don't know what UTF-8 is, I suggest you first go to Wikipedia and learn what it is.
|
|
|
|
|
The res folder is not created.
Modify the .rc files to remove the res\\ indirection.
|
|
|
|
|
Dear sir,
I receive data from an external source in ANSI format,
and I want to convert it and save it as UTF-8.
I use VS.NET 2003. Can anyone help?
Thank you,
Suwat (thailand)
|
|
|
|
|
Hi,
In VC++, how do I display menu items in Chinese characters? I am puzzled; could someone please guide me?
Thanks in advance,
Ganesh
Thanks a lot
|
|
|
|
|
When I use your project, a fatal error occurs and it does not work. The project is really very nice and useful to me. Can you help me with this?
Yes U Can ...If U Can ,Dream it , U can do it ...ICAN
|
|
|
|
|
Error 1 fatal error RC1015: cannot open include file 'res\UniTest.rc2'. d:\UTF8Demo\UniTest.rc 176
Just change the following in UniTest.rc:
1. 'res\UniTest.rc2' to 'UniTest.rc2'
2. IDR_MAINFRAME ICON DISCARDABLE "res\\UniTest.ico"
to: IDR_MAINFRAME ICON DISCARDABLE "UniTest.ico"
It will then build fine.
There are ways to Roman!
|
|
|
|
|
Hello, thank you for providing this great UTF-8 converter.
Here is a version that has been modified to work in Visual Studio 2005.
It is not fully tested, but it seems to do the trick.
(I am a newbie, so I don't know if there is already a working solution for this in VS2005,
but hopefully someone else can save some time by not reinventing the wheel...)
String^ EncodeToUTF8(String^ szSource)
{
    unsigned short ch;
    Byte bt1, bt2, bt3, bt4, bt5, bt6;
    int n, nMax = szSource->Length;
    String^ sFinal = String::Empty;
    String^ sTemp;

    for (n = 0; n < nMax; ++n)
    {
        ch = (unsigned short)szSource[n];

        if (ch == (L'='))
        {
            // The escape character itself is written as "=3D".
            sTemp = System::String::Format(L"={0:X}", ch);
            sFinal += sTemp;
        }
        else if (ch < 128)
        {
            // Plain ASCII is copied through unchanged.
            sFinal += szSource[n];
        }
        else if (ch <= 2047)
        {
            // Two-byte sequence: 110xxxxx 10xxxxxx
            bt1 = (Byte)(192 + (ch / 64));
            bt2 = (Byte)(128 + (ch % 64));
            sTemp = System::String::Format((L"={0:X}={1:X}"), bt1, bt2);
            sFinal += sTemp;
        }
        else if (ch <= 65535)
        {
            // Three-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
            bt1 = (Byte)(224 + (ch / 4096));
            bt2 = (Byte)(128 + ((ch / 64) % 64));
            bt3 = (Byte)(128 + (ch % 64));
            sTemp = System::String::Format((L"={0:X}={1:X}={2:X}"), bt1, bt2, bt3);
            sFinal += sTemp;
        }
        // ch is a 16-bit value, so the branches below are never reached.
        // They are kept from the original, with the result of Format now
        // assigned to sTemp instead of being discarded.
        else if (ch <= 2097151)
        {
            bt1 = (Byte)(240 + (ch / 262144));
            bt2 = (Byte)(128 + ((ch / 4096) % 64));
            bt3 = (Byte)(128 + ((ch / 64) % 64));
            bt4 = (Byte)(128 + (ch % 64));
            sTemp = System::String::Format((L"={0:X}={1:X}={2:X}={3:X}"), bt1, bt2, bt3, bt4);
            sFinal += sTemp;
        }
        else if (ch <= 67108863)
        {
            bt1 = (Byte)(248 + (ch / 16777216));
            bt2 = (Byte)(128 + ((ch / 262144) % 64));
            bt3 = (Byte)(128 + ((ch / 4096) % 64));
            bt4 = (Byte)(128 + ((ch / 64) % 64));
            bt5 = (Byte)(128 + (ch % 64));
            sTemp = System::String::Format((L"={0:X}={1:X}={2:X}={3:X}={4:X}"), bt1, bt2, bt3, bt4, bt5);
            sFinal += sTemp;
        }
        else if (ch <= 2147483647)
        {
            bt1 = (Byte)(252 + (ch / 1073741824));
            bt2 = (Byte)(128 + ((ch / 16777216) % 64));
            bt3 = (Byte)(128 + ((ch / 262144) % 64));
            bt4 = (Byte)(128 + ((ch / 4096) % 64));
            bt5 = (Byte)(128 + ((ch / 64) % 64));
            bt6 = (Byte)(128 + (ch % 64));
            sTemp = System::String::Format((L"={0:X}={1:X}={2:X}={3:X}={4:X}={5:X}"), bt1, bt2, bt3, bt4, bt5, bt6);
            sFinal += sTemp;
        }
    }

    return sFinal;
}

// Value of one uppercase hex digit. Any other character counts as 0,
// which matches the original pair of switch statements.
Byte HexDigitValue(wchar_t ch)
{
    if (ch >= L'0' && ch <= L'9') return (Byte)(ch - L'0');
    if (ch >= L'A' && ch <= L'F') return (Byte)(ch - L'A' + 10);
    return 0;
}

// Combine two hex digits into one byte: ch1 is the high nibble, ch2 the low nibble.
Byte MakeByte(wchar_t ch1, wchar_t ch2)
{
    return (Byte)(HexDigitValue(ch1) * 16 + HexDigitValue(ch2));
}

String^ DecodeFromUTF8(String^ szSource)
{
    int n, nMax = szSource->Length;
    unsigned short ch;
    String^ sFinal = String::Empty;
    Byte z, y, x, w, v, u;

    for (n = 0; n < nMax; ++n)
    {
        ch = (unsigned short)szSource[n];

        if (ch != (L'='))
        {
            sFinal += (wchar_t)ch;
            continue;
        }

        // An escape is "=XX"; stop if the input ends in the middle of one.
        if (n >= nMax - 2) break;
        z = MakeByte(szSource[n+1], szSource[n+2]);

        if (z < 128)
        {
            // Single byte (for example "=3D" for '=').
            sFinal += (wchar_t)z;
            n = n + 2;
        }
        else if (z >= 192 && z <= 223)
        {
            if (n >= nMax - 5) break;
            y = MakeByte(szSource[n+4], szSource[n+5]);
            sFinal += (wchar_t)( (z-192)*64 + (y-128) );
            n = n + 5;
        }
        else if (z >= 224 && z <= 239)
        {
            if (n >= nMax - 8) break;
            y = MakeByte(szSource[n+4], szSource[n+5]);
            x = MakeByte(szSource[n+7], szSource[n+8]);
            sFinal += (wchar_t)( (z-224)*4096 + (y-128)*64 + (x-128) );
            n = n + 8;
        }
        // The branches below compute values above 65535, which the
        // 16-bit wchar_t cast truncates.
        else if (z >= 240 && z <= 247)
        {
            if (n >= nMax - 11) break;
            y = MakeByte(szSource[n+4], szSource[n+5]);
            x = MakeByte(szSource[n+7], szSource[n+8]);
            w = MakeByte(szSource[n+10], szSource[n+11]);
            sFinal += (wchar_t)( (z-240)*262144 + (y-128)*4096 + (x-128)*64 + (w-128) );
            n = n + 11;
        }
        else if (z >= 248 && z <= 251)
        {
            if (n >= nMax - 14) break;
            y = MakeByte(szSource[n+4], szSource[n+5]);
            x = MakeByte(szSource[n+7], szSource[n+8]);
            w = MakeByte(szSource[n+10], szSource[n+11]);
            v = MakeByte(szSource[n+13], szSource[n+14]);
            sFinal += (wchar_t)( (z-248)*16777216 + (y-128)*262144 + (x-128)*4096 + (w-128)*64 + (v-128) );
            n = n + 14;
        }
        else if (z >= 252 && z <= 253)
        {
            if (n >= nMax - 17) break;
            y = MakeByte(szSource[n+4], szSource[n+5]);
            x = MakeByte(szSource[n+7], szSource[n+8]);
            w = MakeByte(szSource[n+10], szSource[n+11]);
            v = MakeByte(szSource[n+13], szSource[n+14]);
            u = MakeByte(szSource[n+16], szSource[n+17]);
            sFinal += (wchar_t)( (z-252)*1073741824 + (y-128)*16777216 + (x-128)*262144 + (w-128)*4096 + (v-128)*64 + (u-128) );
            n = n + 17;
        }
    }

    return sFinal;
}
Kristoffer Jansson
|
|
|
|
|
Hi,
I have a problem reading UTF-8 data from web pages (www.bbc.co.uk/hindi). I am using IHTMLDocument2, IHTMLElement, and other interfaces to retrieve the text between the tags of the web pages. If the text between the tags contains UTF-8 data, I receive ???? symbols instead of the actual data. Please give me some suggestions for resolving this bug.
Thanks,
Veera Raghavendra
|
|
|
|
|
Hello,
Please right-click on the client area of Internet Explorer, then open the Encoding submenu. You will see several encodings.
How does Windows use an encoding, and how does it determine which font to select?
Take a font like Tahoma, and suppose we installed the Arabic and Hebrew languages on an English version of Windows. Are all the letters for these two languages now stored in Tahoma?
What is the difference between an encoding and a code page?
How can I write my program using the Arabic (Windows) encoding, without using Unicode?
|
|
|
|
|
CString
GetUTF8Value (int inValue)
{
CString theValue;
if (inValue < 0x80) {
theValue = (char) inValue;
}
else {
theValue.Insert (0, (char) (0x80 + (inValue % 0x40)));
if (inValue <= 0x7ff) {
theValue.Insert (0, (char) (0xc0 + (inValue / 0x40)));
}
else {
theValue.Insert (0, (char) (0x80 + ((inValue / 0x40) % 0x40)));
if (inValue <= 0xffff) {
theValue.Insert (0, (char) (0xe0 + (inValue / 0x1000)));
}
else {
theValue.Insert (0, (char) (0x80 + ((inValue / 0x1000) % 0x40)));
if (inValue <= 0x1fffff) {
theValue.Insert (0, (char) (0xf0 + (inValue / 0x40000)));
}
else {
theValue.Insert (0, (char) (0x80 + ((inValue / 0x40000) % 0x40)));
if (inValue <= 0x3ffffff) {
theValue.Insert (0, (char) (0xf8 + (inValue / 0x1000000)));
}
else {
theValue.Insert (0, (char) (0x80 + ((inValue / 0x1000000) % 0x40)));
if (inValue <= 0x7fffffff) {
theValue.Insert (0, (char) (0xfc + (inValue / 0x40000000)));
}
else {
theValue = "ERROR";
}
}
}
}
}
}
return theValue;
}
|
|
|
|
|
Works great!
Here's a version that makes a "URL-encoded" UTF-8 string.
For example, the Greek letter ψ, which has Unicode code point 968/0x3C8, gets converted into a string "%CF%88".
CString GetUrlEncodedStringUtf8 (int inValue)
{
CString theValue;
if (inValue < 0x80) {
//theValue = (char) inValue;
theValue.Insert(0, L"%" + decToHex(inValue, 16));
}
else {
//theValue.Insert (0, (char) (0x80 + (inValue % 0x40)));
//theValue.Insert(0, L"%");
theValue.Insert(0, L"%" + decToHex(0x80 + (inValue % 0x40), 16));
if (inValue <= 0x7ff) {
theValue.Insert(0, L"%" + decToHex(0xc0 + (inValue / 0x40), 16));
}
else {
theValue.Insert(0, L"%" + decToHex(0x80 + ((inValue / 0x40) % 0x40), 16));
if (inValue <= 0xffff) {
theValue.Insert(0, L"%" + decToHex(0xe0 + (inValue / 0x1000), 16));
}
else {
theValue.Insert(0, L"%" + decToHex(0x80 + ((inValue / 0x1000) % 0x40), 16));
if (inValue <= 0x1fffff) {
theValue.Insert(0, L"%" + decToHex(0xf0 + (inValue / 0x40000), 16));
}
else {
theValue.Insert(0, L"%" + decToHex(0x80 + ((inValue / 0x40000) % 0x40), 16));
if (inValue <= 0x3ffffff) {
theValue.Insert(0, L"%" + decToHex(0xf8 + (inValue / 0x1000000), 16));
}
else {
theValue.Insert(0, L"%" + decToHex(0x80 + ((inValue / 0x1000000) % 0x40), 16));
if (inValue <= 0x7fffffff) {
theValue.Insert(0, L"%" + decToHex(0xfc + (inValue / 0x40000000), 16));
}
else {
theValue = "ERROR";
}
}
}
}
}
}
return theValue;
}
// Written 2002 by ChandraSekar Vuppalapati
// GENERATES A HEX REPRESENTATION OF GIVEN CHARACTER
CString decToHex(char num, int radix)
{
char hexVals[16] = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
int temp=0;
CString csTmp;
int num_char;
num_char = (int) num;
// ISO-8859-1
// IF THE IF LOOP IS COMMENTED, THE CODE WILL FAIL TO GENERATE A
// PROPER URL ENCODE FOR THE CHARACTERS WHOSE RANGE IN 127-255(DECIMAL)
if (num_char < 0)
num_char = 256 + num_char;
while (num_char >= radix)
{
temp = num_char % radix;
num_char = (int)floor( ((double) num_char) / radix);
csTmp += hexVals[temp];
}
csTmp += hexVals[num_char];
if(csTmp.GetLength() < 2)
{
csTmp += '0';
}
CString strdecToHex(csTmp);
// Reverse the String
strdecToHex.MakeReverse();
return strdecToHex;
}
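For anyone who wants to check the ψ example without MFC, here is a minimal sketch in standard C++. The names encode_utf8 and percent_encode_code_point are made up for illustration and are not part of the code above. The sketch also stops at four bytes and rejects surrogates, which is my assumption based on the current UTF-8 definition rather than something the original function does.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Percent-encode every byte of the UTF-8 form of cp.
std::string percent_encode_code_point(std::uint32_t cp) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : encode_utf8(cp)) {
        out += '%';
        out += kHex[byte >> 4];
        out += kHex[byte & 0x0F];
    }
    return out;
}
```

Calling percent_encode_code_point(0x3C8) gives "%CF%88", the same result as the ψ example. Like the function above, it escapes every byte, including plain ASCII.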
|
|
|
|
|
Hi all,
Can anyone suggest how to display Vietnamese characters on a label? That is, on one label the text is in English, and on the other I want the equivalent Vietnamese text.
I need this urgently.
Thanks in advance
|
|
|
|
|
Are you sure this sample decodes UTF-8 to ANSI?
I cannot understand your code, and it doesn't work. It decodes absolutely nothing.
UTF-8 is like éà è$aej for éàè$aej
and has nothing to do with your =XXXX format.
Could you help?
Thanks,
Dominique
|
|
|
|
|
Hi Dominique,
I needed to decode UTF-8 myself, and it looks like I have something you might want. If you have found a solution, please let me know what you found.
I modified the code in this article. You only need one function. I needed it to decode accented characters returned in an XML response from a web service. I could not test all possibilities; the decoding I needed was for two-byte sequences like the ones you pointed out.
CString DecodeFromUTF8(LPCTSTR szSource)
{
int n, nMax = _tcslen(szSource);
CString sFinal;
BYTE z, y, x, w, v, u;
for (n = 0; n < nMax; ++n)
{
z = szSource[n];
if (z < 128)
{
sFinal += (TCHAR)z;
}
else if (z >= 192 && z <= 223)
{
if (n >= nMax - 1)
break;
y = szSource[n+1];
sFinal += (TCHAR)( (z-192)*64 + (y-128) );
n = n + 1;
}
else if (z >= 224 && z <= 239)
{
if (n >= nMax - 2)
break;
y = szSource[n+1];
x = szSource[n+2];
sFinal += (TCHAR)( (z-224)*4096 + (y-128)*64 + (x-128) );
n = n + 2;
}
else if (z >= 240 && z <= 247)
{
if (n >= nMax - 3)
break;
y = szSource[n+1];
x = szSource[n+2];
w = szSource[n+3];
sFinal += (TCHAR)( (z-240)*262144 + (y-128)*4096 +
(x-128)*64 + (w-128) );
n = n + 3;
}
else if (z >= 248 && z <= 251)
{
if (n >= nMax - 4)
break;
y = szSource[n+1];
x = szSource[n+2];
w = szSource[n+3];
v = szSource[n+4];
sFinal += (TCHAR)( (z-248)*16777216 + (y-128)*262144 +
(x-128)*4096 + (w-128)*64 + (v-128) );
n = n + 4;
}
else if (z >= 252 && z <= 253)
{
if (n >= nMax - 5)
break;
y = szSource[n+1];
x = szSource[n+2];
w = szSource[n+3];
v = szSource[n+4];
u = szSource[n+5];
sFinal += (TCHAR)( (z-252)*1073741824 + (y-128)*16777216 +
(x-128)*262144 + (w-128)*4096 + (v-128)*64 + (u-128) );
n = n + 5;
}
}
return sFinal;
}
|
|
|
|
|
In fact it is very simple.
Do this:
// Convert the string from UTF8 to string
wstring dest;
string src(s);
utf8decode(dest, src);
return dest.c_str();
s is a CString.
.cpp
#include "stdafx.h"
#include "utf8.h"
#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif
void wstring2string(string &dest,const wstring &src)
{
dest.resize(src.size());
for (int i=0; i
|
|
|
|
|
Sorry, but the source you referenced, http://www1.tip.nl/~t876506/utf8tbl.html has an incorrect definition of UTF-8. UTF-8 is at most 4 bytes, and the reference to 6 bytes is very old misinformation. You won't notice the problem unless you use some of the more exotic and recently-added-to-unicode characters, but it is definitely a problem. Refer to the unicode.org site and the online version of the standard for more information. The site also has source code for a conversion program and you should test and compare results, if you are going to implement your own algorithms for encoding, decoding or conversion.
tex
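To make the four-byte limit concrete, here is a rough sketch in standard C++. The function name utf16_to_utf8 is hypothetical; this is not the unicode.org converter and not code from the article. It assumes the input is UTF-16 (as Windows wide strings are) and replaces unpaired surrogates with U+FFFD, which is one common choice and not the only one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert UTF-16 code units to UTF-8, combining surrogate pairs so that
// characters above U+FFFF come out as four bytes. No sequence is longer.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // assumption: replace an unpaired surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // stray low surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A converter that treats each 16-bit unit separately would turn a surrogate pair into two three-byte sequences, which is not valid UTF-8. That is the kind of difference that only shows up with characters outside the Basic Multilingual Plane.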
|
|
|
|
|
Hello tex,
Can you send me the exact link where I can find the source code for encoding to and decoding from UTF-8?
KK
|
|
|
|
|
Sure. http://www.unicode.org/Public/PROGRAMS/CVTUTF/
This converts utf-8, utf-16, utf-32.
You can read the definition of utf-8 in the standard, it is online at www.unicode.org.
I noticed one of the FAQs on the site also points at utf-8 examples that can be used for testing.
There is also a Unicode-example page on my website and a zip of utf-8 data-
See http://www.i18nguy.com/unicode/unicode-example-intro.html
tex
|
|
|
|
|
I think you should learn what UTF-8 is. Visit www.unicode.org; you may also take a look at ftp.unicode.org to see a simple UTF-8 encoder/decoder.
Try googling for "quoted printable"
http://www.freesoft.org/CIE/RFC/1521/6.htm
and
"UTF-8"
http://www.cl.cam.ac.uk/~mgk25/unicode.html
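To show how the two relate, here is a small sketch in standard C++ that undoes only the "=XX" escapes and leaves the raw bytes. The names hex_value and decode_equals_hex are made up for illustration. It is not a full quoted-printable decoder: soft line breaks and the other rules of RFC 1521 are deliberately left out.

```cpp
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Replace every well-formed "=XX" escape with the byte it names and copy
// everything else through unchanged (including malformed escapes).
std::string decode_equals_hex(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '=' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

Feeding it "=D0=95=D1=80" yields the four bytes D0 95 D1 80. Those bytes are the UTF-8 text; the "=XX" form is just a transport encoding wrapped around them.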
|
|
|
|
|
Yes, actually, it is QP
|
|
|
|
|
try this macro, it works!
#define UTF8_CHAR_LEN( byte ) ((( 0xE5000000 >> ((( byte ) >> 3 ) & 0x1e )) & 3 ) + 1 )
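For readers wondering why the macro works, here is the same lookup written as a function, as a sketch only. The name utf8_char_len is mine. The constant 0xE5000000 is a packed table of sixteen 2-bit entries, one per high nibble of the lead byte.

```cpp
#include <cstdint>

// Length in bytes of the UTF-8 sequence that starts with `lead`.
// Table entries by high nibble: 0x0-0xB -> 0, 0xC and 0xD -> 1,
// 0xE -> 2, 0xF -> 3; the result is the entry plus one.
// Caveat: this does no validation. Continuation bytes (0x80-0xBF)
// report 1 and the invalid lead bytes 0xF8-0xFF report 4.
constexpr int utf8_char_len(std::uint8_t lead) {
    return static_cast<int>((0xE5000000u >> ((lead >> 4) * 2)) & 3u) + 1;
}
```

The shift amount (lead >> 4) * 2 is the same value the macro computes with (byte >> 3) & 0x1e, so both forms give identical results for any byte from 0 to 255.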
|
|
|
|
|
|
The easiest way to convert wide characters to UTF8 (on Windows, at least) would be WideCharToMultiByte. With all the hard work done by Windows, it's a simple matter to convert the UTF8 string to your "=XX" format.
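The second half of that, going from UTF-8 bytes to the "=XX" format, might look like the sketch below. It is standard C++ with a hypothetical function name, and it assumes the UTF-8 bytes are already in a std::string (for example, the output buffer of WideCharToMultiByte called with CP_UTF8). It follows the article's convention as I read it: ASCII is copied through, '=' itself becomes "=3D", and every byte of 0x80 or above becomes "=XX".

```cpp
#include <string>

// Turn UTF-8 bytes into the article's "=XX" text form.
std::string to_equals_hex(const std::string& utf8) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte < 0x80 && byte != '=') {
            out += static_cast<char>(byte);  // plain ASCII stays readable
        } else {
            out += '=';
            out += kHex[byte >> 4];
            out += kHex[byte & 0x0F];
        }
    }
    return out;
}
```

Because this step works on bytes and never looks at code points, it does not care how many bytes each character takes.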
|
|
|
|
|