How do I pass non-English characters from char* to wchar_t* on an English OS

Question

Code Snippet:

int Convertchar_wchar(char* pData, int pDataLength)
{
    wchar_t wcsQuery[4096*2 + 2];
    memcpy((void*)wcsQuery, pData, pDataLength);
    wcout << wcsQuery << endl;
    return 0;
}

I am trying to add multilingual support, so I need to handle non-English languages on an English OS (here Win2K3). The problem is that when I pass any non-English characters (I have tried Japanese), they come out as ????? instead of the original characters. I have confirmed it is not a display problem: the value changes when memcpy is called. The value passed through char* pData is already Unicode, yet it still ends up with the wrong values.

Can someone help me understand why memcpy is changing the value? Does memcpy internally use the default code page? How can I pass the correct value to the wide char pointer?

I have already tried wcscpy and RtlCopyMemory. I am not sure which code page to pass to MultiByteToWideChar so that it supports all languages.

Waiting for some input ASAP.

Thanks

Posted 16-Aug-10 17:30pm

rudrik

3 solutions

Solution 2

Your starting point seems incorrect; you cannot have Unicode characters in an array defined as char* pData. This is (at best) multibyte data so doing a straight memcpy() to a wchar_t array still leaves you with multibyte characters. As Superman mentioned you need to convert it to Unicode (assuming that is what you are trying to achieve).

However I suspect the basic problem is your use of wcout to display the characters. This stream accepts Unicode characters; however you are not passing Unicode characters so your data gets converted to garbage. Try the following:

int Convertchar_wchar(char* pData, int pDataLength)
{
    cout << pData << endl;
    return 0;
}

// or if you want to do the conversion

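int Convertchar_wchar(char* pData, int pDataLength)
{
    wchar_t* wcsQuery = new wchar_t[pDataLength + 1];
    // With an explicit length the output is not null-terminated, so terminate it here.
    int written = MultiByteToWideChar(CP_UTF8, 0, pData, pDataLength, wcsQuery, pDataLength);
    wcsQuery[written] = L'\0'; // written is 0 if the conversion fails
    wcout << wcsQuery << endl;
    delete [] wcsQuery; // Don't forget to deallocate the buffer!
    return written;
}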

Posted 16-Aug-10 22:56pm

Richard MacCutchan

Updated 16-Aug-10 23:27pm

v2
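As a rough illustration of the point above about memcpy() leaving multibyte data unchanged, the sketch below compares a raw byte copy with an actual decode of one Japanese character. The helper names `FirstUnitViaMemcpy` and `DecodeThreeByteUtf8` are hypothetical, the input is assumed to be UTF-8, and `char16_t` stands in for the 16-bit `wchar_t` used on Windows so the example behaves the same on other platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: reinterprets the first two bytes of a buffer as one
// 16-bit code unit, which is effectively what memcpy into a wide buffer does.
char16_t FirstUnitViaMemcpy(const char* bytes, std::size_t size)
{
    assert(bytes != nullptr && size >= sizeof(char16_t));
    char16_t unit = 0;
    std::memcpy(&unit, bytes, sizeof unit); // raw copy, no decoding of any kind
    return unit;
}

// Hypothetical helper: decodes a single three-byte UTF-8 sequence
// (code points U+0800..U+FFFF) into one UTF-16 code unit.
// Other sequence lengths and error handling are left out to keep this short.
char16_t DecodeThreeByteUtf8(const char* bytes)
{
    const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
    const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
    const unsigned char b2 = static_cast<unsigned char>(bytes[2]);
    assert((b0 & 0xF0) == 0xE0 && (b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80);
    return static_cast<char16_t>(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
}
```

For the UTF-8 bytes E6 97 A5 (the character U+65E5), the decode yields 0x65E5, while the memcpy version yields a different value built from the first two raw bytes. The bytes are copied faithfully; they are simply not UTF-16.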

Solution 3

The thing is that this is legacy code which I need to enhance. The data actually arrives as Unicode from a Java application over the wire.

Until now, Unicode has worked well for Japanese data on a Japanese OS or Chinese data on a Chinese OS. The problem I am facing is supporting these languages on an English OS. As far as I understand, memcpy should just copy the data, and it should not matter what kind of data it is. But in practice my understanding proved wrong.

Hence I need to know: does memcpy internally use the default code page?

Also, the data coming over the wire will not always be UTF-8. If I need to determine the type of data beforehand, can you help me with how to do that?

Posted 17-Aug-10 0:27am

rudrik

Comments

Richard MacCutchan 17-Aug-10 7:07am

memcpy does what it says on the tin: it copies memory byte for byte; it does not know, nor does it care, what the content is. If you need to convert characters from WCHAR to MBCS or vice versa, then you need to know in advance what format the source is, and choose the appropriate conversion method and the correct code page. Check the MSDN entries for the conversion functions mentioned in the previous answers for full details.
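On knowing the source format in advance: one possible approach, sketched below, is to look for a Unicode byte order mark (BOM) at the start of the buffer. This is only a partial answer. It assumes the sender writes a BOM at all, which many do not, and the names `WireEncoding` and `DetectBom` are hypothetical. When no BOM is present, the encoding has to be agreed with the sender or carried in the protocol.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; not part of the original code.
enum class WireEncoding { Unknown, Utf8, Utf16LittleEndian, Utf16BigEndian };

// Inspects the first bytes of a buffer for a Unicode byte order mark.
// Returns Unknown when no BOM is found; that does not mean the data is
// not Unicode, only that the buffer does not say which encoding it uses.
// Note: a UTF-32 little-endian BOM also starts with FF FE, so this sketch
// would report it as UTF-16 little-endian.
WireEncoding DetectBom(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    if (size >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return WireEncoding::Utf8;
    if (size >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return WireEncoding::Utf16LittleEndian;
    if (size >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return WireEncoding::Utf16BigEndian;
    return WireEncoding::Unknown;
}
```

With the result in hand, UTF-8 data can go through MultiByteToWideChar with CP_UTF8, little-endian UTF-16 already matches the wchar_t layout on Windows, and big-endian UTF-16 needs each pair of bytes swapped first.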

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

«_Superman_» · Accepted Answer · 2010-08-16T18:48:00

Solution 1

If you're dealing with Unicode characters, you should be using wchar_t buffers from the very beginning; you shouldn't need to convert at all.
If it cannot be avoided, you should do this:

MultiByteToWideChar(CP_UTF8, 0, charArray, -1, wcharArray, wcharArrayLen);

Posted 16-Aug-10 18:48pm

«_Superman_»