My codecvt (a.k.a. facet) never gets called.

Question

5.00/5 (9 votes)

See more:

The problem is that utf16_codecvt methods never get called and, therefore, the result is wrong. I have search the net, but all I can find is examples of what is supposed to work. Unfortunately none of them has worked. I have also seen other posters, on the net, with the same problem, but no one gave them and answer to it.

I have tested to make sure that it has the facet (utf16_codecvt) and it does. So I see no reason why its virtual methods are never called. Instead it keeps calling the codecvt<wchar_t,char, mbstate> methods.

Any ideas?

C++

class utf16_codecvt : public std::codecvt<char16_t, char16_t, std::mbstate_t>
{
    ...//
};

void MyTestFunc()
{
    ... //
    std::wifstream myFile;
    std::locale myLoc = std::locale(myFile.getloc(), new utf16_codecvt);
    myFile.imbue(myLoc);
    myFile.open(pFileName, std::ios::in | std::ios::binary);
    ... //
    myFile.read(bom_buffer, 1);
    ... //
}

The following link gives an example of the types of things I am trying to do:
April 01, 1999 - Unicode Files - P.J. Plauger http://www.ddj.com/cpp/184403638?pgno=1[^]

Posted 27-Jun-09 9:35am

John R. Shaw

Updated 26-Nov-09 10:56am

Gordon Brandly

v2

Add a Solution

1 solution

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Stuart Dootson · Accepted Answer · 2009-06-28T23:37:00

From what I can tell, the C++ stream system presumes that files are sequences of bytes, not characters - even when you use wide streams - the 'wide' part of wide stream (AFAICT) indicates how the stream object interacts with C++, not the underlying file or whatever. Thus, your codecvt facet has to take in characters.

By changing the declaration of your codecvt facet to that shown below, I was able to get breakpoints in the replacement facet being set.

C++

class utf16_codecvt : public std::codecvt<char16_t, char, std::mbstate_t>
{
   typedef std::codecvt<char16_t, char, std::mbstate_t> Base;
   typedef char16_t ElemT;
   typedef char ByteT;
   virtual result __CLR_OR_THIS_CALL do_in(std::mbstate_t& s,
      const ByteT *_First1, const ByteT *_Last1, const ByteT *& _Mid1,
      ElemT*_First2, ElemT* _Last2, ElemT *& _Mid2) const
   {	// convert bytes [_First1, _Last1) to [_First2, _Last)
      return Base::do_in(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
   }

   virtual result __CLR_OR_THIS_CALL do_out(std::mbstate_t& s,
      const ElemT*_First1, const ElemT*_Last1, const ElemT*& _Mid1,
      ByteT*_First2, ByteT*_Last2, ByteT*& _Mid2) const
   {	// convert [_First1, _Last1) to bytes [_First2, _Last)
      return Base::do_out(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
   }

   virtual result __CLR_OR_THIS_CALL do_unshift(std::mbstate_t& s,
      ByteT*_First2, ByteT*_Last2, ByteT*&_Mid2) const
   {	// generate bytes to return to default shift state
      return Base::do_unshift(s, _First2, _Last2, _Mid2);
   }

   virtual int __CLR_OR_THIS_CALL do_length(const std::mbstate_t& s, const ByteT*_First1,
      const ByteT*_Last1, size_t _Count) const
   {	// return min(_Count, converted length of bytes [_First1, _Last1))
      return Base::do_length(s, _First1, _Last1, _Count);
   }
};

So, your replacement facet will have to know it needs two bytes read for every character (and vice versa, obviously). The best reference for that sort of information is probably Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft[^] - but even then, locales and facets are heavy going in C++ :(