Reference this article: "The Complete Guide to C++ string, Part I"
It is dated 2002 and is now 14 years old, but still helpful. Have Windows and Win 7 shifted completely to Unicode? Do I really need to be concerned with MBCS? Is use of TCHAR recommended?
Any additional suggested articles about character encoding?
Thank you for your time
If you work with telemetry, please check this bulletin board: www.irigbb.com
modified 25-Feb-15 16:46pm.
-----
1. All current versions of Windows fully support Unicode. You should use ANSI functions only if your code needs to run on the Windows 95/98/Me series. In standard C++, this typically means using std::wstring rather than std::string.
2. Your UI should definitely be in Unicode. This makes translating your code to run in a different language much easier. However, Internationalization (I18n) and Localization (L10n) are separate topics.
3. Your text data storage should use UTF-8 encoding or something similar. Not only will this save storage for the common (in the Americas and Europe) case of Latin characters, but it is a well-defined encoding that is portable across any display language you are likely to use.
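To make point 3 concrete, here is a minimal sketch (the function name is mine, not from any library) of how one Unicode code point becomes UTF-8 bytes. It shows why Latin text costs one byte per character while other scripts remain representable:

```cpp
#include <string>

// Sketch only, not production code: encode one Unicode code point
// (up to U+FFFF) as UTF-8. ASCII stays one byte; other characters
// grow to two or three bytes.
std::string codePointToUtf8(unsigned int cp)
{
    std::string out;
    if (cp < 0x80) {                        // ASCII: 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                // 3 bytes (up to U+FFFF)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, 'A' encodes to the single byte 0x41, while the Euro sign (U+20AC) takes three bytes.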
If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack.
--Winston Churchill
-----
Hello Daniel,
Then that is the way I am going. I found a 2012 article here on CodeProject titled
Quote: What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?
and will be using it as my first reference and learning tool.
Nice quote in your siggie.
Thank you for your time
-----
Additional reading is not yielding a good conclusion.
My application is in telemetry. Cutting this to an absolute minimum: I use Excel VBA to build a text-based file containing as many as 100,000 pieces of information. My application uses that to configure itself and determine how to translate the raw input into parameters that another application displays in real time. The application can write copious amounts to text-based log files so I can understand the data better and see how it runs. The only human interaction is to start the app, select a configuration file, and use checkboxes to set logging options.
Everything is currently running as Unicode in Visual Studio. The app will never be used by the general public. There is no expectation of translation to other languages. But, I do want to write in a style that will be useful in other projects.
Am I OK with Unicode and strings such as L"read this"? Do I need to use the UTF-8 options?
Thank you for your time
-----
Given your constraints (no public release, no translation to other languages), using Unicode is not necessary. The ANSI functions are a tiny bit slower (they must convert all string data to/from Unicode), but that is not relevant to your case.
I still believe that for new programs, Unicode is the correct way to go for UI. Among other reasons, Microsoft is slowly "deprecating" its MBCS (multi-byte character set) support - in recent versions of Visual Studio, the MBCS library was a separate download!
As for the data processing, that depends on the input and output formats. If your input is ASCII (alphanumerics, punctuation, CR/LF), and the output is the same, there is no need or reason to convert it to Unicode for processing.
Just as a (very) short example, this coding style is perfectly valid:
#define UNICODE   // defined when you set the Windows character set to Unicode in VS
#include <windows.h>
#include <stdio.h>

void foo(void)
{
    FILE* fp = fopen( "bar", "rb" );
    if ( fp == NULL )
        return;

    int c;
    while ( (c = getc(fp)) != EOF )
    {
        if ( c == '\x42' )
            MessageBox( NULL, L"Telemetry", L"Bad input", MB_OK );
    }
    fclose( fp );
}
Note that I am using char functions to read the data, but Unicode (wide char) functions for the UI.
If you must force a Windows API to be char-based (ANSI), use the name with an 'A' suffix (e.g. MessageBoxA instead of MessageBox). If you must force it to be wide char-based (Unicode), use a 'W' suffix. This, of course, only applies to APIs that have string / character parameters.
If you need to convert between Unicode and ASCII (or UTF-8), the best way to do so is using the WideCharToMultiByte() / MultiByteToWideChar() Windows APIs.
I hope that this helps.
-----
bkelly13 wrote: Am I OK with Unicode and strings such as L"read this"? Do I need to use the UTF-8 options?
If you make everything Unicode, you should not have any issues, apart from perhaps converting your text files from ANSI to Unicode when you read them. Either way, Unicode is the best choice for the long term, especially as you may decide to move to Windows Forms/C# in the future.
-----
Richard MacCutchan wrote: If you make everything Unicode, you should not have any issues.
The OP is processing real-time telemetry, which is (these days) usually char-based. IMO, there is no good reason to convert the telemetry to Unicode before processing - it slows the processing, doubles the storage requirements, and adds nothing to any processing of numeric data.
Similar considerations apply to the output.
-----
I am well aware of what he is doing, and I only added that as a "perhaps". At the end of the day it's his choice.
-----
I sit corrected.
-----
I stand in ignorance.
-----
Telemetry data is usually all numbers and all binary. Economy in bandwidth is a primary goal. The only text may be things like software version embedded in some parts. Even then those are treated as binary data and handed off to the display device.
The text part is where I have a "bunch" of Excel code to build configuration files. Some assembly, make that much assembly, is required to translate the vendor telemetry map (describes all the fields of the data) to something directly usable by my application.
When not running in mission mode the app can write copious log files so I can verify what it did and why. Those are all text-based for easy reading. Unicode is fine there.
Side note/rant
IRIG (Inter Range Instrumentation Group) defines telemetry standards for all government ranges. A range is a place where things like bombs are dropped and missiles shot. That standard defines bit 1 as the MSB and bit N as the LSB. It is absolutely backwards, so one of the tasks of my code is to renumber all the bit fields. But the vendors do not follow the standard anyway. In one telemetry map the LSB is sometimes bit 0 and sometimes bit 1. In almost every word that has bit-field definitions they have put a note saying whether the MSB is numbered bit 0 or bit 1. They just cannot understand that the need to keep putting that note in there is a not-so-subtle indication that they are doing things wrong. Further, they have at least six different formats for describing those bit fields. With 10,000 parameters in a telemetry stream, that becomes a nightmare for writing code to extract the data needed to process the parameters.
End of rant
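For what it's worth, the renumbering itself can be a one-liner. A sketch, assuming the map numbers bits 1..N with bit 1 as the MSB (the names and the word-width parameter are illustrative, not from the actual code):

```cpp
// Convert an IRIG-style bit number (bit 1 = MSB, bit N = LSB) into the
// conventional LSB-0 position, for a word that is wordBits bits wide.
int irigToLsb0(int irigBit, int wordBits)
{
    return wordBits - irigBit;   // bit 1 (MSB) -> wordBits-1, bit N (LSB) -> 0
}
```

For a 16-bit word, IRIG bit 1 maps to conventional bit 15 and IRIG bit 16 to conventional bit 0. The real headache, as described above, is detecting which of the vendors' numbering conventions a given field uses before this conversion can be applied.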
It appears that when writing text files, Excel VBA writes Unicode by default. Since Windows is now Unicode based, it seems much better to go with that. I am mostly there, but have not looked at my tokenizer code lately. (Each parameter is written to a text file, one line per parameter, with as many as a dozen pieces of data in each line.) This file must be text rather than binary because I must be able to read it myself to check for errors.
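A sketch of that kind of tokenizing with wide strings, since the files are written as Unicode. The comma separator and the function name are assumptions for illustration; the real file format may differ:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one configuration line into its fields. Each line holds up to a
// dozen pieces of data; the separator is assumed to be a comma here.
std::vector<std::wstring> splitLine(const std::wstring& line, wchar_t sep = L',')
{
    std::vector<std::wstring> fields;
    std::wstringstream ss(line);
    std::wstring field;
    while (std::getline(ss, field, sep))
        fields.push_back(field);
    return fields;
}
```

Because the whole pipeline stays in std::wstring, nothing needs converting until (or unless) the data is written back out in a narrow encoding.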
Other than log files, none of the real time work uses any text operations. I don't care if it takes 10 bytes per character to store the configuration file.
Conclusion
I'll go with Unicode all the way.
Question
What is this deal with this WCHAR in Visual Studio? One of the articles I found said WCHAR is equivalent to wchar_t, then said no more. Ok, but being a guy with sometimes too much self doubt I still wonder: Are they really the same? Is there something subtle I have not noticed? WCHAR stands out a bit more in the declarations, but for advantages, that seems to be about it. Should I go with wchar_t rather than WCHAR?
Thank you for your time
-----
bkelly13 wrote: What is this deal with this WCHAR
If you right-click any of these types in your source code you can select "Go To Definition", which will bring up the include file where the type is defined. You can see that WCHAR is defined in winnt.h as equivalent to wchar_t, which is a fundamental type known to the compiler. The WCHAR definition exists for porting to compilers that do not have that fundamental type (or did not, in the days before C++). Use whichever type you are more comfortable with, although WCHAR tends to give more flexibility if you ever need to port your code to another platform.
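A quick way to convince yourself of the equivalence. The typedef below reproduces what winnt.h does (check your own SDK headers to confirm the exact form), so the check compiles even outside Windows:

```cpp
#include <type_traits>

// On Windows, winnt.h contains (in effect) this typedef;
// reproduced here so the equivalence can be checked anywhere.
typedef wchar_t WCHAR;

// If the two were ever different types, this would fail to compile.
static_assert(std::is_same<WCHAR, wchar_t>::value,
              "WCHAR and wchar_t are the same type");
```

So the choice really is purely stylistic: the compiler sees one and the same type either way.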
-----
Re: Use whichever type you are more comfortable with, although using WCHAR tends to give more flexibility if you ever need to port your code to some alternative platform.
I have been working with Microsoft VS for a while now and have not gotten out to play with others in a long time. I will go with that and stick with the WCHAR.
Thank you for your time
-----
Daniel Pfeffer wrote: Richard MacCutchan wrote: If you make everything Unicode, you should not have any issues.
It depends on what you mean by Unicode...
The Windows API and UI use UTF-16 (starting with Windows NT 4.0), but if you generate output for SMTP/email/the Web you must use UTF-8. For UTF-16 you can use CStringW or std::wstring; for UTF-8, CStringA or std::string. UTF-8 is a multi-byte string format, but it has nothing to do with the old MBCS, which depends on code pages.
Given that, using CString, whose character width depends on the UNICODE define to make the code UTF-16 aware, is now out of date and can shoot you in the foot.
Conversions between UTF-16 and UTF-8 can be done with the usual MultiByteToWideChar and WideCharToMultiByte. But if you write more portable software, do it with the STL:
wstring_convert<codecvt_utf8_utf16<wchar_t>> converter;
The bad thing is that the current Visual Studio C++ editor can't handle UTF-8 string literals. It is a Windows application, you know...
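Spelled out a little more, with illustrative wrapper names (note that std::wstring_convert and std::codecvt_utf8_utf16 were deprecated in C++17, though current compilers still ship them):

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Sketch of UTF-16 <-> UTF-8 round-tripping with the converter above.
// Wrapper names are illustrative, not a standard API.
std::string utf16ToUtf8(const std::wstring& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(utf16);
}

std::wstring utf8ToUtf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

On Windows this matches the WideCharToMultiByte/MultiByteToWideChar pair with CP_UTF8, but unlike those APIs it compiles on any platform.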
-----
Windows XP, soon to be Win 7, Visual Studio 2008, C++
The Connect function wants an LPCTSTR. m_address is declared as WCHAR. This compiles, but is it OK to use the code in the title, or is it bad in the long term?
My searches for "convert WCHAR to LPCTSTR" turn up some things, but nothing that appears to fit. I think my work firewall is filtering things out.
Edit: I have found that it is not OK even in the short term. When used with SetDlgItemText to put text in a dialog, it does not work.
Starting with WCHAR m_address[ 32 ];
and the need to call a function such as Connect(...) or SetDlgItemText() that wants an LPCTSTR, what can be done with m_address to produce an LPCTSTR?
Thank you for your time
modified 24-Feb-15 17:20pm.
-----
LPCTSTR is a typedef that expands to LPCWSTR (really const WCHAR*) or LPCSTR (really const char*), depending on whether your project defines UNICODE. In the Connect call no conversion is required; the parameter type merely tells you that m_address must be a pointer to a string in the appropriate character set. So if your project is built as Unicode it should be something like:
WCHAR m_address[] = L"128.56.22.8";        // array form
// or: LPCWSTR m_address = L"128.56.22.8"; // pointer form
BOOL result = Connect(m_address, nPort);
And if non-Unicode:
char m_address[] = "128.56.22.8";          // array form
// or: LPCSTR m_address = "128.56.22.8";   // pointer form
BOOL result = Connect(m_address, nPort);
And if you wish to cater for the possibility that you may build it either way:
TCHAR m_address[] = TEXT("128.56.22.8");   // array form
// or: LPCTSTR m_address = TEXT("128.56.22.8"); // pointer form
BOOL result = Connect(m_address, nPort);
-----
Re: LPCTSTR is a typedef which will be generated as LPCWSTR (which is really WCHAR*),
I had not realized that. Ok, now I am going to find that codeproject article(s) on the various types of string and study it. Enough of this messing around in the dark about strings.
Thank you Richard.
Thank you for your time
-----
Happy to help. It's one of those things that once you get your head round it, it seems so simple (I hope).
-----
I am hoping for that. Enough fumbling around in the dark.
thank you for your help Richard.
Thank you for your time
-----
Visual Studio, C++, TCP utility
I am creating a utility to handle TCP/IP I/O for a telemetry project. It has a rather high and continuous bandwidth. Four or maybe more instances will be run at one time. It seems to me that now is the time to take this class and put it into a DLL. Is this an appropriate use of DLL? Do you have a favorite article or list of important things to remember when creating my first DLL?
Thank you for your time
-----
OK, I guess you're continuing your TCP API project from (somewhere below this in the forum)...
Looking at this:
bkelly13 wrote: Four or maybe more instances
I (personally) think that alone is not necessarily a good indicator of when to use a DLL. You're going to create four instances of a class that implements an API; on the face of it, what is there in that which says the class/code needs to be in a DLL? Your program will start, load the DLL if it has to, and instantiate as many copies of the class as required, but it won't really care whether the code is in a DLL or not...
So... you could say back to me, "I need to use a DLL:"
(a) so I can implement a plugin, or make it easier to change just that API
(b) to encapsulate my API for deployment, particularly where I have multiple programs that all need the same thing, so they each use the same DLL and hence the same code/API
This is only one consideration, I guess. Brian
-----
If you want to separate out some of your functionality then a DLL is one way to do it. But unless you are sharing the DLL with other applications then it offers no particular benefit; a static library would be just as useful.
-----
Replying to both posts:
Eventually I'll make this available for others to use. Then, from my reading, it becomes easier to incorporate into another project when built as a DLL. Until then, my plate is full developing the message app (which uses the TCP/IP project) and the TCP/IP project itself, and I don't need that extra step.
I'll leave it is as a static library until I am ready to share.
Thanks for your thoughts.
Thank you for your time
-----
When choosing to go to a DLL, think whether you (or someone else) will be using these methods in another application, or whether you'd like an expandable interface via DLL based plug-ins.
A common example of DLL usage would be in implementing audio codecs (since any exe that plays audio can/would use it). Additionally, functionality can be expanded/removed from applications via DLLs. Plug-ins can be implemented using DLLs with a standard interface. The application can look for any DLLs with the standard plug-in interface and load what's available.
-----
I am using CDHtmlDialog for my application and I want to use CHtmlEditView inside it. I want to build an HTML editor in the CDHtmlDialog: I place a RichEdit control in the dialog and create a CHtmlEditView over that RichEdit control. But when I run the dialog, the RichEdit control is disabled by the creation of the CHtmlEditView.
I want an HTML skin, and within it the CHtmlView capability, so that the editor sits inside the HTML skin - toolbar and other UI in HTML, with the editor in the middle using the HtmlView capability.
Can anyone please help with how I can use CHtmlEditView in a CDHtmlDialog to create this sort of HTML editor?
It runs perfectly with CDialog but has problems with CDHtmlDialog.
Please help in this regard.
modified 8-Jan-15 5:19am.