QUESTION:
How do I write the file contents with a BOM?

__________

Using CodeBlocks 17.12 / GCC 5.1 / Microsoft Windows XP.

From my previous question, "Save file with unicode string in old windows system":

"
Quote:
If I want to update some software for my uncle's old saw mill machine that has a programmable controller with C, and I want to code it in C or C++98 or in C++03, nothing newer, then how do I make the following work for that?

I am doing a test on an old Windows XP Pro 32-bit system with every code page (available at the time of installation) installed, and with CodeBlocks 17.12 and GCC 5.1. I think that I should be able to code it for the older system with this. It changes the file name.

If I can do it all in C then that is OK.

I prefer to do this in C, but if 98 or 03 works that is OK also.
"

I can now save a file with a Unicode name, "A_Unicode天file.txt".

__________

But it saves the simple ASCII text that is in the wide string
wstring write_stuff = L"I am Happy";
with what looks like a space added after each letter.

I have been told to put a BOM at the start of the file. I tried coding it in as follows, but that does not work.

QUESTION:
How do I write the file contents with a BOM?

What I have tried:

wstring write_stuff = L"\xef\xbb\xbf I am Happy";
DWORD dwBytesWritten = 0;

TCHAR *fname=TEXT("A_Unicode_WriteSTuff_天_file.txt");
HANDLE hFile = CreateFile(fname, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
BOOL bErr = WriteFile( hFile, (LPCVOID)&write_stuff[0], sizeof(wchar_t)*write_stuff.size(), &dwBytesWritten, nullptr);
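Both symptoms here (the "spaces" and the BOM that is not recognized) look like consequences of how a wide string sits in memory. On Windows a wchar_t is 16 bits and WriteFile copies raw bytes, so each ASCII letter goes out as the letter followed by a 0x00 byte, which an ANSI viewer shows as a gap. Likewise, each `\x..` escape inside an `L"..."` literal becomes a whole wide character, so the file starts with EF 00 BB 00 BF 00 rather than EF BB BF. The sketch below only illustrates that layout; `utf16le_bytes` and `hex_dump` are hypothetical helper names, and it assumes every character is at or below U+FFFF.

```cpp
#include <string>

// Hypothetical helper: lay out each character as two bytes, low byte first,
// which is how a 16-bit wchar_t is stored in memory on Windows.
// Assumption: every character is in the Basic Multilingual Plane (<= U+FFFF).
std::string utf16le_bytes(const std::wstring& text)
{
    std::string bytes;
    for (std::wstring::size_type i = 0; i < text.size(); ++i) {
        unsigned long unit = static_cast<unsigned long>(text[i]) & 0xFFFFul;
        bytes += static_cast<char>(unit & 0xFF);         // low byte
        bytes += static_cast<char>((unit >> 8) & 0xFF);  // high byte
    }
    return bytes;
}

// Hypothetical helper: render bytes as space-separated upper-case hex.
std::string hex_dump(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

With these helpers, `hex_dump(utf16le_bytes(L"I am"))` gives `49 00 20 00 61 00 6D 00`, and `hex_dump(utf16le_bytes(L"\xef\xbb\xbf"))` gives `EF 00 BB 00 BF 00`, which is not the UTF-8 BOM.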
Updated 28-Jun-22 8:10am

1 solution

For UTF-8, write the three BOM bytes to the file as raw bytes:
DWORD dwBytesWritten = 0;
BOOL bErr = 0;
const TCHAR *fname = TEXT("A_Unicode_WriteSTuff_天_file.txt");
HANDLE hFile = CreateFile(fname, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
/* UTF-8 BOM. The "=" keeps the initializer valid in C and in C++98/03. */
unsigned char BOM[3] = { 0xef, 0xbb, 0xbf };
bErr = WriteFile(hFile, (LPCVOID)BOM, (DWORD)sizeof(BOM), &dwBytesWritten, NULL);
bErr = CloseHandle(hFile);
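The snippet above writes only the three BOM bytes. For the text that follows the BOM to be valid UTF-8, the wide string has to be converted first; on Win32 that is usually done with WideCharToMultiByte and CP_UTF8. As a rough sketch of what that conversion produces, here is a hand-written C++98 version that builds the whole file content (BOM plus UTF-8 text) in one buffer. The names `append_utf8` and `utf8_with_bom` are hypothetical, not part of any API.

```cpp
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
static void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80ul) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800ul) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000ul) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: UTF-8 BOM followed by `text` encoded as UTF-8.
std::string utf8_with_bom(const std::wstring& text)
{
    std::string out("\xEF\xBB\xBF");  // the BOM as three raw bytes
    for (std::wstring::size_type i = 0; i < text.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(text[i]);
        // Where wchar_t is 16 bits (Windows), characters above U+FFFF arrive
        // as a surrogate pair; join the pair into one code point.
        // An unpaired surrogate is passed through unchanged in this sketch.
        if (cp >= 0xD800ul && cp <= 0xDBFFul && i + 1 < text.size()) {
            unsigned long lo = static_cast<unsigned long>(text[i + 1]);
            if (lo >= 0xDC00ul && lo <= 0xDFFFul) {
                cp = 0x10000ul + ((cp - 0xD800ul) << 10) + (lo - 0xDC00ul);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

On Windows the returned buffer could then go to a single WriteFile call, passing `bytes.data()` as the buffer and `(DWORD)bytes.size()` as the byte count, so the BOM and the text are never mixed between encodings.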
 
Comments
Member 15078716 28-Jun-22 0:34am    
I am trying to use what you wrote, and I have been studying it, and I think that I got it close, but I need some help.

Here is my code for now:
DWORD dwBytesWrittenXPP = 0;
BOOL bErrGHJ = 0;
TCHAR *flename = TEXT("utf8_ByteOrderMark.txt");


DWORD Bytes_that_need_To_Write;
DWORD bytes_which_have_written_already;

while (bytes_which_have_written_already< Bytes_that_need_To_Write )
{

    HANDLE hDFile = CreateFile(flename, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    unsigned char BOM48[3]{ 0xef, 0xbb, 0xbf };

    wstring write_stuff77 = L"XYZ";

    // Convert to two LPCVOID.
    LPCVOID Nf1 = (LPCVOID)BOM48;
    LPCVOID Nf2 = (LPCVOID)&write_stuff77[0];

    // Convert to two wide strings.
    wstring wstrFirst(static_cast<const wchar_t*>(Nf1));
    wstring wstrSecond(static_cast<const wchar_t*>(Nf2));

    // Concatenate the two wide strings.
    wstring wstrAll = wstrFirst + wstrSecond;

    // Convert combined wstrings to LPCVOID.
    LPCVOID LpAll = (LPCVOID)&wstrAll[0];

    wchar_t need_to_write_Bytes = wstrAll.size() ;
    Bytes_that_need_To_Write = sizeof (need_to_write_Bytes ) ;

    bErrGHJ = WriteFile(hDFile, LpAll, Bytes_that_need_To_Write - bytes_which_have_written_already, bytes_which_have_written_already, NULL);
}

bErrGHJ = CloseHandle(hDFile);


For the second-to-last line (bErrGHJ = WriteFile(hDFile, ...)) I get:
error: invalid conversion from 'DWORD {aka long unsigned int}' to 'PDWORD {aka long unsigned int*}'
Shao Voon Wong 28-Jun-22 3:17am    
Put an ampersand on the second-to-last parameter:
bErrGHJ = WriteFile(hDFile, LpAll, Bytes_that_need_To_Write - bytes_which_have_written_already, &bytes_which_have_written_already, NULL);
Member 15078716 28-Jun-22 14:39pm    
When I use
bErrGHJ = WriteFile(hDFile, LpAll, Bytes_that_need_To_Write - bytes_which_have_written_already, &bytes_which_have_written_already, NULL);

My program will not close, and I have to force it closed with Ctrl+Alt+Delete.

Just one change, adding that & did something. I had a DWORD before and changed it to PDWORD and then I cannot close the program normally.

And now I noticed that it also uses 100% CPU constantly even before I try to close it.

I think it is in a continuous loop. I do not understand how bytes_which_have_written_already is calculated. I am not certain how to measure this or how to code it in.

__________

I fixed some things, and forced the loop to stop. But now the file has "ï»" in it instead of "XYZ".

I think that this is getting closer, but it is still not working.

Here is my current code:

//Create a file that has a BOM for utf-8
DWORD dwBytesWrittenXPP = 0;
BOOL bErrGHJ = 0;
TCHAR *flename = TEXT("utf8_ByteOrderMark.txt");


DWORD Bytes_that_need_To_Write;
DWORD bytes_which_have_written_already = 0;

int CountIt = 0;

while (bytes_which_have_written_already< Bytes_that_need_To_Write )
    {

        if (CountIt > 50)   // To stop some loop until I can fix it.
            {
                break;
            }

        HANDLE hDFile = CreateFile(flename, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        unsigned char BOM48[3]{ 0xef, 0xbb, 0xbf };

        wstring write_stuff77 = L"XYZ";

        // Convert to two LPCVOID.
        LPCVOID Nf1 = (LPCVOID)BOM48;
        LPCVOID Nf2 = (LPCVOID)&write_stuff77[0];

        // Convert to two wide strings.
        wstring wstrFirst(static_cast<const wchar_t*>(Nf1));
        wstring wstrSecond(static_cast<const wchar_t*>(Nf2));

        // Concatenate the two wide strings.
        wstring wstrAll = wstrFirst + wstrSecond;

        // Convert combined wstrings to LPCVOID.
        LPCVOID LpAll = (LPCVOID)&wstrAll[0];

        wchar_t need_to_write_Bytes = wstrAll.size() ;
        Bytes_that_need_To_Write = sizeof (need_to_write_Bytes ) ;

        
        bErrGHJ = WriteFile(hDFile, LpAll, Bytes_that_need_To_Write - bytes_which_have_written_already, &bytes_which_have_written_already, NULL);

        bErrGHJ = CloseHandle(hDFile);

        CountIt = CountIt + 1;


    }
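A few things in the loop above stand out. The file is re-created on every pass (CREATE_ALWAYS truncates it), `sizeof (need_to_write_Bytes)` is the size of one wchar_t (2 bytes on Windows) rather than the length of the data, and WriteFile stores the count for that one call in its output parameter instead of adding to a running total. Writing only 2 bytes of a buffer that begins with EF BB would be consistent with the "ï»" seen in the file. A write loop normally opens the file once, computes the total once, and advances through the buffer. The sketch below shows that shape; `WriteSomeFn`, `write_all`, `MemorySink` and `memory_write_some` are hypothetical names, and the in-memory sink only stands in for a real file so that partial writes can be simulated.

```cpp
#include <string>

// Hypothetical callback type standing in for WriteFile: tries to write up to
// `count` bytes, stores how many it wrote in *written, returns false on error.
typedef bool (*WriteSomeFn)(void* ctx, const char* data,
                            unsigned long count, unsigned long* written);

// Keep calling write_some until all `total` bytes are written.
bool write_all(WriteSomeFn write_some, void* ctx,
               const char* data, unsigned long total)
{
    unsigned long done = 0;  // running total, kept separate from the per-call count
    while (done < total) {
        unsigned long written = 0;
        if (!write_some(ctx, data + done, total - done, &written))
            return false;    // the write itself failed
        if (written == 0)
            return false;    // no progress: stop instead of spinning forever
        done += written;     // advance past the bytes already written
    }
    return true;
}

// Demonstration sink: accepts at most max_chunk bytes per call.
struct MemorySink {
    std::string bytes;
    unsigned long max_chunk;
};

bool memory_write_some(void* ctx, const char* data,
                       unsigned long count, unsigned long* written)
{
    MemorySink* sink = static_cast<MemorySink*>(ctx);
    unsigned long n = count < sink->max_chunk ? count : sink->max_chunk;
    sink->bytes.append(data, n);
    *written = n;
    return true;
}
```

On Win32, a callback of this shape would wrap one WriteFile call on a handle that was opened before the loop and is closed after it.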
merano99 28-Jun-22 17:13pm    
I think converting BOM48 to a wstring will not work at all. Write the BOM bytes directly instead:
   HANDLE hDFile = CreateFile(flename, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    unsigned char BOM48[3]{ 0xef, 0xbb, 0xbf };
    bErrGHJ = WriteFile(hDFile, (LPCVOID)BOM48, (DWORD)sizeof(BOM48), &dwBytesWrittenXPP, NULL);

    do {
        wstring write_stuff77 = L"XYZ";

        // Convert to two LPCVOID.
        // LPCVOID Nf2 = (LPCVOID)&write_stuff77[0];  NO!!
        LPCVOID LpAll = (LPCVOID)write_stuff77.c_str();

        // wstring wstrFirst(static_cast<const wchar_t*>(Nf1)); !!JUST DONT DO THIS!

        Bytes_that_need_To_Write = write_stuff77.size() * sizeof(write_stuff77[0]);

        bErrGHJ = WriteFile(hDFile, LpAll, Bytes_that_need_To_Write, &dwBytesWrittenXPP, NULL);

        bytes_which_have_written_already += dwBytesWrittenXPP;

    } while (bytes_which_have_written_already < Bytes_that_need_To_Write);

    bErrGHJ = CloseHandle(hDFile);
Member 15078716 28-Jun-22 23:45pm    
I tried that, but it did not work.
I think that I should try to simplify this so that I might understand it better.
I put together what I am being told into the following which again "almost works".

You will notice that I hard coded values in. That is to make certain that the desired values are getting to the correct place in the code. If I should not do this, then please tell me.



HANDLE hDFile = CreateFile(L"utf8_UsingByteOrderMark_C_天堂.txt", GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

DWORD NumberOfBytesWritten;

WriteFile(hDFile, L"\xef\xbb\xbf - hello - J - こんにちは - abcdefghijklmnopqrstuvwxyz", 100, &NumberOfBytesWritten, NULL);



The file name saves fine. Thank you so much.

Inside of the file is the following:


i ^^ ? - h e l l o - J - S0“0k0a0o0 - a b c d e f g h i j k l m n o p q r s t u v w x

The size 100 is just a test to see how many bytes end up in the output file. The Unicode text in the file does not seem to have the BOM read as an actual BOM, just as a bunch of characters, and the text こんにちは becomes S0“0k0a0o0.

Almost, but not there yet.
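The output above is what UTF-16LE data looks like when a viewer reads it as ANSI text. こ is U+3053, stored as the bytes 53 30, which display as "S0"; ん is U+3093, stored as 93 30, which displays as "“0"; and so on. The 100 bytes cover 50 wide characters, which is why the text stops at "x". The leading `\xef\xbb\xbf` became EF 00 BB 00 BF 00, so a reader checking the first bytes of the file finds no BOM. As a minimal sketch of that check (an assumption about how a typical reader behaves, not the logic of any particular editor), with hypothetical names `BomKind` and `detect_bom`:

```cpp
#include <cstddef>

enum BomKind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

// Hypothetical helper: classify a file by its first bytes.
// UTF-32 BOMs are deliberately ignored to keep the sketch short.
BomKind detect_bom(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BOM_UTF8;     // EF BB BF
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BOM_UTF16LE;  // FF FE
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return BOM_UTF16BE;  // FE FF
    return BOM_NONE;
}
```

So there are two consistent choices: keep the wide string as it is and start it with the single character `L'\xFEFF'` (which WriteFile emits as FF FE, the UTF-16LE BOM), or write EF BB BF as raw bytes and convert the text to UTF-8 before writing it. Mixing a UTF-8 BOM with UTF-16 text is what produces the garbled result.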

Again, thank you *ALL* for helping me.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


