I use a CSV file in one of my projects, and it contains Japanese, Chinese, English, and Spanish text. But when I read that file in my project, I get garbage buffer values, even though my project is a Unicode build.

What I have tried:

I used an INI file to read those languages, which works fine, but only in a Unicode build. In a multi-byte (MBCS) build, only English and Spanish are read correctly.
Updated 25-Jan-18 1:33am
Comments
Jochen Arndt 25-Jan-18 4:33am    
You should tell us which encoding is used in the CSV files and how you read them and convert the encoding.

Chinese and Japanese code pages are true multibyte code pages (a character may take more than one byte), while Latin-script languages use ANSI code pages (one byte per character). When handling true multibyte text, there are a few API functions that must be used, and maybe your code does not use them. In any case, you should always create Unicode applications nowadays and convert any input that uses code-page-based data.
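To illustrate the difference, here is a minimal portable C++ sketch. It assumes UTF-8 as the multibyte encoding (Shift-JIS or GBK use different byte layouts), and `countUtf8CodePoints` is a hypothetical helper, not a Windows or MFC function. It counts characters by skipping UTF-8 continuation bytes:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of MFC or the Windows API): counts the
// Unicode code points in a UTF-8 encoded byte string. Each code point
// begins with exactly one lead byte; continuation bytes have the bit
// pattern 10xxxxxx (0x80..0xBF) and are skipped.
std::size_t countUtf8CodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For `"abc"` it returns 3 for 3 bytes, but the UTF-8 bytes of 日本 (`"\xE6\x97\xA5\xE6\x9C\xAC"`) are 6 bytes and only 2 characters, so any code that treats each byte as a character splits those characters apart.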
Member 13323088 25-Jan-18 4:57am    
I have tried a Unicode MFC project too, and I have the same problem: I cannot read Japanese, Chinese, or Spanish text from the CSV file.
Member 13323088 25-Jan-18 5:19am    
```cpp
bool CSVFile::ReadData(CStringArray &arr)
{
    // Verify correct mode in debug build
    ASSERT(m_nMode == modeRead);

    // Read next line
    CString sLine;
    if (!ReadString(sLine))
        return false;

    LPCTSTR p = sLine;
    int nValue = 0;

    // Parse values in this line
    while (*p != '\0')
    {
        CString s; // String to hold this value
        if (*p == '"')
        {
            // Bump past opening quote
            p++;

            // Parse quoted value
            while (*p != '\0')
            {
                // Test for quote character
                if (*p == '"')
                {
                    // Found one quote
                    p++;

                    // If pair of quotes, keep one
                    // Else interpret as end of value
                    if (*p != '"')
                    {
                        // Skip the separator after the closing quote, but do
                        // not step past the terminator when the quoted value
                        // ends the line (that would read beyond the string)
                        if (*p != '\0')
                            p++;
                        break;
                    }
                }

                // Add this character to value
                s.AppendChar(*p++);
            }
        }
        else
        {
            // Parse unquoted value
            while (*p != '\0' && *p != ',')
            {
                s.AppendChar(*p++);
            }

            // Advance to next character (if not already end of string)
            if (*p != '\0')
                p++;
        }

        // Add this string to value array
        if (nValue < arr.GetCount())
            arr[nValue] = s;
        else
            arr.Add(s);

        nValue++;
    }

    // Trim off any unused array values
    if (arr.GetCount() > nValue)
        arr.RemoveAt(nValue, arr.GetCount() - nValue);

    // We return true if ReadString() succeeded--even if no values
    return true;
}
```

I use the function above to read the CSV file.
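The parsing loop itself does not depend on the encoding: it only compares characters against commas and quotes, which works one wchar_t at a time once the line has been decoded correctly. The following sketch ports the same loop to standard C++ on std::wstring so it can be tried outside MFC; `parseCsvLine` is a hypothetical name, and the sketch also guards the closing-quote case so that a quoted value at the very end of a line does not advance past the terminator:

```cpp
#include <string>
#include <vector>

// Hypothetical portable port of the ReadData() parsing loop: splits one
// already-decoded line into fields. Quoted fields may contain commas, and
// a doubled quote ("") inside a quoted field becomes a single quote.
std::vector<std::wstring> parseCsvLine(const std::wstring& line)
{
    std::vector<std::wstring> fields;
    const wchar_t* p = line.c_str();

    while (*p != L'\0')
    {
        std::wstring s; // value being collected
        if (*p == L'"')
        {
            ++p; // skip the opening quote
            while (*p != L'\0')
            {
                if (*p == L'"')
                {
                    ++p;
                    if (*p != L'"')
                    {
                        // Closing quote: skip the following comma, but
                        // never step past the string terminator.
                        if (*p != L'\0')
                            ++p;
                        break;
                    }
                }
                s.push_back(*p++); // ordinary char or 2nd quote of a pair
            }
        }
        else
        {
            while (*p != L'\0' && *p != L',')
                s.push_back(*p++);
            if (*p != L'\0')
                ++p; // skip the comma
        }
        fields.push_back(s);
    }
    return fields;
}
```

For example, `parseCsvLine(L"1,\"\u65E5\u672C\"")` yields the two fields `1` and `日本`, which suggests the parsing is not the culprit as long as the line is decoded correctly before it reaches the loop.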

1 solution

All of the following assumes that your application is a Unicode build.

It looks like you have derived your CSVFile class from CStdioFile. If so, ReadString() converts the multibyte file content to wide characters using the code page currently selected for the user running your application. That gives wrong results when the CSV file was created with a different encoding.
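To see why the wrong code page produces garbage, here is a small portable sketch that decodes bytes the way a single-byte code page would. It assumes ISO-8859-1 as a stand-in for the user's ANSI code page (Windows-1252 agrees with it everywhere except 0x80..0x9F); `decodeAsLatin1` is a hypothetical name, and this is not how CStdioFile is implemented internally:

```cpp
#include <string>

// Hypothetical illustration, not MFC code: decodes bytes as ISO-8859-1,
// i.e. every byte becomes one wchar_t with the same value. This mimics
// reading the file with a single-byte ANSI code page.
std::wstring decodeAsLatin1(const std::string& bytes)
{
    std::wstring result;
    result.reserve(bytes.size());
    for (unsigned char byte : bytes)
        result.push_back(static_cast<wchar_t>(byte));
    return result;
}
```

`decodeAsLatin1("\xF1")` gives ñ (U+00F1), which is correct for Spanish text saved in Windows-1252. The UTF-8 bytes of 日本 (`"\xE6\x97\xA5\xE6\x9C\xAC"`) come out as six unrelated characters starting with æ (U+00E6) instead of two Japanese characters.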

The main problem is knowing which encoding (code page) was used to create the CSV file. If you have control over that, I suggest using UTF-8.
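When the file is (or may be) UTF-8, one hint is a byte order mark at the start. A minimal sketch, with `stripUtf8Bom` as a hypothetical helper: it detects and removes the UTF-8 BOM (bytes EF BB BF) so that it does not end up as an extra character in the first field. The BOM is optional, so its absence proves nothing about the encoding:

```cpp
#include <string>

// Hypothetical helper: if the data starts with the UTF-8 byte order mark
// (EF BB BF), remove it and return true. A missing BOM does NOT mean the
// data is not UTF-8, because the BOM is optional.
bool stripUtf8Bom(std::string& data)
{
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xEF &&
        static_cast<unsigned char>(data[1]) == 0xBB &&
        static_cast<unsigned char>(data[2]) == 0xBF)
    {
        data.erase(0, 3);
        return true;
    }
    return false;
}
```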

Once you know the code page, read the file content into a char buffer (e.g. using fgets()) and call the Windows MultiByteToWideChar function to convert it to Unicode.
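MultiByteToWideChar is Windows-only; it is typically called twice, first with a zero output size to get the required buffer length and then to do the conversion. To show what that conversion does for UTF-8 input (CP_UTF8) in code that also compiles elsewhere, here is a portable sketch under those assumptions: each line is read with fgets() into a char buffer and then decoded from UTF-8 to UTF-16 by hand. `utf8ToUtf16` and `readUtf8Lines` are hypothetical names; for other code pages such as Shift-JIS you would still need MultiByteToWideChar or an equivalent library.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical portable stand-in for MultiByteToWideChar(CP_UTF8, ...):
// decodes UTF-8 into UTF-16. Returns false on malformed input.
bool utf8ToUtf16(const std::string& in, std::u16string& out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra;
        char32_t cp;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;               // stray continuation or bad lead byte
        if (in.size() - i <= extra)
            return false;                // sequence cut off at end of input
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;            // expected a 10xxxxxx byte
            cp = (cp << 6) | (c & 0x3F);
        }
        static const char32_t minValue[] = {0, 0x80, 0x800, 0x10000};
        if (cp < minValue[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                // overlong, out of range or surrogate
        if (cp >= 0x10000)
        {                                // outside the BMP: surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        else
            out.push_back(static_cast<char16_t>(cp));
        i += extra + 1;
    }
    return true;
}

// Reads all lines of an open file with fgets(), strips the line ending and
// decodes each line. Lines longer than the buffer arrive in pieces, so the
// pieces are joined until a '\n' (or end of file) is seen.
bool readUtf8Lines(std::FILE* file, std::vector<std::u16string>& lines)
{
    char buffer[256];
    std::string line;      // bytes of the current line collected so far
    bool haveData = false; // true while 'line' holds an unfinished line

    auto finishLine = [&]() -> bool {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();             // tolerate CRLF when reading in binary mode
        std::u16string wide;
        if (!utf8ToUtf16(line, wide))
            return false;                // not valid UTF-8
        lines.push_back(wide);
        line.clear();
        haveData = false;
        return true;
    };

    while (std::fgets(buffer, sizeof buffer, file) != nullptr)
    {
        line += buffer;
        haveData = true;
        if (!line.empty() && line.back() == '\n')
        {
            line.pop_back();
            if (!finishLine())
                return false;
        }
    }
    return !haveData || finishLine();    // last line may lack a '\n'
}
```

Open the file with `fopen(path, "rb")` so the bytes arrive unchanged. On Windows, where wchar_t is 16 bits wide, each decoded line holds the same UTF-16 code units a Unicode CString would, so it can then be passed to the parsing loop.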
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


