How to convert a CSV file's encoding in VS2008

Question

I use a CSV file in one of my projects that contains Japanese, Chinese, English and Spanish text. When I use that file in my project, the text is not read correctly (I get raw buffer values), even though my project is a Unicode build.

What I have tried:

I used an INI file to read those languages, which works fine, but only in a Unicode build. In a multi-byte build, only English and Spanish are read correctly.

Posted 24-Jan-18 21:43pm

Member 13323088

Updated 25-Jan-18 1:33am

Comments

Jochen Arndt 25-Jan-18 4:33am

You should tell us which encoding is used in the CSV files, how you read them, and how you convert the encoding.

Chinese and Japanese code pages are true multibyte code pages (more than one byte per character), while Latin languages use ANSI code pages (a single byte per character). When handling true multibyte text, you must use the small set of API functions designed for it. Maybe your code does not do that. In any case, you should always create Unicode applications nowadays and convert the input if it uses code page based data.

Member 13323088 25-Jan-18 4:57am

I have also tried a Unicode MFC project and have the same problem: I cannot read the Japanese, Chinese, or Spanish text from the CSV file.

Member 13323088 25-Jan-18 5:19am

bool CSVFile::ReadData(CStringArray &arr)
{
    // Verify correct mode in debug build
    ASSERT(m_nMode == modeRead);

    // Read next line
    CString sLine;
    if (!ReadString(sLine))
        return false;

    LPCTSTR p = sLine;
    int nValue = 0;

    // Parse values in this line
    while (*p != '\0')
    {
        CString s; // String to hold this value
        if (*p == '"')
        {
            // Bump past opening quote
            p++;

            // Parse quoted value
            while (*p != '\0')
            {
                // Test for quote character
                if (*p == '"')
                {
                    // Found one quote
                    p++;

                    // If pair of quotes, keep one
                    // Else interpret as end of value
                    if (*p != '"')
                    {
                        // Skip the delimiter, but never step past the terminator
                        if (*p != '\0')
                            p++;
                        break;
                    }
                }

                // Add this character to value
                s.AppendChar(*p++);
            }
        }
        else
        {
            // Parse unquoted value
            while (*p != '\0' && *p != ',')
            {
                s.AppendChar(*p++);
            }

            // Advance to next character (if not already end of string)
            if (*p != '\0')
                p++;
        }

        // Add this string to value array
        if (nValue < arr.GetCount())
            arr[nValue] = s;
        else
            arr.Add(s);

        nValue++;
    }

    // Trim off any unused array values
    if (arr.GetCount() > nValue)
        arr.RemoveAt(nValue, arr.GetCount() - nValue);

    // We return true if ReadString() succeeded--even if no values
    return true;
}

This is the code I use to read the CSV file.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Jochen Arndt · Answer 1 · 2018-01-25T01:33:00

All of the following assumes that your application is a Unicode build.

It looks like you have derived your CSVFile class from CStdioFile. In that case, ReadString() converts multibyte file content to wide characters using the code page currently selected for the user running your application. That leads to wrong results when the CSV file was created with a different encoding.

The main problem is knowing which encoding / code page was used to create the CSV file. If you have control over that, I suggest using UTF-8 if possible.

Once you know the code page, read the file content into a char buffer (e.g. using fgets()) and call the Windows MultiByteToWideChar function to convert it to Unicode.
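
A minimal sketch of that approach, assuming the file uses a byte-oriented encoding (UTF-8 or an ANSI/multibyte code page, not UTF-16). The helper names ReadRawLine and BytesToWide are hypothetical and not part of MFC or the original CSVFile class. The non-Windows branch is only there so the sketch compiles elsewhere; it is correct for 7-bit ASCII only.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: read one line as raw bytes, without the line ending.
// Returns false once the end of the file has been reached.
bool ReadRawLine(FILE* fp, std::string& line)
{
    line.clear();
    char buf[256];
    while (std::fgets(buf, sizeof(buf), fp) != NULL)
    {
        line += buf;
        if (!line.empty() && line[line.size() - 1] == '\n')
        {
            line.erase(line.size() - 1);              // drop '\n'
            if (!line.empty() && line[line.size() - 1] == '\r')
                line.erase(line.size() - 1);          // drop '\r' if present
            return true;
        }
    }
    return !line.empty();   // last line without a trailing newline
}

// Hypothetical helper: convert bytes in the given Windows code page
// (e.g. 65001 for UTF-8) to a wide string. Returns an empty string on failure.
std::wstring BytesToWide(const std::string& bytes, unsigned int codePage)
{
    if (bytes.empty())
        return std::wstring();
#ifdef _WIN32
    const int srcLen = static_cast<int>(bytes.size());
    // First call: ask how many wide characters are required.
    int len = ::MultiByteToWideChar(codePage, 0, bytes.c_str(), srcLen, NULL, 0);
    if (len <= 0)
        return std::wstring();
    std::wstring wide(len, L'\0');
    // Second call: perform the conversion into the buffer.
    ::MultiByteToWideChar(codePage, 0, bytes.c_str(), srcLen, &wide[0], len);
    return wide;
#else
    // Assumption: placeholder for non-Windows builds, valid for ASCII only.
    (void)codePage;
    return std::wstring(bytes.begin(), bytes.end());
#endif
}
```

In ReadData(), the call to ReadString(sLine) could then be replaced by reading a raw line, converting it with the known code page (for example 65001 for UTF-8, or 932 if the file is Shift-JIS), and assigning the result to sLine via wide.c_str(). The file would have to be opened with fopen() rather than through CStdioFile for this to work as written.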