I have a strange problem when I parse a UTF-8 file using expat. The code parses the XML correctly on Windows (I tested it with VC6.0), but it cannot parse the XML when I run it on Android. I don't know why, since the code and the XML file are the same.
To track down the problem, I ran some tests:
1. Saved the XML file with ANSI encoding: if the file contains no Chinese characters, it is parsed correctly; if it does, parsing fails.
2. Saved the XML file with UTF-8 encoding: the code cannot parse it in either case.
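Test 1 is consistent with expat's defaults: its built-in encodings are UTF-8, UTF-16, ISO-8859-1 and US-ASCII, and a document with no BOM and no encoding declaration is read as UTF-8. Assuming the "ANSI" file was saved on Chinese Windows (code page 936, GBK), 中国 is stored as the bytes D6 D0 B9 FA, which are not well-formed UTF-8, while an ASCII-only file is valid UTF-8 as it stands. The sketch below is a simplified, hypothetical check (`LooksLikeUtf8` is not part of expat) that only tests the lead-byte/continuation-byte structure.

```cpp
#include <cstddef>

// Hypothetical helper (not an expat API): returns true if buf[0..len)
// follows the UTF-8 lead-byte / continuation-byte structure.
// Simplified: it does not reject overlong forms, surrogates or
// code points above U+10FFFF.
bool LooksLikeUtf8(const unsigned char* buf, std::size_t len)
{
    std::size_t i = 0;
    while (i < len)
    {
        unsigned char c = buf[i];
        std::size_t extra;                       // continuation bytes expected
        if (c < 0x80)                extra = 0;  // 0xxxxxxx: ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;                       // stray continuation or invalid lead byte

        if (len - i - 1 < extra)                 // sequence truncated at end of buffer
            return false;
        for (std::size_t k = 1; k <= extra; ++k)
            if ((buf[i + k] & 0xC0) != 0x80)     // each must look like 10xxxxxx
                return false;
        i += extra + 1;
    }
    return true;
}
```

For the GBK bytes, D6 announces a two-byte sequence but the next byte D0 is not a continuation byte, so the check fails, which matches what expat reports for the ANSI file with Chinese text.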

The following is my code:
C++
#include <fstream>
#include <string>
#include <stdlib.h>
#include <stdio.h>
#include <algorithm>
#include "expat.h"
using namespace std;

bool  LoadXmlFile(string strfilename);

int Depth;
size_t offs;
int overflow;

// Called by expat for each start tag: print the element name and
// increase the nesting depth stored in userData
static void 
startElement(void *userData, const char *name, const char **atts)
{
  puts(name);
  int *depthPtr = (int *)userData;
  *depthPtr += 1;
}

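// Called by expat for each end tag: print the element name and
// decrease the nesting depth
static void 
endElement(void *userData, const char *name)
{
  puts(name);
  int *depthPtr = (int *)userData;
  *depthPtr -= 1;
}

// Called by expat for character data; s is NOT NUL-terminated,
// so print exactly len bytes
static void 
CharHandler(void *userData, const XML_Char *s, int len)
{
  printf("%.*s", len, s);
}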

bool  LoadXmlFile(string strfilename)
{    
    XML_Parser parser = XML_ParserCreate(NULL);
    if (!parser)
    {
        printf("Create XML parser FAIL\n");
        return false;
    }
    printf("Enter LoadXmlFile\n");

    // Open in binary mode so expat receives the file's bytes unchanged.
    // The "t" in "rt" is a Microsoft CRT extension, not standard C.
    FILE* xmlFile = fopen(strfilename.c_str(), "rb");
    if (NULL == xmlFile)
    {
        printf("loading file failed\n");
        XML_ParserFree(parser);
        return false;
    }
    char buf[100];
    int done = 0;
    XML_SetUserData(parser, &Depth);
    XML_SetElementHandler(parser, startElement, endElement); 
    XML_SetCharacterDataHandler(parser, CharHandler);

    // read the xml file in chunks of sizeof(buf) bytes
    do
    {
        int len = (int)fread(buf, 1, sizeof(buf), xmlFile);
        done = len < (int)sizeof(buf);
        // buf is not NUL-terminated, so print at most len bytes
        printf("buf first chars :%.*s\n", len, buf);
        if (!XML_Parse(parser, buf, len, done))
        {
            // report expat's own error message and position
            printf("Parse requestconfig xml FAILED: %s (line %lu)\n",
                   XML_ErrorString(XML_GetErrorCode(parser)),
                   (unsigned long)XML_GetCurrentLineNumber(parser));
            fclose(xmlFile);
            XML_ParserFree(parser);
            return false;
        }
    } while (!done);

    fclose(xmlFile);
    XML_ParserFree(parser);
    return true;
} 

int main ()
{
  if(!LoadXmlFile("requestconfig.xml"))
      printf("parse xml failed\n");
  return 0;
}

The XML I use for testing is:

XML
<clientrequest requestid="20" desc="中国">
</clientrequest>

Thank you very much.
Posted; updated 15-Mar-12 1:43am (v2)

If the file is UTF-8, chances are that its first three bytes are the UTF-8 BOM (byte order mark) sequence EF BB BF. These bytes are not removed when you fread() the data from the file, and I suspect the XML parser will not like them either. See the discussion of how to handle this situation in my tip Handling simple text files in C/C++.
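A minimal sketch of one way to handle this in the question's reading loop, assuming you want to drop a leading BOM before the data reaches `XML_Parse`: check whether the first chunk starts with EF BB BF and, if so, skip those three bytes. `SkipUtf8Bom` is a hypothetical helper, not an expat function, and it should only be applied to the first chunk read from the file.

```cpp
#include <cstddef>

// Hypothetical helper (not an expat API): if buf starts with the UTF-8
// byte order mark EF BB BF, return the number of bytes to skip (3);
// otherwise return 0. Call it on the first chunk of the file only.
std::size_t SkipUtf8Bom(const char* buf, std::size_t len)
{
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    if (len < 3)
        return 0;  // too short to contain a BOM
    for (std::size_t i = 0; i < 3; ++i)
        if (static_cast<unsigned char>(buf[i]) != bom[i])
            return 0;
    return 3;
}
```

In the question's loop this would be called once, for the first chunk, and the parser would then be given `buf + skip` and `len - skip` instead of `buf` and `len`. Saving the file as UTF-8 without BOM, as the comment below reports, avoids the issue without any code change.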
Comments
shimmer8711 15-Mar-12 21:00pm    
You are right, it's not a code problem. The XML file's format was wrong: it should be UTF-8 without BOM. Thank you.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


