I am using libxml2 in my VS2010 project to build a tree from HTML, find some nodes, modify them, and dump the tree back to HTML.
The main logic is:
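// Push parser with no initial data; the input is declared as UTF-8.
htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, XML_CHAR_ENCODING_UTF8);
htmlCtxtUseOptions(parser, HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
// The last argument is the "terminate" flag (0 = more data may follow).
htmlParseChunk(parser, pData, dataLen, 0);
xmlNode* node = xmlDocGetRootElement(parser->myDoc);
// ... find and modify nodes here ...
xmlChar* newHtml = NULL;
int len = 0;
htmlDocDumpMemory(parser->myDoc, &newHtml, &len);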
When I use the HTML from http://www.youtube.com/watch?v=S77UrnEGs_g as input data, I get one extra closing tag in the output.
I checked the above URL with http://validator.w3.org/ and got this error:
Line 562, Column 31: Unclosed element div.
<div class="content">
My question is: can I configure libxml2 so that it does not automatically close unclosed tags?