I have some xml files which look like
XML
<ref-list>
  <title>REFERENCES</title>
  <ref id="ref1">
    <label>[1]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angel’skii</surname>, <given-names>O.,V.</given-names></string-name>, <string-name><surname>Ushenko</surname>, <given-names>A.,G.</given-names></string-name>, <string-name><surname>Arkhelyuk</surname>, <given-names>A.,D.</given-names></string-name>, <string-name><surname>Ermolenko</surname>, <given-names>S.,B.</given-names></string-name>, <string-name><surname>Burkovets</surname>, <given-names>D.,N.</given-names></string-name></person-group>, "<article-title>Scattering of laser radiation by multifractal biological structures</article-title>." <source>Optika i Spektroskopiya 88</source> (<issue>3</issue>), <fpage>495</fpage><lpage>498</lpage> (<year>2000</year>).</mixed-citation>
  </ref>
  <ref id="ref2">
    <label>[2]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>O.,V.</given-names></string-name>, <string-name><surname>Maksimyak</surname>, <given-names>P.,P.</given-names></string-name>, <string-name><surname>Hanson</surname>, <given-names>S.,G.</given-names></string-name>, <string-name><surname>Ryukhin</surname>, <given-names>V.,V.</given-names></string-name></person-group>, "<article-title>New Feasibilities for Characterizing Rough Surfaces by Optical-Correlation Techniques</article-title>" <source>Applied Optics</source> (<issue>40</issue>) , pp. <fpage>5693</fpage><lpage>5707</lpage> (<year>2001</year>).</mixed-citation>
  </ref>
  <ref id="ref3">
    <label>[3]</label>
    <mixed-citation publication-type="conf-proc">
      <person-group person-group-type="author">
        <string-name>
          <surname>Ushenko</surname>, <given-names>Yu., A.</given-names></string-name>, <string-name><surname>Dubolazov</surname>, <given-names>A.,V.</given-names></string-name>, <string-name><surname>Karachevtcev</surname>, <given-names>A., O.</given-names></string-name>, <string-name><surname>Zabolotna</surname>, <given-names>N., I.</given-names></string-name></person-group>, "<article-title>A fractal and statistic analysis of Mueller-matrix images of phase inhomogeneous layers</article-title>." <conf-name>Proceedings SPIE</conf-name><volume>8134</volume>, <fpage>P81340P4</fpage> (<year>2011</year>).</mixed-citation>
  </ref>
  <ref id="ref4">
    <label>[4]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>O.,V.</given-names></string-name>, <string-name><surname>Hanson</surname>, <given-names>S., G.</given-names></string-name>, <string-name><surname>Zenkova</surname>, <given-names>C.,Yu.</given-names></string-name>, <string-name><surname>Gorsky</surname>, <given-names>M.,P.</given-names></string-name>, <string-name><surname>Gorodyns’ka</surname>, <given-names>N.,V.</given-names></string-name></person-group>, "<article-title>On polarization metrology (estimation) of the degree of coherence of optical waves</article-title>." <source>Optics Express</source><volume>17</volume>(<issue>18</issue>), pp.<fpage>15623</fpage><lpage>15634</lpage> (<year>2009</year>).</mixed-citation>
  </ref>
  <ref id="ref5">
    <label>[5]</label>
    <mixed-citation publication-type="conf-proc">
      <person-group person-group-type="author">
        <string-name>
          <surname>Ushenko</surname>
          <given-names>O.,G.</given-names>
        </string-name>, <string-name><surname>Dubolazov</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Balanets’ka</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Karachevtsev</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Sydor</surname>, <given-names>M.</given-names></string-name></person-group>, "<article-title>Wavelet analysis for polarization inhomogeneous laser images of blood plasma</article-title>." <conf-name>Proc. SPIE</conf-name><volume>8338</volume>, P. <fpage>83381H</fpage> (<year>2011</year>).</mixed-citation>
  </ref>
  <ref id="ref6">
    <label>[6]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Ushenko</surname>, <given-names>Yu.,O.</given-names></string-name>, <string-name><surname>Dubolazov</surname>, <given-names>O., V.</given-names></string-name>, <string-name><surname>Karachevtsev</surname>, <given-names>A.,O.</given-names></string-name>, <string-name><surname>Gorsky</surname>, <given-names>M., P.</given-names></string-name>, <string-name><surname>Marchuk</surname>, <given-names>Yu., F.</given-names></string-name></person-group>, "<article-title>Wavelet analysis of Fourier polarized images of the human bile</article-title>." <source>Applied Optics</source> (<issue>51</issue>), P. <fpage>133</fpage><lpage>139</lpage> (<year>2012</year>).</mixed-citation>
  </ref>
  <ref id="ref7">
    <label>[7]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>O.,V.</given-names></string-name>, <string-name><surname>Ushenko</surname>, <given-names>A.,G.</given-names></string-name>, <string-name><surname>Burkovets</surname>, <given-names>D.,N.</given-names></string-name>, <string-name><surname>Ushenko</surname>, <given-names>Y., A.</given-names></string-name></person-group>, "<article-title>Polarization visualization and selection of biotissue image two-layer scattering medium</article-title>." <source>Journal of biomedical optics</source><volume>10</volume>(<issue>1</issue>), P.<fpage>14010</fpage> (<year>2005</year>).</mixed-citation>
  </ref>
  <ref id="ref8">
    <label>[8]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>O.,V.</given-names></string-name>, <string-name><surname>Polyanskii</surname>, <given-names>P.,V.</given-names></string-name>, <string-name><surname>Felde</surname>, <given-names>C.,V.</given-names></string-name></person-group>, "<article-title>The emerging field of correlation optics</article-title>." <source>Optics and Photonics News</source><volume>23</volume>(<issue>4</issue>), p.p.<fpage>25</fpage><lpage>29</lpage> (<year>2012</year>).</mixed-citation>
  </ref>
  <ref id="ref9">
    <label>[9]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>O.,V.</given-names></string-name>, <string-name><surname>Bekshaev</surname>, <given-names>A.,Ya.</given-names></string-name>, <string-name><surname>Maksimyak</surname>, <given-names>P.,P.</given-names></string-name>, <string-name><surname>Maksimyak</surname>, <given-names>A.,P.</given-names></string-name>, Mokhun, <string-name><surname>Hanson</surname>, <given-names>S.,G.</given-names></string-name>, <string-name><surname>Zenkova</surname>, <given-names>C., Yu.</given-names></string-name>, <string-name><surname>Tyurin</surname>, <given-names>A.,V.</given-names></string-name></person-group>, "<article-title>Circular motion of particles suspended in a Gaussian beam with circular polarization validates the spin part of the internal energy flow</article-title>." <source>Optics Express</source><volume>20</volume>(<issue>10</issue>), pp.<fpage>11351</fpage><lpage>11356</lpage> (<year>2012</year>).</mixed-citation>
  </ref>
  <ref id="ref10">
    <label>[10]</label>
    <mixed-citation publication-type="conf-proc">
      <person-group person-group-type="author">
        <string-name>
          <surname>Arkhelyuk</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Podkamen</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Glibka</surname>, <given-names>V.</given-names></string-name></person-group>, "<article-title>Characteristics investigations of surface and volumetrical scattering of the polarised radiation by a layer of oriented particls</article-title>." <conf-name>Proc. SPIE</conf-name><volume>5477</volume>, <fpage>P171</fpage><lpage>176</lpage> (<year>2004</year>).</mixed-citation>
  </ref>
  <ref id="ref11">
    <label>[11]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>O.V.</given-names></string-name>, <string-name><surname>Besaha</surname>, <given-names>R.N.</given-names></string-name>, <string-name><surname>Mokhun</surname>, <given-names>I.I.</given-names></string-name></person-group> "<article-title>Appearance of wavefront dislocations under interference among beams with simple wavefronts</article-title>," <source>Optica Applicata</source><volume>27</volume>(<issue>4</issue>), Pages <fpage>272</fpage><lpage>278</lpage> (<year>1997</year>).</mixed-citation>
  </ref>
  <ref id="ref12">
    <label>[12]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>P., O.</given-names></string-name>, <string-name><surname>Ushenko</surname>, <given-names>A., G.</given-names></string-name>, <string-name><surname>Dubolazov</surname>, <given-names>A., V.</given-names></string-name>, <string-name><surname>Sidor</surname>, <given-names>M., I.</given-names></string-name>, <string-name><surname>Bodnar</surname>, <given-names>G., B.</given-names></string-name>, <string-name><surname>Koval</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Trifonyuk</surname>, <given-names>L.</given-names></string-name></person-group>, "<article-title>The singular approach for processing polarization-inhomogeneous laser images of blood plasma layers</article-title>." <source source-type="IEFF">J. Opt.</source> (<issue>15</issue>), <fpage>044030</fpage> (<year>2013</year>).</mixed-citation>
  </ref>
  <ref id="ref13">
    <label>[13]</label>
    <mixed-citation publication-type="journal">
      <person-group person-group-type="author">
        <string-name>
          <surname>Angelsky</surname>, <given-names>P., O.</given-names></string-name>, <string-name><surname>Ushenko</surname>, <given-names>A., G.</given-names></string-name>, <string-name><surname>Dubolazov</surname>, <given-names>A., V.</given-names></string-name>, <string-name><surname>Sidor</surname>, <given-names>M., I.</given-names></string-name>, <string-name><surname>Bodnar</surname>, <given-names>G., B.</given-names></string-name>, <string-name><surname>Koval</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Trifonyuk</surname>, <given-names>L.</given-names></string-name></person-group>, "<chapter-title>The singular approach for processing polarization-inhomogeneous laser images of blood plasma layers</chapter-title>." J. Opt. (<issue>15</issue>), <fpage>044030</fpage> (<year>2013</year>).</mixed-citation>
  </ref>
</ref-list>

I want to find every mixed-citation node in the file and check whether it has a child node named article-title or chapter-title. If it does, I then want to check whether the same mixed-citation also has a child node named source. If no source is found, I want to get the parent ref node and write it to a log file.

What I have tried:

C#
public static void Main(string[] args)
{
	string path=@"D:\test\xml";
	var files=Directory.GetFiles(path,"*.xml");
	foreach (var file in files)
	{
		string content=File.ReadAllText(file);
		var doc=TryParseXDocument("<root>"+content+"</root>");
		if (doc!=null)
		{
			var refIds=doc.Descendants("mixed-citation");
			try
			{
				foreach (var refId in refIds)
				{
					if (refId.Descendants("article-title").Any()||refId.Descendants("chapter-title").Any())
					{
						if (refId.Descendants("source").Count()<1)
						{
							string filePath = path + @"\Error.log";
							using (StreamWriter writer = new StreamWriter(filePath, true))
							{
								writer.WriteLine(file+"\r\n---------------------------------------------------\r\nCheck "+refId.Parent.Attribute("id")+" for possible missing <source>");
							}
						}
					};
				}
			}
			catch
			{
			}
			
			
		}
		else
		{
			string filePath = path + @"\InvalidXML.log";
			using (StreamWriter writer = new StreamWriter(filePath, true))
			{
				writer.WriteLine(file + ":\r\n---------------------------------------------------\r\nInvalid XML file, check Tidy parsing and invalid entities");
			}
		}
	}
	
	
	Console.ReadLine();
}
public static XDocument TryParseXDocument(string xmlContent)
{
	
	try
	{
		return XDocument.Parse(xmlContent);
	}
	
	catch(Exception)
	{
		return null;
	}
	
}

Now the problem is I'm getting the texts in the Error.log file in the format
D:\test\xml\ref.xml
---------------------------------------------------
Check id="ref3" for possible missing <source>
D:\test\xml\ref.xml
---------------------------------------------------
Check id="ref5" for possible missing <source>
D:\test\xml\ref.xml
---------------------------------------------------
Check id="ref10" for possible missing <source>
D:\test\xml\ref.xml
---------------------------------------------------
Check id="ref13" for possible missing <source>

Whereas I want it to be like
D:\test\xml\ref.xml
---------------------------------------------------
Check id="ref3" for possible missing <source>
Check id="ref5" for possible missing <source>
Check id="ref10" for possible missing <source>
Check id="ref13" for possible missing <source>

How do I do this?
Also how can the code be optimized?
Updated 6-Apr-18 20:22pm

1 solution

Why not collect the errors into a StringBuilder and then write the data at the end?

Something like
C#
string path = @"D:\test\xml";
System.Text.StringBuilder errors;
bool hasErrors;

var files = System.IO.Directory.GetFiles(path, "*.xml");
foreach (var file in files) {
   hasErrors = false;
   errors = new System.Text.StringBuilder();
   errors.AppendLine($"File: {file}");
   errors.AppendLine("---------------------------------------");
   string content = System.IO.File.ReadAllText(file);
   var doc = TryParseXDocument("<root>" + content + "</root>");
   if (doc != null) {
      var refIds = doc.Descendants("mixed-citation");
      try {
         foreach (var refId in refIds) {
            if (refId.Descendants("article-title").Any() || refId.Descendants("chapter-title").Any()) {
               if (refId.Descendants("source").Count() < 1) {
                  hasErrors = true;
                  errors.AppendLine($"Check {refId.Parent.Attribute("id")} for possible missing <source>");
               }
            };
         }
      } catch {
      }
   } else {
      string filePath = path + @"\InvalidXML.log";
      using (System.IO.StreamWriter writer = new System.IO.StreamWriter(filePath, true)) {
         writer.WriteLine(file + ":\r\n---------------------------------------------------\r\nInvalid XML file, check Tidy parsing and invalid entities");
      }
   }
   if (hasErrors) {
      using (System.IO.StreamWriter errorFile = new System.IO.StreamWriter(path + @"\Error.log", true)) {
         errorFile.WriteLine(errors.ToString());
      }
   }
}


Console.ReadLine();
 
Comments
Member 12692000 7-Apr-18 11:23am    
Thanks...btw can the code inside the last foreach loop i.e. foreach (var refId in refIds) be optimized or made better?
Wendelius 7-Apr-18 13:30pm    
It depends on your requirements for the data and on what kind of data you expect to have. Overall, if the amount of data is not large, I think trying to optimize that part may not be worth the effort.
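
If the goal is readability rather than speed, the loop can also be written as a single LINQ query. The following is only a sketch under assumptions: the class name CitationCheck, the method name FindRefsMissingSource and the tiny inline XML sample are made up for illustration and are not part of the original code. It keeps the same Descendants-based checks but uses Any() instead of Count() < 1, so the search stops at the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CitationCheck
{
    // Hypothetical helper: returns the id of every <ref> whose <mixed-citation>
    // contains an <article-title> or <chapter-title> but no <source>.
    public static List<string> FindRefsMissingSource(XDocument doc)
    {
        return doc.Descendants("mixed-citation")
            .Where(mc => mc.Descendants("article-title").Any()
                      || mc.Descendants("chapter-title").Any())
            .Where(mc => !mc.Descendants("source").Any())
            // The cast from XAttribute to string yields null when the attribute is missing.
            .Select(mc => ((string)(mc.Parent?.Attribute("id"))) ?? "(no id)")
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample: r1 has a <source>, r2 does not.
        var doc = XDocument.Parse(
            "<root>" +
            "<ref id=\"r1\"><mixed-citation><article-title>A</article-title>" +
            "<source>S</source></mixed-citation></ref>" +
            "<ref id=\"r2\"><mixed-citation><chapter-title>B</chapter-title>" +
            "</mixed-citation></ref>" +
            "</root>");

        foreach (var id in FindRefsMissingSource(doc))
        {
            Console.WriteLine("Check id=\"" + id + "\" for possible missing <source>");
        }
    }
}
```

With the sample above only r2 should be reported. Whether this is faster than the explicit loop is not something I have measured; the main gain is that the condition reads as one expression.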

Another thing is the empty try..catch block. You shouldn't catch exceptions unless you at least report them. With the current code, exceptions and other problems may occur while reading the data without you knowing about it.
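
As a rough illustration of reporting instead of swallowing, here is one possible variant of the parsing helper. It is a sketch, not the original method: the class name SafeParse and the extra out parameter error are assumptions introduced for this example. The same idea (catch, keep the message, write it to the log) could be applied to the try..catch around the loop.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class SafeParse
{
    // Hypothetical variant of TryParseXDocument that also tells the caller
    // why parsing failed instead of silently returning null.
    public static XDocument TryParseXDocument(string xmlContent, out string error)
    {
        try
        {
            error = null;
            return XDocument.Parse(xmlContent);
        }
        catch (XmlException ex)
        {
            // Keep the parser's message so it can be written to the log.
            error = ex.Message;
            return null;
        }
    }

    public static void Main()
    {
        string error;
        // Deliberately malformed sample: <ref> is never closed.
        var doc = TryParseXDocument("<root><ref></root>", out error);
        if (doc == null)
        {
            Console.WriteLine("Invalid XML: " + error);
        }
    }
}
```

The message could then be appended to InvalidXML.log next to the file name, so the log says what went wrong and not only that something did.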

Member 12692000 8-Apr-18 5:12am    
Hi, I'm getting duplicate error results for multiple files, i.e.
D:\test\xml\ref.xml
---------------------------------------------------
Check id="ref3" for possible missing <source>
Check id="ref5" for possible missing <source>
Check id="ref10" for possible missing <source>
Check id="ref13" for possible missing <source>

D:\test\xml\ref.xml
---------------------------------------------------
Check id="ref3" for possible missing <source>
Check id="ref5" for possible missing <source>
Check id="ref10" for possible missing <source>
Check id="ref13" for possible missing <source>

D:\test\xml\ref2.xml
---------------------------------------------------
Check id="ref9" for possible missing <source>
Check id="ref20" for possible missing <source>

D:\test\xml\ref2.xml
---------------------------------------------------
Check id="ref9" for possible missing <source>
Check id="ref20" for possible missing <source>
Wendelius 8-Apr-18 5:21am    
Sorry, my bad. Forgot to initialize the stringbuilder on each iteration. See the updated code.
Member 12692000 8-Apr-18 5:40am    
Thanks

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


