I have a very large (6 GB) XML file. I want to read it and write its output to multiple files.
I am currently using XmlTextReader, and it takes 30 minutes.
Please suggest some ways to reduce the processing time.
Comments
Maciej Los 3-Sep-15 14:45pm    
Based on what condition do you want to split the XML document into several files?

The tags of your question suggest the correct approach: use XmlReader. It imposes no practical memory limit, because it does not load the document into memory; it reads it forward-only, one node at a time. I don't think there is a way to significantly increase the processing speed compared to this approach.
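A minimal sketch of that streaming pattern, assuming the entries are `journalEntry` elements as in the code the asker posted; `StreamingSketch` and `CountEntries` are hypothetical names. The reader only moves forward, so memory use does not grow with the file size:

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamingSketch
{
    // Hypothetical helper: counts <journalEntry> elements without building a DOM.
    public static int CountEntries(TextReader input)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            // ReadToFollowing reads forward to the next element with this name;
            // nodes that were skipped over are not retained.
            while (reader.ReadToFollowing("journalEntry"))
            {
                count++;
            }
        }
        return count;
    }
}
```

For the real file one would pass `File.OpenText(path)` instead of a `StringReader`.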

At the same time, I don't know what else you do during parsing besides using the reader class. You always have to do something else, and this "something else" can be slow. Obviously, I have no access to your hard drive, so I cannot tell you where the bottleneck is. All I can advise is to use a performance profiler to investigate the problem. Please see:
https://msdn.microsoft.com/en-US/library/ms182372.aspx,
https://msdn.microsoft.com/en-us/library/z9z62c29.aspx.
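Before (or alongside) a full profiler, a crude first check is to time the phases separately with `Stopwatch`, e.g. the time spent in `XDocument.Load` versus the time spent writing output. A minimal sketch; `PhaseTimer` is a hypothetical helper, not a library class, and it is not thread-safe:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: accumulates the time spent in one phase of the work
// across many calls. A rough check only, not a replacement for a profiler.
public sealed class PhaseTimer
{
    private readonly Stopwatch _watch = new Stopwatch();

    public int Calls { get; private set; }

    public TimeSpan Elapsed => _watch.Elapsed;

    // Runs the action and adds its duration to the running total.
    public void Time(Action action)
    {
        Calls++;
        _watch.Start();
        try
        {
            action();
        }
        finally
        {
            _watch.Stop();
        }
    }
}
```

Inside the loop one would wrap, for example, `parse.Time(() => doc = XDocument.Load(subtree))` and `write.Time(() => doc.Save(textWriter))`, then print both totals after the loop. If writing dominates, the output side is the place to look.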

—SA
 
 
Comments
usernetwork 3-Sep-15 13:53pm    
I know XmlReader is the correct approach for large XML files.
I can't use XmlDocument, since I can't load the whole file into memory.
I am thinking of splitting the single large XML file into several smaller files, maybe 5-6, and then using threads to parse them simultaneously.
But I am not sure how to split a large XML file effectively.
Sergey Alexandrovich Kryukov 3-Sep-15 14:07pm    
You are right, but splitting can be more of a problem than a solution. First of all, splitting adds overhead: both the extra XML data and the overhead of threads (or perhaps Tasks; the TPL may serve you better). If you split into more parts than the number of CPU cores you have, the overhead will outweigh the benefits. Besides, splitting can compromise the integrity of the XML data. However, you can try it.
I would start by profiling your code.
—SA
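One way to act on the TPL suggestion without physically splitting the file, sketched under assumptions: keep a single sequential reader, so XML integrity is never at risk, and run only the per-entry work in parallel, capped at the number of CPU cores. `ParallelSketch`, `StreamEntries` and `Run` are hypothetical names, and the per-entry work is assumed to be independent and thread-safe. If that work appends to per-terminal files as the asker's code does, two threads may target the same file at once, so those writes would need a lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

public static class ParallelSketch
{
    // Streams <journalEntry> elements one at a time; only the current entry
    // is materialized as an XElement.
    public static IEnumerable<XElement> StreamEntries(TextReader input)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "journalEntry")
                {
                    // ReadFrom consumes the whole element and leaves the
                    // reader on the node that follows it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Parsing stays sequential (the partitioner pulls from the enumerator
    // under a lock); only 'handle' runs on several threads.
    public static int Run(TextReader input, Action<XElement> handle)
    {
        int handled = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        // NoBuffering hands out one entry per request instead of growing chunks.
        var source = Partitioner.Create(StreamEntries(input), EnumerablePartitionerOptions.NoBuffering);
        Parallel.ForEach(source, options, entry =>
        {
            handle(entry);
            Interlocked.Increment(ref handled);
        });
        return handled;
    }
}
```

Whether this helps at all depends on what the profiler shows: if the time goes into disk I/O rather than per-entry CPU work, extra threads will not make it faster.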
This is my code. Can I improve it?

C#
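using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            ReadEJFile("E:\\a.txt");
        }

        private static void ReadEJFile(string filename)
        {
            Stopwatch sw = Stopwatch.StartNew();

            // 'using' guarantees the input file handle is released.
            using (XmlTextReader reader = new XmlTextReader(filename))
            {
                reader.WhitespaceHandling = WhitespaceHandling.None;
                reader.ReadToFollowing("Root");

                while (reader.Read())
                {
                    if (reader.NodeType != XmlNodeType.Element || reader.Name != "journalEntry")
                        continue;

                    // Load only the current <journalEntry> into memory. Disposing the
                    // subtree reader leaves 'reader' on the entry's end tag, so the
                    // next Read() moves on to the following node.
                    XDocument doc;
                    using (XmlReader subtree = reader.ReadSubtree())
                    {
                        doc = XDocument.Load(subtree);
                    }

                    // SingleOrDefault().Value throws when the element is missing (null)
                    // or appears more than once; skip entries without a terminal id instead.
                    XElement idElement = doc.Descendants("terminalIdentifier").FirstOrDefault();
                    if (idElement == null)
                        continue;

                    string path = Path.Combine("E:\\Output", idElement.Value + ".txt");
                    using (Stream xmlFile = new FileStream(path, FileMode.Append, FileAccess.Write))
                    using (XmlTextWriter textWriter = new XmlTextWriter(xmlFile, Encoding.Default))
                    {
                        doc.Save(textWriter);
                    }
                }
            }

            // Report timing once at the end; printing to the console for every
            // node inside the loop can slow processing considerably.
            Console.WriteLine("Elapsed: {0} ms", sw.Elapsed.TotalMilliseconds);
        }
    }
}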
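The loop above opens and closes a `FileStream` for every single entry, which adds up across millions of entries. A sketch of keeping one writer open per terminal id for the whole run, assuming the number of distinct terminal ids is small enough to keep that many files open at once; `OutputWriters` is a hypothetical class. Unlike the original, entries are written without a per-entry XML declaration (each file is an XML fragment), and UTF-8 without BOM is an assumption in place of `Encoding.Default`:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: keeps one output writer open per terminal id for the
// whole run instead of opening and closing a file for every entry.
public sealed class OutputWriters : IDisposable
{
    private readonly string _directory;
    private readonly Dictionary<string, XmlWriter> _writers = new Dictionary<string, XmlWriter>();

    public OutputWriters(string directory)
    {
        _directory = directory;
    }

    public int OpenFileCount => _writers.Count;

    public void Write(string terminalId, XElement entry)
    {
        XmlWriter writer;
        if (!_writers.TryGetValue(terminalId, out writer))
        {
            var settings = new XmlWriterSettings
            {
                // Each file holds a sequence of top-level entries, not one document.
                ConformanceLevel = ConformanceLevel.Fragment,
                Encoding = new UTF8Encoding(false), // assumption: UTF-8, no BOM
                CloseOutput = true                  // disposing the writer closes the file
            };
            string path = Path.Combine(_directory, terminalId + ".txt");
            // FileMode.Append keeps the original code's append behaviour.
            var stream = new FileStream(path, FileMode.Append, FileAccess.Write);
            writer = XmlWriter.Create(stream, settings);
            _writers.Add(terminalId, writer);
        }
        entry.WriteTo(writer);
    }

    public void Dispose()
    {
        foreach (XmlWriter writer in _writers.Values)
            writer.Dispose();
        _writers.Clear();
    }
}
```

In `ReadEJFile` one would create `new OutputWriters("E:\\Output")` in a `using` block around the read loop and replace the per-entry `FileStream`/`XmlTextWriter` pair with `outputs.Write(idElement.Value, doc.Root)`.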

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


