I have a fairly large CSV dataset: around 13.5 MB, with approximately 120,000 rows and 13 columns. The code in the "What I have tried" section below is my current solution.

Luckily, as I am running this via a Unity coroutine, the program doesn't freeze up, but this current solution takes 31 minutes and 44 seconds to read the entirety of the CSV file.

Is there any other way I can do this? I am trying to target a parse time of less than 1 minute.

What I have tried:

C#
private IEnumerator readDataset()
{
    starsRead = 0;
    var totalLines = File.ReadLines(path).Count();
    totalStars = totalLines - 1;

    string firstLine = File.ReadLines(path).First();
    int columnCount = firstLine.Count(f => f == ',');

    string[,] datasetTable = new string[totalStars, columnCount];

    int lineLength;
    char bufferChar;
    var bufferString = new StringBuilder();
    int column;
    int row;

    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (BufferedStream bs = new BufferedStream(fs))
    using (StreamReader sr = new StreamReader(bs))
    {
        string line = sr.ReadLine();
        while ((line = sr.ReadLine()) != null)
        {
            row = 0;
            column = 0;
            lineLength = line.Length;
            for (int i = 0; i < lineLength; i++)
            {
                bufferChar = line[i];
                if (bufferChar == ',')
                {
                    datasetTable[row, column] = bufferString.ToString();
                    column++;
                }
                else
                {
                    bufferString.Append(bufferChar);
                }
            }
            row++;
            starsRead++;
            yield return null;
        }
    }
}
Posted
Updated 27-May-20 7:03am
Comments
F-ES Sitecore 27-May-20 12:27pm    
One issue is that you're reading the file twice. If you're doing ReadLines().Count then you are parsing the entire file. You might as well just call ReadLines and store that in an array or List and parse it line by line.
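The suggestion above, reading the file once and keeping the lines in memory, could look roughly like this. It is only a sketch: `CsvReadOnceSketch` and `ReadRows` are made-up names, and it assumes the first line is a header and that no field contains a quoted comma (plain `Split` does not handle those).

```csharp
using System;
using System.IO;

public static class CsvReadOnceSketch
{
    // Reads the file a single time and returns the data rows split into fields.
    // Assumptions: the first line is a header; fields contain no quoted commas.
    public static string[][] ReadRows(string path)
    {
        string[] lines = File.ReadAllLines(path);      // one pass over the file
        int dataRows = Math.Max(0, lines.Length - 1);  // everything after the header
        var rows = new string[dataRows][];
        for (int i = 0; i < dataRows; i++)
        {
            rows[i] = lines[i + 1].Split(',');
        }
        return rows;
    }
}
```

The row count and column count then come from `rows.Length` and `rows[0].Length`, so no separate `ReadLines(path).Count()` or `First()` call is needed.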

I wrote an article on CSV importing that you may find useful: CSV/Excel File Parser - A Revisit
 
Comments
Sidharth Shanmugam 27-May-20 11:31am    
Great article, but is it possible to avoid using a package? Unity doesn't play well with NuGet packages.
#realJSOP 27-May-20 13:36pm    
You can strip out the Excel support, and then you don't need a package.
Quote:
this current solution takes 31 minutes and 44 seconds to read the entirety of the CSV file.

As far as I understand, the problem is not reading the CSV; the problem is what you do with it.
Show a sample CSV with 2 rows and what should go into datasetTable.
With a CSV like:
title1,t2,t3,t4
a,b,c,d
e,f,g,h

I expect your code to fill datasetTable like:
C#
datasetTable[0,0]="a";
datasetTable[0,1]="ab";
datasetTable[0,2]="abc";
// row is reset to 0 at the start of every line, so the second line overwrites row 0
datasetTable[0,0]="abcde";
datasetTable[0,1]="abcdef";
datasetTable[0,2]="abcdefg";

Which is probably not what you want.
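For comparison, here is a sketch of what the loop was presumably meant to do: clear the buffer after each field, store the last field (which has no trailing comma), and advance the row instead of resetting it. `CsvParseSketch`, `SplitLine`, `FillTable` and `rowsPerYield` are made-up names, and quoted fields are not handled. The batching is my own assumption and not something raised in the thread: in a Unity coroutine `yield return null` resumes on the next frame, so yielding once per row costs roughly one frame per row, and yielding once per batch avoids that.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

public static class CsvParseSketch
{
    // Splits one line on commas. The buffer is cleared after every field and
    // the last field (no trailing comma) is stored too.
    // Assumption: no quoted fields or embedded commas.
    public static List<string> SplitLine(string line, StringBuilder buffer)
    {
        var fields = new List<string>();
        buffer.Clear();
        foreach (char c in line)
        {
            if (c == ',')
            {
                fields.Add(buffer.ToString());
                buffer.Clear();
            }
            else
            {
                buffer.Append(c);
            }
        }
        fields.Add(buffer.ToString());
        return fields;
    }

    // Coroutine-shaped iterator: fills the table row by row and yields once
    // per rowsPerYield rows (must be > 0) instead of once per row.
    public static IEnumerator FillTable(IList<string> dataLines, string[,] table, int rowsPerYield)
    {
        var buffer = new StringBuilder();
        int rows = table.GetLength(0);
        int columns = table.GetLength(1);
        for (int row = 0; row < dataLines.Count && row < rows; row++)
        {
            List<string> fields = SplitLine(dataLines[row], buffer);
            for (int col = 0; col < columns && col < fields.Count; col++)
            {
                table[row, col] = fields[col];
            }
            if ((row + 1) % rowsPerYield == 0)
            {
                yield return null; // in Unity this resumes on the next frame
            }
        }
    }
}
```

From a MonoBehaviour this could be started with something like `StartCoroutine(CsvParseSketch.FillTable(lines, datasetTable, 5000))`, where `lines` holds the data rows without the header and `5000` is an arbitrary batch size to tune. Note that the table needs one column more than the number of commas in the header.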

Your code does not behave the way you expect, and you don't understand why.

There is an almost universal solution: run your code in a debugger step by step and inspect the variables.
The debugger is there to show you what your code is doing; your task is to compare that with what it should do.
There is no magic in the debugger: it doesn't know what your code is supposed to do and it doesn't find bugs; it just helps you by showing what is going on. When the code doesn't do what is expected, you are close to a bug.
To see what your code is doing, set a breakpoint and watch it run; the debugger lets you execute lines one by one and inspect variables as it executes.
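If stepping through the Unity coroutine is awkward, much the same information can be collected with a small trace. The sketch below replays the inner loop from the question on in-memory lines and records every write to the table; `TraceSketch` and `TraceWrites` are made-up names, and the loop body is copied from the question with its behaviour unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TraceSketch
{
    // Replays the question's inner loop and records each table write,
    // which is what watching the variables at a breakpoint would show.
    public static List<string> TraceWrites(IEnumerable<string> dataLines)
    {
        var writes = new List<string>();
        var bufferString = new StringBuilder(); // never cleared, as in the question
        foreach (string line in dataLines)
        {
            int row = 0;    // reset for every line, as in the question
            int column = 0;
            foreach (char c in line)
            {
                if (c == ',')
                {
                    writes.Add("[" + row + "," + column + "] = \"" + bufferString + "\"");
                    column++;
                }
                else
                {
                    bufferString.Append(c);
                }
            }
        }
        return writes;
    }
}
```

Calling `TraceSketch.TraceWrites(new[] { "a,b,c,d", "e,f,g,h" })` and printing each entry shows the buffer growing from one field to the next and every write landing in row 0.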

Debugger - Wikipedia, the free encyclopedia

Mastering Debugging in Visual Studio 2010 - A Beginner's Guide
Basic Debugging with Visual Studio 2010 - YouTube

Debugging C# Code in Visual Studio - YouTube


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


