Hello everyone,

I am trying to read a 14 GB file. If any line of that file contains the word "NULL", I want to write that particular line to a separate text file. Below is the code I have tried.

My file looks like this:

ID|F_NAME|MIDDLE_NAME|L_NAME
1|PRADEEP|NULL|KUMAR

Here I want to find NULL. The actual problem is the file size: it is around 14 GB. I tried using ReadAllText() and StreamReader.ReadLine(), and both throw an OutOfMemoryException. Is there a way I can accomplish this?

Immediate help appreciated!

Thanks

What I have tried:

C#
using (FileStream fs = File.Open(Sources_path + "\\" + Filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    sr.ReadToEnd();
}
Posted
Updated 29-Nov-16 8:25am
Comments
PIEBALDconsult 26-Nov-16 16:09pm    
Without more detail I recommend simply using the DOS FIND command rather than writing something.
pradeep kumar 26-Nov-16 16:58pm    
I tried using ReadAllText() and ReadLine(); both throw an OutOfMemoryException. To put it simply, the content looks like this:

ID|FIRST_NAME|MIDDLE_NAME|LAST_NAME
1|PRADEEP|NULL|KUMAR

I want to read the file line by line, not the whole content at once, find NULL, and write that particular line to a text file.
Maciej Los 28-Nov-16 16:19pm    
Have you tried to use ADO.NET (OleDb)?

You have no choice but to read the file one line at a time. You can NOT use ReadAllLines, or anything like it, because it will try to read the ENTIRE FILE into memory as an array of strings. Unless you happen to have about 30 GB of RAM in the machine, you're not going to be able to read the file.

Also, an array is limited to roughly 2.1 billion entries (Int32.MaxValue). If the file has more lines than that, you still can't read it in its entirety all at once.

You MUST read the file one line at a time and process each line as you read it. Then you don't need tons of memory in the machine; you just need enough memory to hold a single line of the file.
C#
using (StreamReader sr = new StreamReader("filepath"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // ... process your line data ...
    }
}
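To make that concrete, here is a self-contained sketch of that loop (the file paths and the NullLineFilter name are illustrative, not from the question): it streams the input one line at a time and appends any line containing "NULL" to a second file, so memory use stays constant regardless of file size.

```csharp
using System;
using System.IO;

public static class NullLineFilter
{
    // Copies every line containing "NULL" from inputPath to outputPath,
    // holding only one line in memory at a time. Returns the match count.
    public static int Run(string inputPath, string outputPath)
    {
        int matches = 0;
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains("NULL"))
                {
                    writer.WriteLine(line);
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void Main()
    {
        // Placeholder paths -- substitute your real 14 GB input file.
        int count = Run("input.txt", "null_lines.txt");
        Console.WriteLine($"Wrote {count} matching line(s).");
    }
}
```

Because both streams are disposed by the using blocks, the output file is flushed and closed even if an exception interrupts the loop.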
 
 
First solution: add an insane amount of memory. Remember, the file is likely to grow, and you also need space for the resulting file.

Second solution: read the file line by line.
Quote:
StreamReader.ReadLine() throwing me Memory out exception.
That is impossible unless you also try to store the file in memory. Think about it: do you need to store the whole file in memory? A single line contains enough information to tell you what to do with it.

Third solution: this file likely comes from a database. Querying the database directly for NULLs would be more efficient.
 
 
This is very simple!
C#
File.WriteAllLines("path to output file", 
                   File.ReadLines(Path.Combine(Sources_path, Filename))
                   .Where(l => l.Contains("NULL")));

This will process the lines one at a time while they are being read. It doesn't try to have everything in memory all at once.

If you want the comparison to be case-insensitive, then change the last line above to:
C#
.Where(l => l.IndexOf("NULL", StringComparison.OrdinalIgnoreCase) >= 0));
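As a quick usage check of that pipeline (the paths and the LazyFilterDemo name here are illustrative): File.ReadLines returns a lazy IEnumerable<string>, and File.WriteAllLines consumes it one element at a time, so no more than one line is materialized at once even for a huge input.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LazyFilterDemo
{
    // Streams inputPath through the same Where(...) filter and writes
    // the matches to outputPath; nothing is ever buffered in full.
    public static string[] FilterNullLines(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath,
                           File.ReadLines(inputPath)
                               .Where(l => l.Contains("NULL")));
        // The result file holds only the matches, so it is safe to read back.
        return File.ReadAllLines(outputPath);
    }
}
```

The key difference from File.ReadAllLines is that ReadLines opens the file and yields lines on demand, which is what keeps this approach within memory limits on a 14 GB input.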
 
 
Comments
pradeep kumar 29-Nov-16 15:23pm    
This looks pretty interesting; I will test it. Thanks!
You want to read it line by line. Also, I think you may be over-thinking it a bit. Forget about buffering: by the time you see the data, it has already been buffered by the HDD's on-disk controller, the OS driver, and the runtime, so your buffering worries are over. Use StreamReader.ReadLine() as others have suggested.

You will find that for very large quantities of data like this, the file system is much, much faster. RAM is faster in principle, but in practice, if you make the OS go into the paging file, you won't finish in your lifetime. This is the version where the Tortoise beats the Hare.
 
 
Yes, ppolymorphe is correct: this file comes directly from a database. The obvious question is why I want to scan the file instead of removing the NULLs in the database. The reasons are:

1. I am fetching columns dynamically for each table; below is my code for better understanding:

DECLARE @colnm VARCHAR(MAX)
SET @colnm=''

SELECT @colnm = @colnm + CASE WHEN DATA_TYPE in ('numeric','decimal') 
	THEN + 'ISNULL(' + 'CONVERT(VARCHAR(50),'+'['+ColumnName +']' + ')' +','+' '''''''' '+')' +'as' +'[' + ColumnName + ']' 
	
	ELSE + 'ISNULL(' +'['+ColumnName+']'+ ','+' '''''''' '+')' +'as'+ '[' + ColumnName + ']'  end +','

FROM   #TMP_FINALCOLUMN COMMA
where Comma.TableName = @TBL_NM


SET @CLM_NM =(SELECT  LEFT(@colnm,LEN(@colnm)-1))


SELECT 'xp_cmdshell '''+'sqlcmd -S SERVER -d DB_NM' 
+ ' -E -Q '+'"'+
+'SET NOCOUNT ON; select ' + @CLM_NM + ' from ' 
+QUOTENAME(View_name)+'"'+
+ ' '+'-o'+' '
+ '"A:\DUMMY\'+view_name+'.txt" -W -w 1024 -s"|"'+'''' as Query
FROM TABLE_NAME where STAT=1 AND VIEW_NAME = @TBL_NM


2. I still found some NULLs when I checked those files randomly.

Anyway, thanks for your responses. I dropped the plan of reading the files line by line in this case. :)
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


