Hello everyone,

I am trying to read a 14 GB file. If any line of that file contains the word "NULL", I want to write that particular line to a separate text file. Below is the code I have tried.

My file looks like this:

ID|F_NAME|MIDDLE_NAME|L_NAME
1|PRADEEP|NULL|KUMAR

Here I want to find NULL. The actual problem is the file size: it is around 14 GB. I tried using ReadAllText() and StreamReader.ReadLine(), and both throw an OutOfMemoryException. Is there a way I can accomplish this?

Immediate help appreciated!

Thanks

What I have tried:

C#
using (FileStream fs = File.Open(Sources_path + "\\" + Filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    sr.ReadToEnd();
}
Posted
Updated 29-Nov-16 8:25am
Comments
PIEBALDconsult 26-Nov-16 16:09pm    
Without more detail I recommend simply using the DOS FIND command rather than writing something.
pradeep kumar 26-Nov-16 16:58pm    
I tried using ReadAllText() and ReadLine(); both throw an OutOfMemoryException. To put it simply, the content looks like this:

ID|FIRST_NAME|MIDDLE_NAME|LAST_NAME
1|PRADEEP|NULL|KUMAR

I want to read the file line by line, not the whole content at once, find NULL, and write that particular line to a text file.
Maciej Los 28-Nov-16 16:19pm    
Have you tried to use ADO.NET (OleDb)?

You have no choice but to read the file one line at a time. You can NOT use ReadAllLines, or anything like it, because it will try to read the ENTIRE FILE into memory as an array of strings. Unless you happen to have about 30 GB of RAM in the machine, you're not going to be able to read the file.

Also, an array is limited to roughly 2.1 billion entries (Int32.MaxValue). If the file has more lines than that, you still can't read it in its entirety all at once.

You MUST read the file one line at a time and process each line as you read it. Then you don't need tons of memory in the machine; you just need enough memory to hold a single line of the file.
C#
using (StreamReader sr = new StreamReader("filepath"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // ... process your line data ...
    }
}
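To make that concrete, here is a self-contained sketch of that loop (the file paths and the NullLineFilter name are illustrative, not from the question): it streams the input one line at a time and appends any line containing "NULL" to a second file, so memory use stays constant regardless of file size.

```csharp
using System;
using System.IO;

public static class NullLineFilter
{
    // Copies every line containing "NULL" from inputPath to outputPath,
    // holding only one line in memory at a time. Returns the match count.
    public static int Run(string inputPath, string outputPath)
    {
        int matches = 0;
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains("NULL"))
                {
                    writer.WriteLine(line);
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void Main()
    {
        // Placeholder paths -- substitute your real 14 GB input file.
        int count = Run("input.txt", "null_lines.txt");
        Console.WriteLine($"Wrote {count} matching line(s).");
    }
}
```

Because both streams are disposed by the using blocks, the output file is flushed and closed even if an exception interrupts the loop.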
 
 
First solution: add an insane amount of memory. Remember, the file is likely to grow, and you also need space for the resulting file.

Second solution: read the file line by line.
Quote:
StreamReader.ReadLine() throwing me Memory out exception.
That is impossible unless you also try to store the file in memory. Think about it: do you need to store the whole file in memory? A single line contains enough information to tell you what to do with it.

Third solution: this file likely comes from a database. Querying the database directly for NULLs would be more efficient.
 
 
This is very simple!
C#
File.WriteAllLines("path to output file", 
                   File.ReadLines(Path.Combine(Sources_path, Filename))
                   .Where(l => l.Contains("NULL")));

This will process the lines one at a time while they are being read. It doesn't try to have everything in memory all at once.

If you want the comparison to be case-insensitive, then change the last line above to:
C#
.Where(l => l.IndexOf("NULL", StringComparison.OrdinalIgnoreCase) >= 0));
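As a quick usage check of that pipeline (the paths and the LazyFilterDemo name here are illustrative): File.ReadLines returns a lazy IEnumerable<string>, and File.WriteAllLines consumes it one element at a time, so no more than one line is materialized at once even for a huge input.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LazyFilterDemo
{
    // Streams inputPath through the same Where(...) filter and writes
    // the matches to outputPath; nothing is ever buffered in full.
    public static string[] FilterNullLines(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath,
                           File.ReadLines(inputPath)
                               .Where(l => l.Contains("NULL")));
        // The result file holds only the matches, so it is safe to read back.
        return File.ReadAllLines(outputPath);
    }
}
```

The key difference from File.ReadAllLines is that ReadLines opens the file and yields lines on demand, which is what keeps this approach within memory limits on a 14 GB input.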
 
 
Comments
pradeep kumar 29-Nov-16 15:23pm    
This looks pretty interesting; I will test it. Thanks!
You want to read it line by line. Also, I think you may be over-thinking it a bit. Forget about buffering: by the time you see the data, it has already been buffered by the HDD's on-disk controller, the OS driver, and the runtime, so your buffering worries are over. Use StreamReader.ReadLine() as others have suggested.

You will find that for very large quantities of data like this, the file system is much, much faster. RAM is faster in principle, but in practice, if you make the OS go into the paging file, you won't finish in your lifetime. This is the version where the Tortoise beats the Hare.
 
 
Yes, ppolymorphe is correct: this file comes directly from a database. The obvious question is why I want to scan the file instead of removing the NULLs in the database. The reasons are:

1. I am fetching columns dynamically for each table; below is my code for better understanding:

DECLARE @colnm VARCHAR(MAX)
SET @colnm=''

SELECT @colnm = @colnm + CASE WHEN DATA_TYPE in ('numeric','decimal') 
	THEN + 'ISNULL(' + 'CONVERT(VARCHAR(50),'+'['+ColumnName +']' + ')' +','+' '''''''' '+')' +'as' +'[' + ColumnName + ']' 
	
	ELSE + 'ISNULL(' +'['+ColumnName+']'+ ','+' '''''''' '+')' +'as'+ '[' + ColumnName + ']'  end +','

FROM   #TMP_FINALCOLUMN COMMA
where Comma.TableName = @TBL_NM


SET @CLM_NM =(SELECT  LEFT(@colnm,LEN(@colnm)-1))


SELECT 'xp_cmdshell '''+'sqlcmd -S SERVER -d DB_NM' 
+ ' -E -Q '+'"'+
+'SET NOCOUNT ON; select ' + @CLM_NM + ' from ' 
+QUOTENAME(View_name)+'"'+
+ ' '+'-o'+' '
+ '"A:\DUMMY\'+view_name+'.txt" -W -w 1024 -s"|"'+'''' as Query
FROM TABLE_NAME where STAT=1 AND VIEW_NAME = @TBL_NM


2. I still found some NULLs when I checked those files randomly.

Anyway, thanks for your responses. I dropped the plan of reading the files line by line in this case. :)
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


