Hi, I want to read bytes from a large file (> 2 GB).
When I do that, an OutOfMemoryException is thrown, because I read the whole file into memory. All I know is that I can chunk the file into small pieces...
So what is the best code to do that?

The reason for reading the file is to find some bytes that are stored in it.

Any suggestion will be really appreciated.

In addition to the correct answer by Espen Harlinn:

Breaking a file into chunks will hardly help you, unless those chunks are of different natures (different formats, representing different data structures) that were put into one file without proper justification.

In other cases, it's good to use the big file and keep it open. There are cases when you need to split the file into two pieces. This is just the basic idea; see below.

So, I would assume that the file is big just because it represents a collection of objects of the same type, or of a few different types. If all the items are of the same size (in file storage units), addressing them is trivial: you simply multiply the item size by the required item index to get the position parameter for Stream.Seek. The only non-trivial case is when you have a collection of items of different sizes. In that case, you should index the file and build an index table. The index table consists of units of the same size, typically a list/array of file positions, one per item index. Because of this, the index table itself can be addressed by index (a simple offset calculation), as described above; you then read the position in the "big" file from the table, move the file position there, and read the data.
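For the fixed-size case, a minimal sketch could look like this (the record size, file name, and class name here are only illustrative assumptions, not something from the question):

using System.IO;

class FixedSizeRecordReader
{
    const int RecordSize = 64; // assumed fixed size of each item, in bytes

    // Reads the record with the given zero-based index directly,
    // without loading the rest of the file into memory.
    public static byte[] ReadRecord(string fileName, long index)
    {
        using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        {
            stream.Seek(index * RecordSize, SeekOrigin.Begin); // position = size * index
            byte[] record = new byte[RecordSize];
            int read = stream.Read(record, 0, RecordSize);
            if (read < RecordSize)
                throw new EndOfStreamException("Record is truncated or index is out of range.");
            return record;
        }
    }
}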

You have two options: 1) keep the index table in memory; you could recalculate it each time, but it's better to build it once (cache it) and keep it in some file, the same one or a separate one; 2) keep it in a file and read that file at the required position. In the second case you will have to seek in two steps: first in the index file, then in the data file. In principle, this method allows you to access files of any size (limited only by System.UInt64.MaxValue).
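A minimal sketch of the two-step lookup, assuming the index file is simply a sequence of 8-byte (Int64) positions written in item order (the file names and item length handling are assumptions for illustration):

using System.IO;

static class TwoStepLookup
{
    // Step 1: read the item's position from the index file;
    // the index file is assumed to hold one 8-byte position per item.
    public static long LookUpPosition(string indexFileName, long itemIndex)
    {
        using (FileStream indexStream = new FileStream(indexFileName, FileMode.Open, FileAccess.Read))
        using (BinaryReader reader = new BinaryReader(indexStream))
        {
            indexStream.Seek(itemIndex * sizeof(long), SeekOrigin.Begin);
            return reader.ReadInt64();
        }
    }

    // Step 2: seek to that position in the "big" data file and read the item bytes.
    public static byte[] ReadItem(string dataFileName, long position, int itemLength)
    {
        using (FileStream dataStream = new FileStream(dataFileName, FileMode.Open, FileAccess.Read))
        {
            dataStream.Seek(position, SeekOrigin.Begin);
            byte[] item = new byte[itemLength];
            dataStream.Read(item, 0, itemLength); // may read less than itemLength at end of file
            return item;
        }
    }
}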

After you position the stream of the "big" file, you can read a single item. You can use serialization for this purpose. Please see:
http://en.wikipedia.org/wiki/Serialization#.NET_Framework
http://msdn.microsoft.com/en-us/library/vstudio/ms233843.aspx
http://msdn.microsoft.com/en-us/library/system.runtime.serialization.formatters.binary.binaryformatter.aspx
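As a rough sketch of how such an item could be read back with BinaryFormatter (the Item type is hypothetical, and it's assumed each item was originally written by a separate Serialize call so that deserialization can start at the recorded position):

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Item
{
    public string Name;
    public int Value;
}

static class ItemReader
{
    // Seek to the item's recorded position and deserialize a single item from there.
    public static Item ReadItemAt(FileStream stream, long position)
    {
        stream.Seek(position, SeekOrigin.Begin);
        BinaryFormatter formatter = new BinaryFormatter();
        return (Item)formatter.Deserialize(stream);
    }
}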

A fancy way of implementing the index-table solution would be to encapsulate it all in a class with an indexed property.
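For example, a wrapper along these lines (all names here are hypothetical, and the index table is assumed to be already loaded into memory):

using System.IO;

class IndexedItemFile
{
    readonly FileStream dataStream;
    readonly long[] positions; // in-memory index table: start position of each item in the data file

    public IndexedItemFile(FileStream dataStream, long[] positions)
    {
        this.dataStream = dataStream;
        this.positions = positions;
    }

    public long Count { get { return positions.Length; } }

    // Indexed property: items[i] seeks to the item's position and returns its raw bytes.
    public byte[] this[long index]
    {
        get
        {
            long start = positions[index];
            long end = index + 1 < positions.Length ? positions[index + 1] : dataStream.Length;
            byte[] item = new byte[end - start];
            dataStream.Seek(start, SeekOrigin.Begin);
            dataStream.Read(item, 0, item.Length);
            return item;
        }
    }
}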

—SA
 
Comments
Espen Harlinn 11-Feb-13 4:26am    
Good points :-D
Sergey Alexandrovich Kryukov 11-Feb-13 4:41am    
Thank you, Espen.
—SA
Have a look at:
FileStream.Read
FileStream.Seek

That pretty much covers what you need to know.

[Update]
Your implementation should look a bit like this:
const int megabyte = 1024 * 1024;

public void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
    // Open the file for reading only; the whole file is never loaded into memory.
    using (FileStream fileStream = new FileStream(theFilename, FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[megabyte];
        fileStream.Seek(whereToStartReading, SeekOrigin.Begin);
        int bytesRead = fileStream.Read(buffer, 0, megabyte);
        while (bytesRead > 0)
        {
            // Process one chunk at a time, then reuse the buffer for the next chunk.
            ProcessChunk(buffer, bytesRead);
            bytesRead = fileStream.Read(buffer, 0, megabyte);
        }
    }
}

private void ProcessChunk(byte[] buffer, int bytesRead)
{
    // Do the processing here
}
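Since the goal is to find some bytes in the file, ProcessChunk could, for example, scan each chunk for a pattern. This is only a sketch (the pattern and what to do on a match are assumptions); note that a match spanning two chunks would need extra overlap handling:

private static readonly byte[] pattern = { 0xDE, 0xAD, 0xBE, 0xEF }; // example bytes to search for

private void ProcessChunk(byte[] buffer, int bytesRead)
{
    // Naive scan of the current chunk only; a match that straddles the boundary
    // between two chunks would require keeping the last (pattern.Length - 1)
    // bytes of the previous chunk as an overlap.
    for (int i = 0; i <= bytesRead - pattern.Length; i++)
    {
        bool match = true;
        for (int j = 0; j < pattern.Length; j++)
        {
            if (buffer[i + j] != pattern[j]) { match = false; break; }
        }
        if (match)
        {
            System.Console.WriteLine("Pattern found at offset {0} within this chunk.", i);
        }
    }
}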


Best regards
Espen Harlinn
 
Comments
Sergey Alexandrovich Kryukov 10-Feb-13 18:14pm    
Basically, this is all one needs, my 5, but proper design of well-optimized and encapsulated code needs considerable experience or brain work.
I provided some basic directions, please see.
—SA
Espen Harlinn 11-Feb-13 4:25am    
Thank you, Sergey :-D
Leecherman 11-Feb-13 0:33am    
Thank you both for your replies. The first solution I already know, and it didn't help me: it works with small files but not with a big one. The second one is not what I want; all I want is to read bytes from a large file.
Any posted code will be really appreciated...
thanks
Espen Harlinn 11-Feb-13 4:25am    
The Read method works well enough for me, even when the file is > 2GB.
You just read a chunk into memory, process it, and reuse the byte array for the next chunk.
Leecherman 12-Feb-13 20:50pm    
thanks again for your replies, please can you post the code?
tried to chunk it but with no luck at all...
