I have a large number of files that I need to send to an external API for processing. My project is a .NET Core API that receives a POST to my controller when a file needs to be processed. The information passed into the controller includes the location of the file to download. I need to download this file (some of these files can be 5 GB), then split it into chunks and POST those chunks to the outside API for processing. The file will be saved locally, so I have access to it, but I am having trouble reading the chunks and POSTing them. How do you read large files in chunks in .NET Core?

Thanks.

What I have tried:

I have tried using Stream.Read and StreamReader, but I keep running into problems with chunking the data - for example, out-of-memory errors, or creating a chunk with no data.
Posted
Updated 1-Aug-19 6:36am
Comments
Richard Deeming 30-Jul-19 13:48pm    
Perhaps you should look at the System.IO.Pipelines package, which was designed for this sort of thing?
System.IO.Pipelines: High performance IO in .NET | .NET Blog[^]
Katghoti 30-Jul-19 14:13pm    
As of January 2019 there is no official support for reading large files in chunks using this library; it was created for networking use cases, and as of version 2.1 there is no built-in support for files.
Richard Deeming 30-Jul-19 14:28pm    
That's a shame.

So you're reading a file from the local computer? That should be fairly easy to do with the Stream class. Where are you stuck?
Katghoti 30-Jul-19 14:52pm    
I am looking at the Stream class using ReadBlockAsync. I'll see where this takes me and may be back with specific questions. Thanks for the help.
Katghoti 30-Jul-19 14:58pm    
Also, I should mention that this is a binary (MP4) file that I am reading. That is where I am getting stuck.

There is a lot of missing information here, which makes it hard to help you, but here are some answers based on guesses about what you might need.

If you need to post to an API using the HTTP chunked transport, simply open the source file as a stream, then use StreamContent with an HttpClient.
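For example, a minimal sketch of that approach might look like the code below. The URL and content type are placeholders, and TransferEncodingChunked is only needed if you want to force chunked transfer instead of sending a Content-Length:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class StreamedUpload
{
    public static async Task UploadAsync(string filePath, string apiUrl)
    {
        using (HttpClient client = new HttpClient())
        using (FileStream fileStream = File.OpenRead(filePath))
        using (StreamContent content = new StreamContent(fileStream))
        {
            // Placeholder content type - set whatever the external API expects.
            content.Headers.ContentType = new MediaTypeHeaderValue("video/mp4");

            using (HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, apiUrl) { Content = content })
            {
                // Force Transfer-Encoding: chunked instead of a Content-Length header.
                request.Headers.TransferEncodingChunked = true;

                using (HttpResponseMessage response = await client.SendAsync(request))
                {
                    response.EnsureSuccessStatusCode();
                }
            }
        }
    }
}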

If the "chunked" refers to the API expecting the data to be uploaded using multiple POST calls, each call transferring a subset of the data, then there are a few options:

If memory usage (and performance) is not a big concern, this produces reasonably readable code (a sketch follows the list):

1) Open a file stream
2) Create a BinaryReader for the stream
3) Read chunksize bytes from the BinaryReader.
4) Post the data to the API
5) Go to step 3) if not completed.
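
Here is a minimal sketch of those steps, assuming the external API accepts each chunk as a raw POST body; the URL, the chunk query parameter and the chunk size are placeholders you would replace with whatever the real API expects:

using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ChunkedUpload
{
    private const int ChunkSize = 10 * 1024 * 1024; // 10 MB per POST - adjust as needed

    public static async Task UploadInChunksAsync(string filePath, string apiUrl)
    {
        using (HttpClient client = new HttpClient())
        using (FileStream fileStream = File.OpenRead(filePath))      // 1) open a file stream
        using (BinaryReader reader = new BinaryReader(fileStream))   // 2) create a BinaryReader
        {
            int chunkIndex = 0;
            while (true)
            {
                // 3) ReadBytes loops internally until it has ChunkSize bytes
                //    or the end of the file is reached.
                byte[] chunk = reader.ReadBytes(ChunkSize);
                if (chunk.Length == 0)
                    break;                                           // 5) stop when nothing is left

                // 4) post the chunk; ByteArrayContent is just one possible encoding -
                //    the real API may expect multipart content or extra headers.
                using (ByteArrayContent content = new ByteArrayContent(chunk))
                using (HttpResponseMessage response =
                           await client.PostAsync($"{apiUrl}?chunk={chunkIndex}", content))
                {
                    response.EnsureSuccessStatusCode();
                }

                chunkIndex++;
            }
        }
    }
}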

If memory usage is still a problem, allocate a single byte array with the required chunk size and use the BinaryReader.Read(byte[] buffer, int index, int count) method in step 3). You can even do that on the Stream itself (bypassing the BinaryReader), but then you need to understand that Stream.Read does not always read the number of bytes you ask for, forcing you to write an extra loop - see the sketch below. You can just as well let BinaryReader.ReadBytes handle that loop for you, as in the sketch above.
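
Here is a sketch of that extra loop, reusing a single buffer for every chunk. PostChunkAsync is passed in as a delegate because the actual API call depends on the external service's contract:

using System;
using System.IO;
using System.Threading.Tasks;

public static class ReusedBufferUpload
{
    public static async Task UploadAsync(string filePath, int chunkSize,
                                         Func<byte[], int, Task> postChunkAsync)
    {
        // One buffer, allocated once and reused for every chunk.
        byte[] buffer = new byte[chunkSize];

        using (FileStream fileStream = File.OpenRead(filePath))
        {
            while (true)
            {
                // Stream.Read may return fewer bytes than requested,
                // so keep reading until the buffer is full or the file ends.
                int filled = 0;
                while (filled < buffer.Length)
                {
                    int read = fileStream.Read(buffer, filled, buffer.Length - filled);
                    if (read == 0)
                        break;                 // end of file
                    filled += read;
                }

                if (filled == 0)
                    break;                     // nothing left to send

                // Only the first 'filled' bytes are valid - the last chunk is usually shorter.
                await postChunkAsync(buffer, filled);
            }
        }
    }
}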

If memory use or performance is still a concern, you can implement your own "chunk stream". It is not extremely difficult, but still - there is always room for mistakes :). Maybe a reasonable NuGet package is available so you do not need to implement it yourself; I don't know.

Making a custom stream would allow the data to be sent through with only small parts held in memory. It will also ensure that reading the file and posting the data happen concurrently, further increasing performance.

To implement it:

Make a class called something meaningful (the most difficult part is to have a decent name). FileChunkStream or something should do.

Add a constructor that accepts a Stream (which will be the FileStream you open) and the chunk size.

Derive the class from Stream.

Implement all the required methods (you can throw a NotSupportedException for Write operations) by calling the base stream you got in the constructor, making the necessary offset calculations for Length, Seek and Position based on the chunk size and index. For the length, remember to deal with the last chunk correctly; it will most likely NOT be the same length as the rest.

With the most basic implementation you can't have multiple chunks for the same Stream open at the same time, because all chunks would use the current Position of the underlying Stream. Two options are available if this is needed: let each FileChunkStream track its own position and seek before any read operation (you would need to protect this with a semaphore or something), or - what I would recommend if you are not an experienced programmer - simply open the same file multiple times (once for each chunk). Open the file for Read only and specify FileShare.Read.

The details I gave above also assume the stream is in the right position for the next chunk when you open it. You can also add the chunk index to the constructor - then the position can easily be calculated.

I do realize that this is a very high level description, but writing it at this level is already taking a lot of time, and I do not even know if you need it. So if you do need this, and need more details, please ask SPECIFIC questions.
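
As a starting point, here is a rough sketch of the FileChunkStream idea described above, assuming the open-the-file-once-per-chunk variant with the chunk index passed to the constructor. It is not production code - argument validation and error handling are left out:

using System;
using System.IO;

public class FileChunkStream : Stream
{
    private readonly Stream _inner;
    private readonly long _chunkStart;
    private readonly long _chunkLength;

    public FileChunkStream(Stream inner, long chunkSize, long chunkIndex)
    {
        _inner = inner;
        _chunkStart = chunkSize * chunkIndex;
        // The last chunk is usually shorter than chunkSize.
        _chunkLength = Math.Max(0, Math.Min(chunkSize, inner.Length - _chunkStart));
        _inner.Position = _chunkStart;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _chunkLength;

    // Position is relative to the start of this chunk.
    public override long Position
    {
        get => _inner.Position - _chunkStart;
        set => _inner.Position = _chunkStart + value;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _chunkLength - Position;
        if (remaining <= 0)
            return 0;                                  // end of this chunk
        int toRead = (int)Math.Min(count, remaining);  // never read past the chunk boundary
        return _inner.Read(buffer, offset, toRead);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin: Position = offset; break;
            case SeekOrigin.Current: Position += offset; break;
            case SeekOrigin.End: Position = _chunkLength + offset; break;
        }
        return Position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

Each chunk can then be wrapped in a StreamContent and posted, for example with new FileChunkStream(File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read), chunkSize, chunkIndex).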
 
Turns out I was overthinking this. This code will pull down a 1 GB file in about 50 seconds:

public static class FileDownloadAsync
{
    public static async Task DownloadFile(string filename)
    {
        // File name is 1GB.zip for testing
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        using (HttpClient client = new HttpClient())
        {
            string url = @"http://speedtest.tele2.net/" + filename;
            using (HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
            using (Stream readFrom = await response.Content.ReadAsStreamAsync())
            {
                string tempFile = $"D:\\Test\\{filename}";
                using (Stream writeTo = File.Open(tempFile, FileMode.Create))
                {
                    await readFrom.CopyToAsync(writeTo);
                }
            }
            stopwatch.Stop();
            Debug.Print(stopwatch.Elapsed.ToString());
        }
    }
}
 
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)