I have a large number of files that I need to send to an external API for processing. My project is a .NET Core API that receives a POST to my controller when a file needs to be processed. The information passed into the controller includes the location of the file to download. I need to download this file (some of these files can be 5 GB), then split it into chunks and POST those chunks to the outside API for processing. The file will be saved locally, so I have access to it, but I am having trouble reading the chunks and POSTing them. How do you read large files in chunks in .NET Core?

Thanks.

What I have tried:

I have tried using Stream.Read and StreamReader, but I keep running into problems with chunking the data - for example, out-of-memory errors, or creating a chunk with no data.
Posted
Updated 1-Aug-19 6:36am
Comments
Richard Deeming 30-Jul-19 13:48pm    
Perhaps you should look at the System.IO.Pipelines package, which was designed for this sort of thing?
System.IO.Pipelines: High performance IO in .NET | .NET Blog[^]
Katghoti 30-Jul-19 14:13pm    
As of January 2019 there is no official support for reading large files in chunks using this library; it was created for networking use cases, and as of version 2.1 there is no built-in support for files.
Richard Deeming 30-Jul-19 14:28pm    
That's a shame.

So you're reading a file from the local computer? That should be fairly easy to do with the Stream class. Where are you stuck?
Katghoti 30-Jul-19 14:52pm    
I am looking at the Stream class using ReadBlockAsync. I'll see where this takes me and may be back with specific questions. Thanks for the help.
Katghoti 30-Jul-19 14:58pm    
Also, I should mention that this is a binary (MP4) file that I am reading. That is where I am getting stuck.

There is a lot of missing information here, which makes it hard to help you, but here are some answers based on guesses about what you might need.

If you need to post to an API using the HTTP chunked transport, simply open the source file as a stream, then use StreamContent with an HttpClient.
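For example, a minimal sketch of that approach might look like the code below. The URL and content type are placeholders, and TransferEncodingChunked is only needed if you want to force chunked transfer instead of sending a Content-Length:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class StreamedUpload
{
    public static async Task UploadAsync(string filePath, string apiUrl)
    {
        using (HttpClient client = new HttpClient())
        using (FileStream fileStream = File.OpenRead(filePath))
        using (StreamContent content = new StreamContent(fileStream))
        {
            // Placeholder content type - set whatever the external API expects.
            content.Headers.ContentType = new MediaTypeHeaderValue("video/mp4");

            using (HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, apiUrl) { Content = content })
            {
                // Force Transfer-Encoding: chunked instead of a Content-Length header.
                request.Headers.TransferEncodingChunked = true;

                using (HttpResponseMessage response = await client.SendAsync(request))
                {
                    response.EnsureSuccessStatusCode();
                }
            }
        }
    }
}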

If the "chunked" refers to the API expecting the data to be uploaded using multiple POST calls, each call transferring a subset of the data, then there are a few options:

If memory usage (and performance) is not a big concern, this produces reasonably readable code (a sketch follows the list):

1) Open a file stream
2) Create a BinaryReader for the stream
3) Read chunksize bytes from the BinaryReader.
4) Post the data to the API
5) Go to step 3) if not completed.
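
Here is a minimal sketch of those steps, assuming the external API accepts each chunk as a raw POST body; the URL, the chunk query parameter and the chunk size are placeholders you would replace with whatever the real API expects:

using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ChunkedUpload
{
    private const int ChunkSize = 10 * 1024 * 1024; // 10 MB per POST - adjust as needed

    public static async Task UploadInChunksAsync(string filePath, string apiUrl)
    {
        using (HttpClient client = new HttpClient())
        using (FileStream fileStream = File.OpenRead(filePath))      // 1) open a file stream
        using (BinaryReader reader = new BinaryReader(fileStream))   // 2) create a BinaryReader
        {
            int chunkIndex = 0;
            while (true)
            {
                // 3) ReadBytes loops internally until it has ChunkSize bytes
                //    or the end of the file is reached.
                byte[] chunk = reader.ReadBytes(ChunkSize);
                if (chunk.Length == 0)
                    break;                                           // 5) stop when nothing is left

                // 4) post the chunk; ByteArrayContent is just one possible encoding -
                //    the real API may expect multipart content or extra headers.
                using (ByteArrayContent content = new ByteArrayContent(chunk))
                using (HttpResponseMessage response =
                           await client.PostAsync($"{apiUrl}?chunk={chunkIndex}", content))
                {
                    response.EnsureSuccessStatusCode();
                }

                chunkIndex++;
            }
        }
    }
}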

If memory usage is still a problem, allocate a single byte array with the required chunk size and use the BinaryReader.Read(byte[] buffer, int index, int count) method in step 3). You can even do that on the Stream itself (bypassing the BinaryReader), but then you need to understand that Stream.Read does not always read the number of bytes you ask for, forcing you to write an extra loop - see the sketch below. You can just as well let BinaryReader.ReadBytes handle that loop for you, as in the sketch above.
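
Here is a sketch of that extra loop, reusing a single buffer for every chunk. PostChunkAsync is passed in as a delegate because the actual API call depends on the external service's contract:

using System;
using System.IO;
using System.Threading.Tasks;

public static class ReusedBufferUpload
{
    public static async Task UploadAsync(string filePath, int chunkSize,
                                         Func<byte[], int, Task> postChunkAsync)
    {
        // One buffer, allocated once and reused for every chunk.
        byte[] buffer = new byte[chunkSize];

        using (FileStream fileStream = File.OpenRead(filePath))
        {
            while (true)
            {
                // Stream.Read may return fewer bytes than requested,
                // so keep reading until the buffer is full or the file ends.
                int filled = 0;
                while (filled < buffer.Length)
                {
                    int read = fileStream.Read(buffer, filled, buffer.Length - filled);
                    if (read == 0)
                        break;                 // end of file
                    filled += read;
                }

                if (filled == 0)
                    break;                     // nothing left to send

                // Only the first 'filled' bytes are valid - the last chunk is usually shorter.
                await postChunkAsync(buffer, filled);
            }
        }
    }
}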

If memory use or performance is still a concern, you can implement your own "chunk stream". It is not extremely difficult, but still - there is always room for mistakes :). Maybe a reasonable NuGet package is available so you do not need to implement it yourself; I don't know.

Making a custom stream would allow the data to be sent through with only small parts held in memory. It will also ensure that reading the file and posting the data happen concurrently, further increasing performance.

To implement it:

Make a class called something meaningful (the most difficult part is to have a decent name). FileChunkStream or something should do.

Add a constructor that accepts a Stream (which will be the FileStream you open) and the chunk size.

Derive the class from Stream.

Implement all the required methods (you can throw a NotSupportedException for Write operations) by calling the base stream you got in the constructor, making the necessary offset calculations for Length, Seek and Position based on the chunk size and index. For the length, remember to deal with the last chunk correctly; it will most likely NOT be the same length as the rest.

With the most basic implementation you can't have multiple chunks for the same Stream open at the same time, because all chunks would use the current Position of the underlying Stream. Two options are available if this is needed: let each FileChunkStream track its own position and seek before any read operation (you would need to protect this with a semaphore or something), or - what I would recommend if you are not an experienced programmer - simply open the same file multiple times (once for each chunk). Open the file for Read only and specify FileShare.Read.

The details I gave above also assume the stream is in the right position for the next chunk when you open it. You can also add the chunk index to the constructor - then the position can easily be calculated.

I do realize that this is a very high level description, but writing it at this level is already taking a lot of time, and I do not even know if you need it. So if you do need this, and need more details, please ask SPECIFIC questions.
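
As a starting point, here is a rough sketch of the FileChunkStream idea described above, assuming the open-the-file-once-per-chunk variant with the chunk index passed to the constructor. It is not production code - argument validation and error handling are left out:

using System;
using System.IO;

public class FileChunkStream : Stream
{
    private readonly Stream _inner;
    private readonly long _chunkStart;
    private readonly long _chunkLength;

    public FileChunkStream(Stream inner, long chunkSize, long chunkIndex)
    {
        _inner = inner;
        _chunkStart = chunkSize * chunkIndex;
        // The last chunk is usually shorter than chunkSize.
        _chunkLength = Math.Max(0, Math.Min(chunkSize, inner.Length - _chunkStart));
        _inner.Position = _chunkStart;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _chunkLength;

    // Position is relative to the start of this chunk.
    public override long Position
    {
        get => _inner.Position - _chunkStart;
        set => _inner.Position = _chunkStart + value;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _chunkLength - Position;
        if (remaining <= 0)
            return 0;                                  // end of this chunk
        int toRead = (int)Math.Min(count, remaining);  // never read past the chunk boundary
        return _inner.Read(buffer, offset, toRead);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin: Position = offset; break;
            case SeekOrigin.Current: Position += offset; break;
            case SeekOrigin.End: Position = _chunkLength + offset; break;
        }
        return Position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

Each chunk can then be wrapped in a StreamContent and posted, for example with new FileChunkStream(File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read), chunkSize, chunkIndex).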
 
Turns out I was overthinking this. This code will pull down a 1 GB file in about 50 seconds:

public static class FileDownloadAsync
{
    public static async Task DownloadFile(string filename)
    {
        // File name is 1GB.zip for testing
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        using (HttpClient client = new HttpClient())
        {
            string url = @"http://speedtest.tele2.net/" + filename;
            using (HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
            using (Stream readFrom = await response.Content.ReadAsStreamAsync())
            {
                string tempFile = $"D:\\Test\\{filename}";
                using (Stream writeTo = File.Open(tempFile, FileMode.Create))
                {
                    await readFrom.CopyToAsync(writeTo);
                }
            }
            stopwatch.Stop();
            Debug.Print(stopwatch.Elapsed.ToString());
        }
    }
}
 
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)