C: problems when parsing a file as a stream of lines

It's pretty common that I will use fgets() in a loop to iterate through the lines of a file and process them. Normally it works well but in some cases it sucks.

Example 1: If you need to know information about the next line(s) or previous line(s) to decide how to process the current line.

Example 2: The current line indicates you have read too far and you must send it to a different context of your parser.

I know of many workarounds, but overall they tend to be untenable hacks.

I could read the entire file in as an array of strings, and then crawl through it with relative ease.

I think this is the best solution but it comes with a major downside: memory consumption equivalent to the file size. As far as I know fopen() and fgets() are not doing that.

What are your thoughts on a best compromise? Create a file I/O interface with a small cache?

Thanks

What I have tried:

Storing two or three lines at a time and using references to process them as a set

Using an un-gets-line function to push lines back onto a stack if they have been read and need to be unread, drawing new lines off of the stack until it runs out and then fetching fresh ones.
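For reference, the un-gets-line idea above can be sketched roughly as follows. This is only a minimal illustration, not a finished design: the `LineReader` type, the `lr_*` function names, and the two size limits are all made up for this example, and lines longer than `LR_MAX_LINE - 1` characters would simply be split by `fgets()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LR_MAX_LINE 256   /* assumed maximum line length, including '\n' */
#define LR_MAX_PUSHBACK 4 /* assumed depth of the unread stack */

/* Hypothetical reader type: a FILE* plus a small stack of unread lines. */
typedef struct {
    FILE *fp;
    char stack[LR_MAX_PUSHBACK][LR_MAX_LINE];
    int depth; /* number of lines currently on the stack */
} LineReader;

void lr_init(LineReader *lr, FILE *fp)
{
    lr->fp = fp;
    lr->depth = 0;
}

/* Returns buf on success, NULL at end of file. Pushed-back lines are
   handed out first, most recently pushed first; only when the stack is
   empty is a fresh line fetched from the file. */
char *lr_gets(LineReader *lr, char *buf, size_t size)
{
    assert(size >= LR_MAX_LINE); /* caller's buffer must hold a full line */
    if (lr->depth > 0) {
        strcpy(buf, lr->stack[--lr->depth]);
        return buf;
    }
    return fgets(buf, LR_MAX_LINE, lr->fp);
}

/* Pushes a line back onto the stack. Returns 0 on success, -1 if the
   stack is full or the line does not fit. */
int lr_ungets(LineReader *lr, const char *line)
{
    if (lr->depth >= LR_MAX_PUSHBACK || strlen(line) >= LR_MAX_LINE)
        return -1;
    strcpy(lr->stack[lr->depth++], line);
    return 0;
}
```

Memory use stays fixed at `LR_MAX_PUSHBACK * LR_MAX_LINE` bytes regardless of file size. Because it is a stack, several lines that need to be unread have to be pushed back in reverse order (last read, first pushed) so that they come out again in file order.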

Posted 13-Dec-18 11:42am

HS_C_Student

Updated 13-Dec-18 21:12pm

3 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Mohibur Rashid · Answer 1 · 2018-12-13T14:24:00

Solution 1

How about creating an index?
In this case you will have to read the file twice, but you will have complete knowledge of your data set.

You build the index by reading each character and looking for newlines; when you hit a newline, record its position in memory, in an array or a linked list. With this you don't even have to cache any strings, you just hop from index to index.
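A rough sketch of such an index is shown here, under some assumptions that are not in the answer itself: the file is opened in binary mode (`"rb"`) so that offsets are plain byte counts, the file is small enough for offsets to fit in a `long`, and the `LineIndex` type and `line_index_*` names are invented for the example. It stores the offset where each line starts rather than where each newline sits, which makes the later seek slightly simpler.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical index type: offsets[i] is the byte offset where line i starts. */
typedef struct {
    long *offsets;
    size_t count;
} LineIndex;

/* First pass: scan every byte and record where each line begins.
   fp is assumed to be opened in binary mode. Only '\n' ends a line;
   a preceding '\r' is just another byte of the line.
   Returns 0 on success, -1 if memory runs out. */
int line_index_build(LineIndex *ix, FILE *fp)
{
    size_t cap = 16;
    int c;
    int at_line_start = 1;
    long pos = 0; /* bytes read so far, counted by hand */

    ix->count = 0;
    ix->offsets = malloc(cap * sizeof *ix->offsets);
    if (ix->offsets == NULL)
        return -1;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            if (ix->count == cap) {
                long *grown = realloc(ix->offsets, 2 * cap * sizeof *grown);
                if (grown == NULL) {
                    free(ix->offsets);
                    ix->offsets = NULL;
                    ix->count = 0;
                    return -1;
                }
                ix->offsets = grown;
                cap *= 2;
            }
            ix->offsets[ix->count++] = pos;
            at_line_start = 0;
        }
        if (c == '\n')
            at_line_start = 1;
        pos++;
    }
    return 0;
}

/* Second pass: jump to line n (0-based) and read it with fgets().
   Returns buf, or NULL if n is out of range or the seek fails. */
char *line_index_read(const LineIndex *ix, FILE *fp, size_t n,
                      char *buf, int size)
{
    assert(size > 0);
    if (n >= ix->count || fseek(fp, ix->offsets[n], SEEK_SET) != 0)
        return NULL;
    return fgets(buf, size, fp);
}

void line_index_free(LineIndex *ix)
{
    free(ix->offsets);
    ix->offsets = NULL;
    ix->count = 0;
}
```

The index costs one `long` per line instead of the whole line, so memory grows with the line count rather than the file size. Counting bytes by hand in binary mode also avoids relying on `ftell()` values from a text-mode stream.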

Posted 13-Dec-18 14:24pm

Mohibur Rashid

Comments

Rick York 14-Dec-18 3:00am

I have used this technique, but it occasionally has problems. You have to open the file in text mode to read it, and ftell can misbehave in text mode. Most of the time it works, but it is very frustrating when it doesn't.

Mohibur Rashid 14-Dec-18 5:23am

Ignore \r; count only \n.

Rick York 14-Dec-18 18:13pm

If you read each character, that can work.

KarstenK · Answer 2 · 2018-12-13T20:30:00

Solution 2

If the app's job is to parse the complete file, reading it all into memory should be no problem. Test it.

Ideally, use some standard containers, as in this file reading example code.

Posted 13-Dec-18 20:30pm

KarstenK

CPallini · Answer 3 · 2018-12-13T21:12:00

Quote:
Example 1: If you need to know information about the next line(s) or previous line(s) to decide how to process the current line.

Example 2: The current line indicates you have read too far and you must send it to a different context of your parser.

Both problems are usually solved in parsers with a look-ahead mechanism (see, for instance, Parsing - Wikipedia), which, if I am not mistaken, you have already found yourself as a possible solution:

Quote:
Storing two or three lines at a time and using references to process them as a set
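To make the look-ahead idea concrete, here is a minimal sketch of a reader that always keeps exactly one line of look-ahead. The `PeekReader` type, the `pk_*` names, and the `PK_MAX_LINE` limit are assumptions made up for this illustration; a real parser might need more than one line of look-ahead or dynamically sized lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PK_MAX_LINE 256 /* assumed maximum line length, including '\n' */

/* Hypothetical reader type holding one line of look-ahead. */
typedef struct {
    FILE *fp;
    char next[PK_MAX_LINE];
    int has_next; /* nonzero while next[] holds an unconsumed line */
} PeekReader;

/* Preloads the first line so that it can be peeked at immediately. */
void pk_init(PeekReader *pr, FILE *fp)
{
    pr->fp = fp;
    pr->has_next = (fgets(pr->next, PK_MAX_LINE, fp) != NULL);
}

/* Returns the upcoming line without consuming it, or NULL at end of file. */
const char *pk_peek(const PeekReader *pr)
{
    return pr->has_next ? pr->next : NULL;
}

/* Copies the upcoming line into buf, consumes it, and preloads the line
   after it. Returns buf, or NULL at end of file. */
char *pk_next(PeekReader *pr, char *buf, size_t size)
{
    assert(size >= PK_MAX_LINE); /* caller's buffer must hold a full line */
    if (!pr->has_next)
        return NULL;
    strcpy(buf, pr->next);
    pr->has_next = (fgets(pr->next, PK_MAX_LINE, pr->fp) != NULL);
    return buf;
}
```

With this, the "read too far" case from Example 2 does not arise: a sub-parser can call `pk_peek()` to check whether the upcoming line still belongs to it and return to its caller without consuming that line. Memory use is one extra line buffer, independent of file size.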