Hi,
Our system has multiple services, and each service has its own log folder.
The logs are very large, ranging from 40 MB to 2 GB.
Each row in the log starts with a timestamp in this format: 21.07.2020-16.40.22

We wrote an app that searches all the folders for the rows that fall within
a given date and time range.
Let's say I want to extract all log rows from October 12, 2020 between 10:00 and 13:00.
The app first filters the log files in all folders by creation time and last write time.
Then, for each log file that passes the filter, it looks for the relevant lines within the time frame.
Each line's date is checked using the following method:

C#
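        // Requires: using System; using System.Globalization; using System.Text.RegularExpressions;

        // Built once instead of once per word. All dots are escaped, and minutes/seconds are limited to 00-59.
        private static readonly Regex TimestampRegex =
            new Regex(@"\b\d{2}\.\d{2}\.\d{4}-[0-2]?[0-9]\.[0-5][0-9]\.[0-5][0-9]", RegexOptions.Compiled);

        private bool IsLineInTimeFrame(long lineNumber, string filePath)
        {
            // ReadSpecificLine returns null when the line does not exist or the file cannot be read.
            string line = ReadSpecificLine(filePath, lineNumber);
            if (line == null)
            {
                return false;
            }

            // Convert the selected bounds once, not once per match.
            DateTime from = Convert.ToDateTime(SelectedDateFrom);
            DateTime to = Convert.ToDateTime(SelectedDateTo);

            // The timestamp contains no spaces, so matching the whole line
            // finds the same values as matching each space-separated piece.
            foreach (Match m in TimestampRegex.Matches(line))
            {
                DateTime dt;
                if (DateTime.TryParseExact(m.Value, "dd.MM.yyyy-HH.mm.ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
                {
                    Logger.LogWriter.LogInstance.LogWrite($"{m.Value}");
                    if (dt >= from && dt <= to)
                    {
                        return true;
                    }
                }
            }
            return false;
        }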

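        // Returns the requested line (1-based), or null if the file has fewer lines or cannot be read.
        // Each call reopens the file and reads it from the start, so calling this
        // once per line costs time quadratic in the number of lines.
        string ReadSpecificLine(string filePath, long lineNumber)
        {
            string content = null;
            try
            {
                using (StreamReader file = new StreamReader(filePath))
                {
                    // long, not int, so the counter matches the type of lineNumber.
                    for (long i = 1; i < lineNumber; i++)
                    {
                        file.ReadLine();

                        if (file.EndOfStream)
                        {
                            Logger.LogWriter.LogInstance.LogWarning($"End of file. The file only contains {i} lines.");
                            break;
                        }
                    }
                    // At end of file ReadLine returns null, which is passed on to the caller.
                    content = file.ReadLine();
                }
            }
            catch (IOException ex)
            {
                Logger.LogWriter.LogInstance.LogError(ex);
            }
            return content;
        }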
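
Because ReadSpecificLine reopens the file for every line, checking each line this way rereads the file many times. A minimal sketch of a single-pass alternative follows. It assumes the timestamp is the first 19 characters of each row (as described above) and that rows are in chronological order, which has not been confirmed. LogFilter and LinesInTimeFrame are hypothetical names, not part of the app.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class LogFilter
{
    private const string Format = "dd.MM.yyyy-HH.mm.ss";

    // Streams the file once and yields the rows whose leading timestamp is within [from, to].
    public static IEnumerable<string> LinesInTimeFrame(string filePath, DateTime from, DateTime to)
    {
        // File.ReadLines is lazy: it does not load the whole file into memory.
        foreach (string line in File.ReadLines(filePath))
        {
            // Rows without a valid leading timestamp (e.g. stack traces) are skipped.
            if (line.Length < Format.Length)
            {
                continue;
            }
            DateTime stamp;
            if (!DateTime.TryParseExact(line.Substring(0, Format.Length), Format,
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out stamp))
            {
                continue;
            }
            // ASSUMPTION: rows are chronological, so nothing after this row can match.
            if (stamp > to)
            {
                yield break;
            }
            if (stamp >= from)
            {
                yield return line;
            }
        }
    }
}
```

This still reads from the start of the file up to the end of the time frame, so it reduces the cost to one pass but does not avoid iterating.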

Now here is the question:
Assuming a file has 150K lines (or even more), what is the best way to find the first line that starts with a given date and time without iterating through the whole file?

Updated 9-Nov-20 21:17pm
Comments
BillWoodruff 10-Nov-20 2:53am    
Do you have control over how the logs are structured?

Do the logs follow a linear time sequence from start to end? If so, a binary search will work: start in the middle, then recursively halve the remaining range until you converge on the part that contains the target.

Without knowing the log structure and seeing an example of a log entry, I can't say more.

Think about validating each line/record: errors will happen.
PIEBALDconsult 10-Nov-20 8:29am    
1) Use a proper date format YYYY-MM-DD (ISO 8601).
2) If searching is required, then you need to build that into the logging mechanism, not add it as an afterthought.
3) Have one set of logs per day. Include the date in the log file name.
4) If you really need to, when archiving logs from previous days, you could add some form of indexing for the archive.
5) In general, do as much work up front, to reduce the amount of work required later.

6) Consider logging into Splunk rather than files. Or load the files into Splunk for archiving.
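
Points 1 and 3 combine well: ISO 8601 dates sort the same way as plain strings, so with one file per day the candidate files can be picked by name alone, without opening them. A rough sketch, assuming a hypothetical naming scheme in which every file name ends in yyyy-MM-dd before the extension (DailyLogs and FilesInRange are made-up names):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class DailyLogs
{
    // ASSUMPTION: file names look like "<service>-yyyy-MM-dd.log".
    // Returns the paths whose date suffix lies within [from, to], compared as strings.
    public static IEnumerable<string> FilesInRange(IEnumerable<string> paths, DateTime from, DateTime to)
    {
        string lo = from.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        string hi = to.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        foreach (string path in paths)
        {
            string name = Path.GetFileNameWithoutExtension(path);
            if (name.Length < 10)
            {
                continue; // too short to end in a date
            }
            string day = name.Substring(name.Length - 10);

            // Ordinal comparison is enough because yyyy-MM-dd orders most significant field first.
            if (string.CompareOrdinal(day, lo) >= 0 && string.CompareOrdinal(day, hi) <= 0)
            {
                yield return path;
            }
        }
    }
}
```

The suffix is not validated as a real date here; a stricter version could run it through DateTime.TryParseExact first.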

1 solution

Assuming the logs are ordered by time (as they should be), you can perform a binary search.
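
A minimal sketch of how that could look on a file, searching by byte offset rather than by line number so that only a handful of lines are read. It assumes the timestamp format from the question at the start of each row, rows in chronological order, and an ASCII or UTF-8 file without a byte order mark. LogSeek and its members are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class LogSeek
{
    private const string Format = "dd.MM.yyyy-HH.mm.ss";

    // Returns the byte offset of the first timestamped row with time >= target, or -1 if there is none.
    public static long FindFirstAtOrAfter(string path, DateTime target)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long lo = 0, hi = fs.Length;
            while (lo < hi)
            {
                long mid = lo + (hi - lo) / 2;
                DateTime ts;
                long start = FirstStampedLine(fs, mid, out ts);
                if (start < 0 || ts >= target)
                {
                    hi = mid;        // the answer starts at or before this row
                }
                else
                {
                    lo = start + 1;  // this row is too early, so the answer starts after it
                }
            }
            DateTime ignored;
            return FirstStampedLine(fs, lo, out ignored);
        }
    }

    // Finds the first row that starts at or after 'offset' and carries a valid timestamp.
    private static long FirstStampedLine(FileStream fs, long offset, out DateTime ts)
    {
        if (offset == 0)
        {
            fs.Position = 0;
        }
        else
        {
            // Start one byte early: if that byte is '\n', 'offset' is itself a row start.
            fs.Position = offset - 1;
            int b;
            while ((b = fs.ReadByte()) != -1 && b != '\n') { }
        }
        while (true)
        {
            long start = fs.Position;
            string line = ReadLineAt(fs);
            if (line == null)
            {
                ts = default(DateTime);
                return -1; // reached end of file
            }
            // Rows without a timestamp (e.g. stack traces) are skipped, as validation requires.
            if (line.Length >= Format.Length &&
                DateTime.TryParseExact(line.Substring(0, Format.Length), Format,
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out ts))
            {
                return start;
            }
        }
    }

    // Reads bytes from the current position up to the next '\n'; returns null at end of file.
    private static string ReadLineAt(FileStream fs)
    {
        var bytes = new List<byte>();
        int b;
        while ((b = fs.ReadByte()) != -1 && b != '\n')
        {
            bytes.Add((byte)b);
        }
        if (b == -1 && bytes.Count == 0)
        {
            return null;
        }
        return Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r');
    }
}
```

From the returned offset you can set the stream position and read forward until the first row past the end of the time frame. If the rows are not strictly in order, the result is not reliable.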
 
Comments
BillWoodruff 10-Nov-20 7:14am    
#1 "should be" ... why do you think I asked the OP about this? The link goes to a language-agnostic Wikipedia article ... its one reference to .NET is at the very end of a very long article.

The OP will be a better person for knowing "The idea of sorting a list of items to allow for faster searching dates back to antiquity. The earliest known example was the Inakibit-Anu tablet from Babylon dating back to c. 200 BCE. The tablet contained about 500 Sexagesimal numbers and their reciprocals sorted in Lexicographical order, which made searching for a specific entry easier."
CPallini 10-Nov-20 7:55am    
Logs usually are.
I don't care a bit why you asked that.
By the way, thank you.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


