Dear all,

I asked a question a few weeks ago about memory usage, as I feared I had a leak. After some profiling it turned out I didn't have a leak - I was simply using far more system memory than I thought possible, and as a result I quickly realised that the app I'm writing needed a change of direction.

The application is a datalogger capable of displaying lots of channels of data on a screen in a manner similar to an oscilloscope. I have determined that the correct course of action to reduce memory usage to a minimum is to keep in system memory only those points of data that are displayed on screen. So now, when the logger is asked to record, structures of the following form are saved to disk:

C#
[StructLayout(LayoutKind.Explicit, Size=14)]
public struct LogPoints
{
    [FieldOffset(0)] public float xtime;   //time of the sample
    [FieldOffset(4)] public byte BusId;    //bus the channel lives on
    [FieldOffset(5)] public UInt32 Ident;  //message identifier
    [FieldOffset(9)] public byte ItemNo;   //item within the message
    [FieldOffset(10)] public float yval;   //data value
}


BusId, Ident and ItemNo allow me to determine which channel of data a point belongs to; xtime and yval are, of course, the time and data value.
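For reference, here is a minimal sketch of how one such record could be written to disk and read back with BinaryWriter/BinaryReader, writing the fields in layout order so each record occupies exactly 14 bytes (the helper class is illustrative, not part of the app):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

//The LogPoints struct from above, repeated so this sketch is self-contained.
[StructLayout(LayoutKind.Explicit, Size = 14)]
public struct LogPoints
{
    [FieldOffset(0)] public float xtime;
    [FieldOffset(4)] public byte BusId;
    [FieldOffset(5)] public UInt32 Ident;
    [FieldOffset(9)] public byte ItemNo;
    [FieldOffset(10)] public float yval;
}

//Hypothetical helper matching the 14-byte layout.
public static class LogPointIO
{
    public const int Size = 14;

    public static void Write(BinaryWriter w, LogPoints p)
    {
        w.Write(p.xtime);   //offset 0,  4 bytes
        w.Write(p.BusId);   //offset 4,  1 byte
        w.Write(p.Ident);   //offset 5,  4 bytes
        w.Write(p.ItemNo);  //offset 9,  1 byte
        w.Write(p.yval);    //offset 10, 4 bytes
    }

    public static LogPoints Read(BinaryReader r)
    {
        return new LogPoints
        {
            xtime = r.ReadSingle(),
            BusId = r.ReadByte(),
            Ident = r.ReadUInt32(),
            ItemNo = r.ReadByte(),
            yval = r.ReadSingle()
        };
    }
}
```

Because the layout is explicit and packed, a record's file offset is simply its index multiplied by 14, which is what makes the chunked seeking below possible.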

The data is not sorted by channel, but it is written to disk in chronological order. On average, data is written to the file at a constant rate - i.e. in any given second, the number of points written to the file is roughly constant.

As an example, a 10 minute log spooling data to the file could exceed 100MB in size.

The problem came when I tried to get data out of this file and onto the screen quickly enough.

To display the data, the user can create a graphic display. For each display, the user can select whichever channels of data he/she wants to display - you might choose to select the same channel to be displayed in 2 places, and you might also want to display a different time range in any of your graphs.

Accordingly, I have tried several ways of allowing each instance of the 'Graphical Display' class to have access to the file containing the points of data. After lots of trial and error, reading around this subject and looking at lots of articles here (some excellent work by Anthony Baraff and Jun Du), I thought memory-mapped files might be a suitable way to proceed.

So as not to block the GUI, I've made a BackgroundWorker that creates a view stream over an MMF of a fixed size equating to 100,000 data-point structures. I've written a function that uses a while loop to keep scrolling through the view, renewing it as necessary. Passed as arguments to this function are the start and end times of the graph to be displayed, and the timestep. The timestep is related to the physical size of the graph on screen - e.g. if it's 510 pixels wide, I'll try to display only 510 points - so effectively it is ((endtime - starttime) / x_graphsize).
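To make that calculation concrete, here it is as a sketch (the names are illustrative, not the actual fields in the app):

```csharp
using System;

//Illustrative decimation helper - one point is kept per horizontal pixel.
public static class Decimation
{
    //Seconds of log time represented by one pixel column of the graph.
    public static float ComputeTimestep(float startTime, float endTime, int graphWidthPixels)
    {
        return (endTime - startTime) / graphWidthPixels;
    }
}
```

For example, a 10-minute (600 s) log drawn on a 510-pixel-wide graph gives a timestep of about 1.18 s, so only roughly one point per pixel column needs to be read from the file.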

UPDATE:

Thanks to hints from SA Kryukov, barneyman, and Roman Lerman, I've effectively rolled my own variable-sized memory buffer:

C#
while (_filepos < _endpos)
{
    //_filepos is the stream position in the file on disk; _endpos is the stream position
    //of the last point that we want to plot.
    //_increment is the sizeof() one data-point structure.
    //_chunksize is the number of bytes to read this time
    //(clamped so it doesn't try to read past _endpos).

    using (var fstr = new FileStream(fname, FileMode.Open, FileAccess.Read, FileShare.Read, (int)_chunksize, FileOptions.SequentialScan))
    {
        //Put the file pointer in the right place
        fstr.Seek(_filepos, SeekOrigin.Begin);

        //Create a temporary memory buffer of fixed size
        byte[] buf = new byte[(int)_chunksize];

        //Fill the buffer - Read() may return fewer bytes than requested, so loop
        int read = 0;
        while (read < buf.Length)
        {
            int n = fstr.Read(buf, read, buf.Length - read);
            if (n == 0) break; //end of file
            read += n;
        }

        //Create a stream over the memory buffer, and a reader for the stream
        using (var mstr = new MemoryStream(buf))
        using (var mr = new BinaryReader(mstr))
        {
            //Get the last time value in this chunk - seek one point back from the end
            mstr.Seek(-_increment, SeekOrigin.End);
            _bufendtime = mr.ReadSingle();

            for (cnt = 0; cnt < NumList.Count; cnt++)
            {
                //Set the memory stream position back to the start
                mstr.Seek(0, SeekOrigin.Begin);

                //Initialise the first time value
                timeval = mr.ReadSingle();

                //Put the reader back at the start
                mstr.Seek(-4, SeekOrigin.Current);

                _chunkpos = 0;
                while (_chunkpos <= (_chunksize - _increment))
                {
                    //Read the data point at the current stream position
                    _xtime = mr.ReadSingle();
                    _busid = mr.ReadByte();
                    _id = mr.ReadUInt32();
                    _chan = mr.ReadByte();
                    _ydata = mr.ReadSingle();

                    //Check whether this point belongs to this channel
                    if ((_xtime >= timeval) && (_busid == (NumList[cnt].varCANin - 1)) && (_id == (uint)NumList[cnt].ItemNo) && (_chan == NumList[cnt].ChanID))
                    {
                        //Point must go into the list
                        NumList[cnt].Points.Add(new GraphPoints
                        {
                            xdata = _xtime,
                            ydata = _ydata
                        });

                        //Finding the next point - advance the target time
                        timeval += timestep;

                        //Short loop to skip ahead quickly based on the value of timestep
                        found = false;
                        while (!found && (timeval < _bufendtime))
                        {
                            _xtime = mr.ReadSingle();
                            //Stop once we reach the target time, or when there isn't
                            //a whole point left in the buffer
                            if ((_xtime >= timeval) || (mstr.Position > (_chunksize - _increment)))
                            {
                                found = true;
                            }
                            else
                            {
                                mstr.Seek(_increment - 4, SeekOrigin.Current);
                                _chunkpos = mstr.Position;
                            }
                        }

                        if ((timeval < _bufendtime) && (mstr.Position < (_chunksize - _increment)))
                        {
                            //Put the reader back at the start of this point
                            mstr.Seek(-4, SeekOrigin.Current);
                        }
                        else if (timeval > _bufendtime)
                        {
                            //Forces the next chunk to be loaded, if there is one
                            _chunkpos = _chunksize;
                        }
                    }
                    else
                    {
                        //Reader position has moved - _chunkpos must now be incremented
                        _chunkpos += _increment;
                    }
                }
            }
        }

        //Got all points from this chunk of the file - advance the real file pointer
        _filepos += _chunksize;
    }
}


Note the use of the 'internal' while loop to find the next data point more quickly once you've found the first. In the vast majority of cases the number of data points far exceeds the number of pixels available to plot them, so the value of timestep lets you skip a large number of points in the file.

This takes about 0.5s to extract 2 data channels from a 10 minute log, but it gets quite slow when showing more channels.

Does anyone have any further ideas - should I try sorting the raw file next?

Thanks in advance for any replies.

Kind regards,

Aero
Posted
Updated 10-Aug-12 3:13am
Comments
Sergey Alexandrovich Kryukov 6-Aug-12 20:09pm    
First of all, I would question whether you really need to pull all the data from the file. You could keep the data in the file and fetch a small piece each time it is required. File buffering can help you transparently, especially if successive requests to the file are likely to be close together.
--SA
Aero72 7-Aug-12 4:22am    
Thanks SA. As you suggest, I should be able to seek somewhere near the right place because of the chronological order of the file. I'll read a sizeable portion of the file at a time and report back - I'll also take a look at BufferedFileReader.
barneyman 6-Aug-12 21:44pm    
Agree with SA - the MMF is redundant: it's reading the whole thing into memory just for you to seek to an offset and read. Do that directly on the disk file - read in large blocks and do your own chunking.
Aero72 7-Aug-12 4:17am    
Thanks, Barneyman - I'd hoped that's what the MMF was doing for me: reserving a fixed amount of memory and having the OS read chunks of the file into that space. I'll try to roll my own and report back.
Roman Lerman 7-Aug-12 10:03am    
What I'm using is a modified version of this approach: FileByteArray[^]. My file can be accessed and modified on the fly, the length of the "FileArray" is dynamic, and the read speed is good enough for reading large amounts of data quickly (~50 MB/sec, depending on the operating system). I hope it will help you.

1 solution

Sounds like you are dealing with a lot of data, so here is an underutilized Windows feature:
ManagedEsent[^]. I've used this functionality in a number of projects, and so far the performance has been excellent. Most of my projects use the API directly from C++, but I've also used ManagedEsent, which is very easy to use, in a few projects.

The ManagedEsent developer reports the following performance:

Sequential inserts : 32,000 entries/second
Random inserts : 17,000 entries/second
Random Updates : 36,000 entries/second
Random lookups : 137,000 entries/second
Linq queries : 14,000 queries/second
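For a quick feel of what that looks like in code, ManagedEsent ships a PersistentDictionary class that hides the raw JET API. The sketch below is only an illustration - the key scheme, the "LogDb" directory name and the sample values are assumptions, not the author's actual format:

```csharp
using Microsoft.Isam.Esent.Collections.Generic;

class EsentSketch
{
    static void Main()
    {
        //Stores samples keyed by channel and time, so that per-channel
        //range reads become cheap lookups instead of a full file scan.
        //Key format "<busId>/<ident>/<itemNo>/<time>" is illustrative only.
        using (var dict = new PersistentDictionary<string, double>("LogDb"))
        {
            dict["1/291/0/0.000"] = 1.25;      //insert one sample
            double y = dict["1/291/0/0.000"];  //random lookup by key
        }
    }
}
```

PersistentDictionary persists its contents to an ESE database in the given directory, so the data survives restarts; for bulk inserts the lower-level Api class with explicit transactions is the usual route.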


Best regards
Espen Harlinn
 
 
Comments
Christoph Keller 10-Aug-12 10:05am    
Thanks for the ManagedEsent hint! Never heard about it and sounds really interesting!

my 5* :)

Thanks again and happy coding,
Chris
Espen Harlinn 10-Aug-12 10:14am    
Thanks Chris :-D
Aero72 11-Aug-12 6:24am    
That looks good.....I have some reading to do! Many Thanks Espen, I'll report back when I've tried it. My 5.
Espen Harlinn 11-Aug-12 6:27am    
Glad you liked it. MS Exchange is implemented on top of this technology - and so, to my knowledge, is Active Directory.
Aero72 15-Aug-12 11:23am    
Thanks for trying, Espen - I'm sure getting data out of the tables would be fine, but unfortunately it looks like adding records to the database in the first place kills the app, even if one transaction consists of many inserts before committing. I've also tried embedded Firebird with the .NET provider - it's faster at writing to the db than Esent, but still not quick enough :-(

Thanks again,

Aero

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


