Click here to Skip to main content
15,905,913 members
Articles / Programming Languages / C#

SlimList

Rate me:
Please Sign up or sign in to vote.
4.32/5 (25 votes)
17 Oct 2009CPOL7 min read 88.8K   245   26   65
SlimList is a C# implemention of IList that uses less memory than List.

tester form

debugger

Disclaimer

Before I start, let me just say this article is not for everyone. If you are looking for a list to use in a production application, move along to a more useful article. If, however, you are interested in data structures, please read on. This article is targeted toward those who would like to deepen their understanding of data structures and algorithms.

Also, please do not tell me that the tester form is coded sloppy (that is unimportant) or that optimizations could be made to specific SlimList functions (I am aware of those too). The purpose of this article is to present a new data structure, not to optimize it to the extreme or to build a full application. That being said, please feel free to make comments about specific optimizations I could make, and I'll happily add them to the bottom of this article so that others may benefit from your idea.

Introduction

SlimList is my attempt to improve upon the .NET List by reducing the amount of memory it consumes. At worst, List consumes 3x the amount of memory actually stored in the List. SlimList reduces this maximum amount to only 2x the amount of memory actually stored in the list. Since SlimList implements IList, it has most of the features that List does. That being the case, you can use SlimList pretty much exactly how you would a List.

List Uses 3x Memory

With the List data structure, an array of elements is stored to hold all the elements. When the capacity of the List is exceeded, a second array that is twice the size of the first one is created. The old array is then copied to the new array and the old array is discarded. So, the array doubles each time the capacity is exceeded. The Reflector screenshots below confirm this is how List performs size increases (EnsureCapacity doubles the length and Capacity performs the copy operation). One might think that only half the memory is being wasted on capacity (i.e., that 2x of the memory is being used). While this is true after the copy operation, it is not true during the copy operation, because both the old array and the new array exist during that time period. Since the new array is twice the size of the old array, a total of 3x the amount of required memory exists during the copy operation. SlimList remedies this.

EnsureCapacity from Reflector

Capacity from Reflector

SlimList Uses 2x Memory

SlimList uses a different structure than List to store elements. Rather than an array, SlimList uses an array of arrays to store elements (sample structure shown in image below). The first array (which I will refer to as the "major" array) is about 32 items long, and each sub-array (which I will refer to as the "minor" arrays) is initialized to null. After enough elements have been added, the first two elements in the major array are each arrays of size 2 (the first and second columns in the image below). The third is of size 4 (third column below), and the fourth is of size 8 (fourth column below). Each subsequent array is twice the size of the previous. The screenshot above and to the right shows this data structure (as viewed from the Visual Studio debugger). These arrays of exponentially increasing size are the ones that hold the actual data elements. List creates a new array twice the size of the existing array when capacity is exceeded. A SlimList, however, creates a new array the same size of the existing elements when capacity is exceeded. Also, no time is wasted copying elements from the old array to the new array, because each new array only stores new values. That is, the existing values stay in their existing arrays, so no copy operation is performed.

16 elements, 4 minor arrays

Adding to a SlimList

When a SlimList is created, an array is created, like the one shown below:

C#
private T[][] items = new T[31][];

The T is a templated type. So, if you were to specify the type as int, the array would look something like this:

C#
private int[][] items = new int[31][];

That is a jagged array. The "major" array has 31 "minor" arrays. Each minor array can be a different size. Initially, the minor arrays are all null. When the first item is added to the SlimList, the first minor array is initialized to size 2. This would look something like this (also shown as the first column in the above image):

C#
items[0] = new int[2];

When the second item is added, it is just assigned as the second element in that minor array. When the third item is added, a second minor array needs to be created, which looks like this:

C#
items[1] = new int[2];

That second minor array happens to be the same size as the first, but all subsequent minor arrays are twice as large as the previous one. So, to manually assign the third element in the SlimList, the code would look something like this:

C#
items[1][0] = value;

Whenever the last element of the last minor array gets filled, the next minor array is created. Each minor array gets created with a size equal to the number of elements already in the SlimList. This means the SlimList doubles each time the capacity is exceeded. The key advantage it has over List is that no copy operation must be performed when capacity increases, so less memory is consumed.

SlimList Indexing

While SlimList uses less memory than List, it also uses more processor power, but indexing is still an O(1) operation. In order to access an item at a given index, two indexing operations must be performed. One to index the major array, and one to index the minor array. On top of that, the index of the major array first needs to be calculated. This is done with a base 2 logarithm (since the minor arrays double in size). That calculation looks like the following:

C#
int firstIndex = (index == 0 ? 0 : (int)Math.Truncate(Math.Log(index, 2)));

The second index is calculated as the offset into the minor array.

Qualitative Comparison of SlimList and List

It is not really worthwhile for me to do a quantitative analysis of the SlimList performance, as this article is purely theoretical. However, here are a few qualitative observations about this data structure:

  • List uses a peak of 3x the required memory. SlimList uses only 2x.
  • SlimList avoids the copy operation that List performs, which means it will save time in this area.
  • SlimList requires more calculations to index an item, so List is faster in this area.
  • SlimList is more complicated than List to implement, and it is harder to describe.
  • All List and SlimList operations have the same big-O notation, although the factor might be different. For example, List and SlimList both have an index performance of O(1), but SlimList would probably be more like 5 * O(1).

Possible Optimizations

These are optimizations which could be done on SlimList, but that I didn't bother to implement, as they were not required to get SlimList up and running:

  • SlimList currently uses Math.Log to calculate the base 2 log. However, there is an assembly instruction called BSR (bit scan reverse) which might be used to do this calculation much faster. By counting the number of binary zeroes to the left of a number, you can figure out the base 2 log. Problem is, the instruction is not on all processors, so I decided not to use it. I'm no assembly expert, but the calculation would be performed something like this:
  • ASM
    MOV EAX, i // Load i into register.
    BSR EAX, EAX // Calculate first bit set.
    MOV i, EAX // Set i to result.
  • Many of the functions use clear, but slow code. For example, to search through the list, I could have iterated through each minor array in succession. However, I decided just to accept the penalty of performing the index calculations for each item because the code was cleaner (and faster to write).
  • SlimList has a fairly large overhead (an array of about 32 elements). This could be fixed with a hybrid of List and SlimList, but that is an exercise for the reader.

Conclusion

SlimList uses less memory than List, but it is also slower than List. If memory is very tight or copy operations are expensive (such as on a hard drive), a SlimList might be the way to go. Download the code and give it a try if you want. The test application will appear very slow, but that is just because of my poor use of threading. That should not be used as an indicator of actual SlimList performance. The only purpose of the tester form is to verify that the SlimList works as advertised. If you have any suggestions or ideas, feel free to leave them below. If you vote low, please give me the courtesy of explaining why you did so and how I might improve this article for a higher vote.

History

  • October 17th, 2009 - Article creation.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
United States United States

  • Managing Your JavaScript Library in ASP.NET (if you work with ASP.net and you don't read that, you are dead to me).
  • Graduated summa cum laude with a BS in Computer Science.
  • Wrote some articles and some tips.
  • DDR ("New high score? What does that mean? Did I break it?"), ping pong, and volleyball enthusiast.
  • Software I have donated to (you should too):

Comments and Discussions

 
GeneralRe: Just set the Capacity to avoid a resize. Pin
AspDotNetDev19-Oct-09 7:25
protectorAspDotNetDev19-Oct-09 7:25 
GeneralRe: Just set the Capacity to avoid a resize. Pin
Addy Tas10-Sep-12 20:22
Addy Tas10-Sep-12 20:22 
GeneralMy vote of 1 Pin
Mike Lang19-Oct-09 4:49
Mike Lang19-Oct-09 4:49 
GeneralRe: My vote of 1 Pin
AspDotNetDev19-Oct-09 7:31
protectorAspDotNetDev19-Oct-09 7:31 
GeneralRe: My vote of 1 Pin
Mike Lang20-Oct-09 4:02
Mike Lang20-Oct-09 4:02 
GeneralRe: My vote of 1 Pin
AspDotNetDev20-Oct-09 9:36
protectorAspDotNetDev20-Oct-09 9:36 
GeneralRe: My vote of 1 Pin
Mike Lang20-Oct-09 11:20
Mike Lang20-Oct-09 11:20 
GeneralRe: My vote of 1 Pin
AspDotNetDev20-Oct-09 13:24
protectorAspDotNetDev20-Oct-09 13:24 
Mike Lang wrote:
My concern is more that some will hear that this is "faster than List"

Article:
SlimList uses less memory than List, but it is also slower than List.

If they skip of the first sentence of the conclusion section of the article, that is their fault.

Mike Lang wrote:
they now have a larger component that uses more memory

SlimList will use less memory in most cases. I will leave it to the application developers to decide if the few kilobytes that SlimList compiles to is too much. That is not really my concern. Really, though, the article exists to present a concept in a clear and concise manner. The specific code I attach to the article is only of ancillary importance.

Mike Lang wrote:
setting the Capacity and adding your items is likely to be more efficient


How about a counter example? Bob starts up a game on his Alienware. He plays with max settings and trounces enemies with his mad skillz, because he is a l337 hax0r. Now Jill starts up that same game on her eMachine. She uses lower settings. Given that her computer has very little memory, she does not fire at enemies as rapidly as Bob, because it slows her computer down to have a bunch of bullets on the screen at once. It turns out that using a bunch of bullets makes the list structures used in the program grow so large that she starts to use virtual memory. Luckily, however, the programmer was kind enough to use data structures that grow based on the need of the user. So, while Jill is not going to get as high of a score as Bob, at least she can still play the game, so long as she fires her weapon less often. If the programmer had decided to set the capacity of the data structures involved to the amount expected to be reached by Bob (effectively using a constant and higher amount of memory), then the game might not even be playable by Jill. I made some simplifications in this example, but you get the idea. Dynamic data structures are useful. Different users have different demands.

Counter example 2. Bob is working on a spreadsheet. It is probably the most massive spreadsheet you have ever seen. It has millions of rows in it. Bob is very efficient at his job because he has enough RAM to support such a spreadsheet. Jill, on the other hand, does not have very much RAM. Luckily, she also does not work with very large spreadsheets. Also good for her is that the programmer used a dynamically growing list data structure to store the rows in her spreadsheet. As she adds rows, that list grows larger and larger. If the list had been designed such that the capacity of the list was set to millions of rows to suit Bob's needs, Jill would have been left in the cold using virtual memory, making her spreadsheet application virtually unusable. Here again, you see that list sizes cannot always be predicted and that growing them dynamically is of use. As far as showing this via a "performance test", I am not going to do that. That dynamic data structures are useful is a basic programming concept that most programmers understand. This is not an article for beginners (I marked it intermediate/advanced), so I'm not going to cover such basic concepts.

Mike Lang wrote:
A counter balance in your article about performance testing and the inclusion of tests when it works well and when it does not would get a 5 vote from me.


I feel my article discusses both the pros and the cons of using SlimList very well. It doesn't happen to show performance testing because they are not necessary to convey the concepts, which is the only purpose of the article... not to prove that I'm a excellent programmer who has release a shining product to the world. Performance tests should be done by programmers if they are having performance issues with their applications. I included some graphics to illustrate the concepts presented in the article and adding charts would just be another visualization technique to present those concepts. That would be redundant and I'm not going to do it because some guy has a fetish for charts.

Mike Lang wrote:
would get a 5 vote from me


Think about that. By including a few charts or graphs or whatever, you'd change the vote from 1 to 5? Do you really think my article deserves a 1, or are you just voting that to attempt to strong-arm me into making the changes you think should be done to the article? That is not a proper use of the voting system. You should vote what you think the article deserves; you should not vote an article to get your way.

Visual Studio is an excellent GUIIDE.

GeneralRe: My vote of 1 Pin
Mike Lang21-Oct-09 7:17
Mike Lang21-Oct-09 7:17 
GeneralRe: My vote of 1 Pin
AspDotNetDev21-Oct-09 8:03
protectorAspDotNetDev21-Oct-09 8:03 
GeneralQuestion re: List uses 3x memory Pin
User 304249818-Oct-09 19:34
User 304249818-Oct-09 19:34 
GeneralRe: Question re: List uses 3x memory Pin
AspDotNetDev18-Oct-09 22:24
protectorAspDotNetDev18-Oct-09 22:24 
GeneralNice but confused Pin
Xmen Real 18-Oct-09 0:36
professional Xmen Real 18-Oct-09 0:36 
GeneralRe: Nice but confused Pin
AspDotNetDev18-Oct-09 0:54
protectorAspDotNetDev18-Oct-09 0:54 
GeneralRe: Nice but confused Pin
Xmen Real 18-Oct-09 3:15
professional Xmen Real 18-Oct-09 3:15 
GeneralRe: Nice but confused Pin
AspDotNetDev18-Oct-09 3:25
protectorAspDotNetDev18-Oct-09 3:25 
GeneralRe: Nice but confused Pin
Xmen Real 18-Oct-09 3:35
professional Xmen Real 18-Oct-09 3:35 
GeneralRe: Nice but confused Pin
AspDotNetDev18-Oct-09 3:57
protectorAspDotNetDev18-Oct-09 3:57 
GeneralRe: Nice but confused Pin
AspDotNetDev18-Oct-09 1:02
protectorAspDotNetDev18-Oct-09 1:02 
QuestionStatistics? Pin
Jaime Olivares17-Oct-09 19:51
Jaime Olivares17-Oct-09 19:51 
AnswerRe: Statistics? Pin
AspDotNetDev17-Oct-09 20:04
protectorAspDotNetDev17-Oct-09 20:04 
Generalhere's my 5.. Pin
Rozis17-Oct-09 5:27
Rozis17-Oct-09 5:27 
GeneralRe: here's my 5.. Pin
AspDotNetDev17-Oct-09 5:37
protectorAspDotNetDev17-Oct-09 5:37 
GeneralRe: here's my 5.. Pin
Rozis17-Oct-09 6:27
Rozis17-Oct-09 6:27 
GeneralRe: here's my 5.. Pin
AspDotNetDev17-Oct-09 6:48
protectorAspDotNetDev17-Oct-09 6:48 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.