Click here to Skip to main content
15,890,123 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I have a nested vector as shown:
C++
std::vector<std::vector<std::vector<int>>> tst;

And i am inserting integer values into the nested vector with dimension 1000 x 512 x 512. While clearing the vector using clear() function, it takes a lot of time. Can anyone pls help to solve the issue.
Posted
Comments
CPallini 4-Jan-13 5:03am    
Do you realize the amount of memory you are using is huge? How are you calling clear()?
Richard MacCutchan 4-Jan-13 5:04am    
What do you mean "solve"? You have 262,144,000 elements to be cleared and a fairly complex structure to be accessed so it will take a finite amount of time. Try reworking your design to use something that does not take so much time.
RESMIS 4-Jan-13 5:13am    
clear function is used as:
tst.clear();
Maximilien 4-Jan-13 9:14am    
Have you only tested in DEBUG or have you tried in RELEASE ? There will be a HUGE difference between the 2 versions.

It looks that vector::clear ha linear complexity even while containing primitive types, see std::vector::clear at cppreference.com[^] and this Stack Overflow thread[^].


[update]
I made the following test (g++ 4.7)
C++
#include <vector>
#include <time.h>
#include <iostream>

using namespace std;

#define N 40000000
int main()
{

  struct timespec ts[5];

  // step 0
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[0]);
  vector <int> v;
  v.reserve(N);
  // step 1
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[1]);

  for (int i=0; i<N; i++)
    v[i] = i;
  // step 2
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[2]);

  v.clear();

  //step 3
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[3]);

  for (int i=0; i<N; i++)
    v.push_back(i);

  // step 4
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[4]);

  for (int i=0; i<5; i++)
  {
    cout << "step " << i << ", " << ts[i].tv_sec << " sec, " << ts[i].tv_nsec << " nsec " << endl;
  }
}


with output:
VB
step 0, 0 sec, 12653759 nsec
step 1, 0 sec, 12798215 nsec
step 2, 0 sec, 752234621 nsec
step 3, 0 sec, 752239235 nsec
step 4, 1 sec, 936184156 nsec


the clear operation doesn't look expensive.
[/update]
 
Share this answer
 
v3
Comments
Philippe Mori 5-Jan-13 11:54am    
The main difference is that in your case your case, since the vector contains only primitive types, no destructor are called and only the size is set to 0 (and the memory is still reserved).

In OP code, the vector contains other vectors so the destructor of every <vector<vector<int>> is called which in turn will call the destructor of each <vector<int>. Thus around 512000 blocks of memory (each 2KB) will be freed.

Contrary to managed world, in native world freeing memory is relatively expensive as housekeeping is done immediately for each object being freed.

Given the fact that a lot of memory is used in this scenario, the user should typically add its own logic by reusing memory whenever possible for example.
CPallini 5-Jan-13 14:32pm    
Yours is definitely a good point. I would test it. Thank you.
Your code does not show how you fill your array... and it is not clear what you want to do when you call clear() function. Do you want to have a 0 by 0 by 0 matrix at the end or still have a 1000 x 512 x 512 matrix but filled with 0 ?

The following link might help you find some alternatives:
http://www.google.ca/#q=multidimensional%20array%20class[^]

As mentioned by others, this is a quite large array... and the best way to implement an efficient solution is too know how the class is used. Do you need all those values in memory or only a few of them? Why do you want to clear the object and what do you expect it to be afterward.

For example, it won't make sense to clear the whole structure (to be 0 x 0 x 0) if you fill it again to be 1000 x 512 x 512.
 
Share this answer
 
Comments
RESMIS 4-Jan-13 22:32pm    
Sry if i didnt posted enough details about my question. I want all the values in the vector simultaneously. Because before clearing the vector, i am using the vector for multithreading, so different blocks of the vector are accessed simultaneously. The data inside the vector is read from image, ie image pixels from 1000 images of size 512x512 dimension. While loading another set of images the dimensions would change and that is why i'm clearing the vector.
nv3 5-Jan-13 4:49am    
So you are storing a set of images, ok. In that case I would recommend to create a vector<Image*> (with Image being some image class) and let your image objects store the 2D matrix of pixels. That would probably make a lot more sense. See the remarks in my amended solution above.
Philippe Mori 5-Jan-13 11:31am    
Do you really need all images to load simultaneously?

Also because your are using a lot of large objects and you allocate memory repetitively, you have to be aware that it can cause memory fragmentation and after a few iterations you might not have large enough blocks of memory depending on the memory allocation pattern. Effectively by releasing all memory at once, this should not be a problem since consecutive blocs should be merged.

Does 1000 images is the maximum number of images and 512x512 the maximum image size? Or could you start with the biggest images and reuse that memory for other images.

Does your application is for general use or for internal use. If it is for internal use, then you might consider creating a 64 bit application and run it on a machine that have say at least 4 GB of memory.

I assume that your application is native (and not mixed-mode) and that your are using the release version as otherwise you might have an important drop of performance.

As indicated by mv3, it would make sense to use a class to represent each image as it would make your code more maintainable.

If you use that much memory and cannot work on a small number of images at the same time, then it might make sense to allocate memory for the worst case and reuse that memory for all cases. Assuming that the number of images is fixed, it would make sense if the memory for each image is a single block (1 MB per image for a size 512x512 and 4 bytes per pixel).
As CPallini has already pointed out, you are attempting to clear a 3-dimensional array of int's with 256 million elements, in other words roughly 1 gigabyte of memory!

There are several ways to optimize that:

(a) Use a linear int array instead, and do the index calculations manually (that's what most people in image processing do in their 2D arrays). Then you can clear that memory with a single call to memset, which is more efficient than clearing nested vector<...>. The performance gain will be limited, though. Perhaps a factor of 2 or 3 if you are lucky.

For example:
C++
int* pArray = new int [1024 * 512 * 512];

// clearing the array
memset (pArray, 0, sizeof (int) * 1024 * 512 * 512);

// access to cell (x, y, z)
int cellValue = pArray [x + 512*y + 512*512*z];

(b) Rework your problem, so that you don't need 256 million cells. That would be the best alternative if that is possible.

(c) If your 3D matrix is sparsely filled, divide your huge array up into subcubes of say 16x16x16 cells and allocate only space for those subcubes that contain values unequal to zero. That not only saves a lot of memory, but also a lot of time clearing that memory. Using a power of 2 for the edge size of the subcubes is advantageous as it simplifies the calculation of the subcube index. Cell (x, y, z) will be found in subcube (x>>4, y>>4, z>>4), a very fast operation. Inside that subcube the cell has the index (x & 0xf, y & 0xf, z & 0xf).

This is quite a bit of work, but for huge problems it sure is worth the effort.

AMENDED 5-Jan-2013:

In a comment to another post you let us know that you are storing a set of images. In that case it makes sense to arrange you data like that:
C++
std::vector<Image*> images;

where Image is a class that holds a single image as a 2D array of pixel values. That way you can add or remove images from that set without deleting the entire storage. This is basically making use of technique (c) shown above, and choosing an image as a subcube.

You mentioned that you would be using multi-threading to operate on the images. Then you could use a locking technique that locks an entire image for one of your threads. That is also a lot easier to implement in a vector of images than in your 3D array approach.

Also: If you are storing images you probably don't want to have int as the type for a single pixels. Most likely your pixels are 8-bit quantities, or 3 x 8-bit RGB values. Your image class will probably solve this problem for you. Google for one of the many image classes that are in the public domain and it will take a lot of work off your shoulders.
 
Share this answer
 
v2
Comments
Philippe Mori 4-Jan-13 13:46pm    
I won't recommend solution (a). This can be done with a single vector instead of nested ones. While it will be a bit slower, it is more maintainable (no explicit memory management) except that you should reserve the memory to avoid memory reallocation.
nv3 4-Jan-13 13:54pm    
Thanks for the comment Philippe. The example code for (a) was just a demonstration. One could equally well write it with an STL::vector<>. Normally, I would go with vector<> as well. But then again, when it comes to allocating a GB of memory, I'd prefer to take things into my own hands. Thanks for 4.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900