std::vector clear function slow

Question

0.00/5 (No votes)

See more:

I have a nested vector as shown:

C++

std::vector<std::vector<std::vector<int>>> tst;

And i am inserting integer values into the nested vector with dimension 1000 x 512 x 512. While clearing the vector using clear() function, it takes a lot of time. Can anyone pls help to solve the issue.

Posted 3-Jan-13 22:41pm

RESMIS

Add a Solution

Comments

CPallini 4-Jan-13 5:03am

Do you realize the amount of memory you are using is huge? How are you calling clear()?

Richard MacCutchan 4-Jan-13 5:04am

What do you mean "solve"? You have 262,144,000 elements to be cleared and a fairly complex structure to be accessed so it will take a finite amount of time. Try reworking your design to use something that does not take so much time.

RESMIS 4-Jan-13 5:13am

clear function is used as:
tst.clear();

Maximilien 4-Jan-13 9:14am

Have you only tested in DEBUG or have you tried in RELEASE ? There will be a HUGE difference between the 2 versions.

3 solutions

Solution 1

It looks that vector::clear ha linear complexity even while containing primitive types, see std::vector::clear at cppreference.com[^] and this Stack Overflow thread[^].

[update]
I made the following test (g++ 4.7)

C++

#include <vector>
#include <time.h>
#include <iostream>

using namespace std;

#define N 40000000
int main()
{

  struct timespec ts[5];

  // step 0
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[0]);
  vector <int> v;
  v.reserve(N);
  // step 1
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[1]);

  for (int i=0; i<N; i++)
    v[i] = i;
  // step 2
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[2]);

  v.clear();

  //step 3
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[3]);

  for (int i=0; i<N; i++)
    v.push_back(i);

  // step 4
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts[4]);

  for (int i=0; i<5; i++)
  {
    cout << "step " << i << ", " << ts[i].tv_sec << " sec, " << ts[i].tv_nsec << " nsec " << endl;
  }
}

with output:

VB

step 0, 0 sec, 12653759 nsec
step 1, 0 sec, 12798215 nsec
step 2, 0 sec, 752234621 nsec
step 3, 0 sec, 752239235 nsec
step 4, 1 sec, 936184156 nsec

the clear operation doesn't look expensive.
[/update]

Posted 3-Jan-13 23:12pm

CPallini

Updated 4-Jan-13 0:56am

v3

Comments

Philippe Mori 5-Jan-13 11:54am

The main difference is that in your case your case, since the vector contains only primitive types, no destructor are called and only the size is set to 0 (and the memory is still reserved).

In OP code, the vector contains other vectors so the destructor of every <vector<vector<int>> is called which in turn will call the destructor of each <vector<int>. Thus around 512000 blocks of memory (each 2KB) will be freed.

Contrary to managed world, in native world freeing memory is relatively expensive as housekeeping is done immediately for each object being freed.

Given the fact that a lot of memory is used in this scenario, the user should typically add its own logic by reusing memory whenever possible for example.

CPallini 5-Jan-13 14:32pm

Yours is definitely a good point. I would test it. Thank you.

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Philippe Mori · Accepted Answer · 2013-01-04T08:03:00

Solution 3

Your code does not show how you fill your array... and it is not clear what you want to do when you call clear() function. Do you want to have a 0 by 0 by 0 matrix at the end or still have a 1000 x 512 x 512 matrix but filled with 0 ?

The following link might help you find some alternatives:
http://www.google.ca/#q=multidimensional%20array%20class[^]

As mentioned by others, this is a quite large array... and the best way to implement an efficient solution is too know how the class is used. Do you need all those values in memory or only a few of them? Why do you want to clear the object and what do you expect it to be afterward.

For example, it won't make sense to clear the whole structure (to be 0 x 0 x 0) if you fill it again to be 1000 x 512 x 512.

Posted 4-Jan-13 8:03am

Philippe Mori

Comments

RESMIS 4-Jan-13 22:32pm

Sry if i didnt posted enough details about my question. I want all the values in the vector simultaneously. Because before clearing the vector, i am using the vector for multithreading, so different blocks of the vector are accessed simultaneously. The data inside the vector is read from image, ie image pixels from 1000 images of size 512x512 dimension. While loading another set of images the dimensions would change and that is why i'm clearing the vector.

nv3 5-Jan-13 4:49am

So you are storing a set of images, ok. In that case I would recommend to create a vector<Image*> (with Image being some image class) and let your image objects store the 2D matrix of pixels. That would probably make a lot more sense. See the remarks in my amended solution above.

Philippe Mori 5-Jan-13 11:31am

Do you really need all images to load simultaneously?

Also because your are using a lot of large objects and you allocate memory repetitively, you have to be aware that it can cause memory fragmentation and after a few iterations you might not have large enough blocks of memory depending on the memory allocation pattern. Effectively by releasing all memory at once, this should not be a problem since consecutive blocs should be merged.

Does 1000 images is the maximum number of images and 512x512 the maximum image size? Or could you start with the biggest images and reuse that memory for other images.

Does your application is for general use or for internal use. If it is for internal use, then you might consider creating a 64 bit application and run it on a machine that have say at least 4 GB of memory.

I assume that your application is native (and not mixed-mode) and that your are using the release version as otherwise you might have an important drop of performance.

As indicated by mv3, it would make sense to use a class to represent each image as it would make your code more maintainable.

If you use that much memory and cannot work on a small number of images at the same time, then it might make sense to allocate memory for the worst case and reuse that memory for all cases. Assuming that the number of images is fixed, it would make sense if the memory for each image is a single block (1 MB per image for a size 512x512 and 4 bytes per pixel).

nv3 · Accepted Answer · 2013-01-04T02:02:00

As CPallini has already pointed out, you are attempting to clear a 3-dimensional array of int's with 256 million elements, in other words roughly 1 gigabyte of memory!

There are several ways to optimize that:

(a) Use a linear int array instead, and do the index calculations manually (that's what most people in image processing do in their 2D arrays). Then you can clear that memory with a single call to memset, which is more efficient than clearing nested vector<...>. The performance gain will be limited, though. Perhaps a factor of 2 or 3 if you are lucky.

For example:

C++

int* pArray = new int [1024 * 512 * 512];

// clearing the array
memset (pArray, 0, sizeof (int) * 1024 * 512 * 512);

// access to cell (x, y, z)
int cellValue = pArray [x + 512*y + 512*512*z];

(b) Rework your problem, so that you don't need 256 million cells. That would be the best alternative if that is possible.

(c) If your 3D matrix is sparsely filled, divide your huge array up into subcubes of say 16x16x16 cells and allocate only space for those subcubes that contain values unequal to zero. That not only saves a lot of memory, but also a lot of time clearing that memory. Using a power of 2 for the edge size of the subcubes is advantageous as it simplifies the calculation of the subcube index. Cell (x, y, z) will be found in subcube (x>>4, y>>4, z>>4), a very fast operation. Inside that subcube the cell has the index (x & 0xf, y & 0xf, z & 0xf).

This is quite a bit of work, but for huge problems it sure is worth the effort.

AMENDED 5-Jan-2013:

In a comment to another post you let us know that you are storing a set of images. In that case it makes sense to arrange you data like that:

C++

std::vector<Image*> images;

where Image is a class that holds a single image as a 2D array of pixel values. That way you can add or remove images from that set without deleting the entire storage. This is basically making use of technique (c) shown above, and choosing an image as a subcube.

You mentioned that you would be using multi-threading to operate on the images. Then you could use a locking technique that locks an entire image for one of your threads. That is also a lot easier to implement in a vector of images than in your 3D array approach.

Also: If you are storing images you probably don't want to have int as the type for a single pixels. Most likely your pixels are 8-bit quantities, or 3 x 8-bit RGB values. Your image class will probably solve this problem for you. Google for one of the many image classes that are in the public domain and it will take a lot of work off your shoulders.

std::vector clear function slow

3 solutions

Solution 1

Solution 3

Solution 2

Add your solution here

Preview 0

Existing Members

...or Join us