Click here to Skip to main content
15,886,676 members
Please Sign up or sign in to vote.
5.00/5 (1 vote)
See more:
Okay, i am totally new to threading, I spent the last days going through online tutorials and decided that "parallel.invoke" or Tasks are the way to go for my problem.

The project I am working one is still an implementation of constructive solid geometry, and I thought that I could have massive performance gains by adding parallelism.
My algorithm has essentially these parts:
The A and B in the following text indicate from which 3D-Model the polygon came as CSG seeks to allow boolean operations on 3D Objects
- Determine lines of intersection from polygonA to polygonB and the other way round.
This creates a set of segments (for each polygon) in two list.

Here is some code of the sequential solution:
int facesB = modelB.Triangle.Count;
int facesA = modelA.Triangle.Count;

for (int i = 0; i < facesB; i++)
{
    for (int j = 0; j < facesA; j++)
    {
        DetermineIntersection(j, i);
    }
}

When intersecting two boxes this always takes around 30ms. I want it faster, so my thought was to split up the first for loop and using parallel.invoke.

int facesB = modelB.Triangle.Count;
int facesA = modelA.Triangle.Count;
int prlfB = (int)facesB / 2;
Parallel.invoke(()=>
{

  for (int i = 0; i < prlfB; i++)
  {
    for (int j = 0; j < facesA; j++)
    {
        DetermineIntersection(j, i);
    }
  }
}, ()=>
{
  for (int i = prlfB; i < facesB; i++)
  {
    for (int j = 0; j < facesA; j++)
    {
        DetermineIntersection(j, i);
    }
  }
}
);

I know that there is a certain overhead when working with threads, and when I split two boxes (a total of 24 polygons) I think I don't need to parallelize things.
The parallel.invoke was only slightly slower, so I thought if i raised the poly count (using Spheres with each 500 triangles instead of boxes (12 each)) I could really gain some speed. Nada. The sequential solution is nearly two times faster when working with higher polycounts.

Here is the (pseudo) code of DetermineIntersection
public void DetermineIntersection(int mApoly, int mBpoly)
  TriA = modelA.Triangles[mApoly];
  TriB = modelB.Triangles[mBpoly];
  // Determine intersection of the sides of Triangle B
  // against the plane of triangle A, and check if a valid intersectionSegment
  // can be returned
  for (int i=0; i<3; i++)
  {
    IntersectRayTrianglePlane(TriB.side(i), TriA, out intersectionSegment)
    if (success==true)
    {
       lock(listOfCutsAgainstA)
       {
         listOfCutsAgainstA.Add(intersectionSegment)
       }
    }
  }            
  // And now the same code where it checks if sides of TriangleA intersect with the plane of triangleB



- Next step is: Do some Triangulation to make sure that the previously found lines of intersection do not intersect any triangle anymore

Here again: A perfekt task for parallelizing, I do not even need to lock anything because they never write to the same resources.

But let's ignore those steps and focus on the one I mentioned.
It was somehow successful to use ParameterizedThreadStart, which was indeed faster, after a certain number of polygons was involved (as I expected it).
The tons of tutorials said that the new .Net 4.0 TPL is faster and cleaner and more flexible, so why do I get such strange results?

Same problem occurs with tasks.

When I use the Jordan Curve Theorem in a recursive algorithm to determine if a point is inside a 3D-Model, it takes not less than 10.000 triangles before there is any benefit to using Tasks instead of a normal, non threaded solution.

What am I doing wrong?

Any help appreciated. I hope I made everything as clear as possible.
Thanks in advance :rose:
Posted
Updated 17-Feb-11 1:51am
v2

1 solution

As I see it you the expected results - 30ms isn't all that long :)

This may partly explain why you are not seeing the expected perfomance gains:
http://www.profimatics.de/products/intime/_manuals/windows.nt.thread.latency.pdf[^] - while the article is quite old, this is still something we have to live with.

So parallel execution of really small tasks does not make sense ...

Update
It seems like implmenting your own class based on System.Threading.Tasks.TaskScheduler[^] can improve execution times, but the work required doesn't seem to justify the effort, compared to using ParameterizedThreadStart, unless you can reuse the implmentation.


Regards
Espen Harlinn
 
Share this answer
 
v2
Comments
Jan Wolf 17-Feb-11 7:43am    
Yes, you are right.
But raising the polycount makes the operation significantly slower.
The 30ms is only for two boxes, 24 triangles. That is not all that much.

When i use the parameterized threads and an increased polycount the threaded solution is indeed faster. Significantly if there are a few hundred triangles involved.

My first question is not how to speed this up, because I have a solution with the parameterized threads, but why parallel.invoke and tasks are so slow compared to ParameterizedThreadStart.
The second is: The parallel.invoke is slower, even when thousands of triangles are involved. The sequential solution takes 600ms, parallel.invoke a little more than 1000ms

But the tutorials I read suggested using the TPL, because it does not involve as much overhead, and replaces the old Threadpool thing, and was easier to use (I agree) while being faster.

Btw: 30ms is too long for this kind of operation, but the problem can be solved by optimizing some of my math routines. But thats not the question.
I will have a look at the paper you send me.
I am not expecting a doubled performance because I split this thing in two threads, but I suspect threading, no matter if "normal" Threads or the TPL thing, to be faster. But they seem to have a MASSIVE overhead, larger than "normal" Threads, which is contradictory to what I have read.
Espen Harlinn 17-Feb-11 8:40am    
If you look at the API, there seems to be a number of provisions for various circumstances regarding the execution of tasks. That may explain why execution using parameterized threads works significantly better than parallel.invoke. There is also an overhead from the initialization of the threadpool. Try repeated execution of your code in a loop – you will, hopefully, see some improvement, but I’m not really sure. I would guess that parallel.invoke requires tasks to take a significant amount of time before the overhead becomes negligible. Remember, we are living in a world where a certain marketing department determines the truth – no matter what your test actually shows :) Play around with queries testing EF4 vs SQL against MS SQL Server, and you’ll bound to get a good laugh – of cause the MS SQL team is better at doing queries than the EF4 linq team …
Jan Wolf 18-Feb-11 2:01am    
You might be right,
what confuses me especially is that the parallel.invoke solution only takes around 15ms longer when I am dealing with two boxes which is okay I guess, since like you said 30ms is not a long time. It makes no sense whatsoever, that when the only thing I increase is triangle count (the code in parallel.invoke stays the same, as posted above), this solution still takes longer, and the difference seems to increase almost linear.

I am going to accept your answer for now, still hoping someone knows how I could solve this, because parallel.invoke / tasks are more readable and elegant (visually). Having to have an own object just to pass parameters can get quite annoying, and also (especially in the "point in 3D object" algorithm) I have the need of return values, which is almost too simple when using tasks.

Thanks Espen
Sergey Alexandrovich Kryukov 17-Feb-11 19:22pm    
This is very true and non-trivial, I like the answer, my 5.
--SA
Nish Nishant 19-Feb-11 10:10am    
Voted 5!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900