Okay, i am totally new to threading, I spent the last days going through online tutorials and decided that "parallel.invoke" or Tasks are the way to go for my problem.
The project I am working one is still an implementation of constructive solid geometry, and I thought that I could have massive performance gains by adding parallelism.
My algorithm has essentially these parts:
The A and B in the following text indicate from which 3D-Model the polygon came as CSG seeks to allow boolean operations on 3D Objects
- Determine lines of intersection from polygonA to polygonB and the other way round.
This creates a set of segments (for each polygon) in two list.
Here is some code of the sequential solution:
int facesB = modelB.Triangle.Count;
int facesA = modelA.Triangle.Count;
for (int i = 0; i < facesB; i++)
{
for (int j = 0; j < facesA; j++)
{
DetermineIntersection(j, i);
}
}
When intersecting two boxes this always takes around 30ms. I want it faster, so my thought was to split up the first for loop and using parallel.invoke.
int facesB = modelB.Triangle.Count;
int facesA = modelA.Triangle.Count;
int prlfB = (int)facesB / 2;
Parallel.invoke(()=>
{
for (int i = 0; i < prlfB; i++)
{
for (int j = 0; j < facesA; j++)
{
DetermineIntersection(j, i);
}
}
}, ()=>
{
for (int i = prlfB; i < facesB; i++)
{
for (int j = 0; j < facesA; j++)
{
DetermineIntersection(j, i);
}
}
}
);
I know that there is a certain overhead when working with threads, and when I split two boxes (a total of 24 polygons) I think I don't need to parallelize things.
The parallel.invoke was only slightly slower, so I thought if i raised the poly count (using Spheres with each 500 triangles instead of boxes (12 each)) I could really gain some speed. Nada. The sequential solution is nearly two times faster when working with higher polycounts.
Here is the (pseudo) code of DetermineIntersection
public void DetermineIntersection(int mApoly, int mBpoly)
TriA = modelA.Triangles[mApoly];
TriB = modelB.Triangles[mBpoly];
for (int i=0; i<3; i++)
{
IntersectRayTrianglePlane(TriB.side(i), TriA, out intersectionSegment)
if (success==true)
{
lock(listOfCutsAgainstA)
{
listOfCutsAgainstA.Add(intersectionSegment)
}
}
}
- Next step is: Do some Triangulation to make sure that the previously found lines of intersection do not intersect any triangle anymore
Here again: A perfekt task for parallelizing, I do not even need to lock anything because they never write to the same resources.
But let's ignore those steps and focus on the one I mentioned.
It was somehow successful to use ParameterizedThreadStart, which was indeed faster, after a certain number of polygons was involved (as I expected it).
The tons of tutorials said that the new .Net 4.0 TPL is faster and cleaner and more flexible, so why do I get such strange results?
Same problem occurs with tasks.
When I use the Jordan Curve Theorem in a recursive algorithm to determine if a point is inside a 3D-Model, it takes not less than 10.000 triangles before there is any benefit to using Tasks instead of a normal, non threaded solution.
What am I doing wrong?
Any help appreciated. I hope I made everything as clear as possible.
Thanks in advance :rose: