|
A horde of cats would be just as effective. We had swarms of locusts in Texas from time to time. The cat went crazy and ate as many locusts as she could. Chasing flies never was of any interest again. I seriously doubt that 100000 cats in one place would behave as well as the ducks or let themselves be led as easily to the battle.
I have lived with several Zen masters - all of them were cats.
His last invention was an evil Lasagna. It didn't kill anyone, and it actually tasted pretty good.
|
|
|
|
|
I've seen the inundation footage on TV. It's like the seventeen-year Cicadidae romping up the trunks of trees en masse, only the camera footage of this type of devastation looks a bit less noxious. Not sure whether the cicada actually does any eating along its vertical ascent, but all that molting sure looks unpleasant. Especially when all instars have completed and everywhere looks like a massive snake has passed through.
But locusts; I've always wondered why some forward-thinking Madison Avenue executive hasn't come up with an inexpensive way to manufacture a biomass-harvesting machine to corral the lubbers, grind them into human-consumable protein, and feed the hungry. Like all that invasive carp that flies into the faces of stinkpot (that's a less-than-20-ft runabout with a 50 HP outboard) captains who run through the northern branches of the Big Muddy and rant and rave about it in cellphone videos. An industry as yet untapped, right?
If not eat it, plow it into the ground at least.
|
|
|
|
|
|
Never bothered with Async programming before since I never needed it.
But now I'm having to take care of a weekly delivery of an 80 GB (eighty gigabyte) large XML-file.
The parsing and saving 10 million records to 30 different tables in a database takes more than an hour and there's no simple optimization left to do.
But I'm only using one core of the processor, so let's go parallel; it'll be fun learning. Right?
The easiest part is bulk copying to the database in parallel. Easy enough, but it only shaves five minutes off the total time. This is not where the biggest bottleneck is.
The biggest bottleneck is the actual parsing of the XML.
I don't want to rework the whole application to use locks and thread-safe collections, so I decide to split the work vertically instead: add a task for every collection of data.
Also easy enough; now the processor is working close to 100%, but it takes twice as long.
Apparently the creation of tasks has more overhead than the parsing of the data itself.
No shortcuts for me today. Back to the drawing board.
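For anyone curious what that trap looks like in miniature, here is a hedged sketch — in Python rather than the original .NET, with a made-up `parse_record` standing in for the real XML parsing — of the one-task-per-record split. It produces exactly the same result as the plain loop, while paying scheduling and result-collection overhead for every tiny item:

```python
import concurrent.futures

def parse_record(raw):
    """Stand-in for the real per-record XML parsing (hypothetical, trivially cheap)."""
    return raw.strip().upper()

records = [f"  record-{i}  " for i in range(10_000)]

# Sequential baseline: just parse in a loop.
sequential = [parse_record(r) for r in records]

# Naive parallel version: one task per record. The fixed cost of scheduling
# each task easily exceeds the cost of the tiny work item, so this is
# usually *slower* than the loop above despite keeping all cores busy.
with concurrent.futures.ThreadPoolExecutor() as pool:
    parallel = list(pool.map(parse_record, records))

assert parallel == sequential  # same answer, more overhead
```

The parallel version is correct; it is just not faster, because the unit of work is smaller than the cost of dispatching it.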
|
|
|
|
|
Oh yes, as soon as your thread count exceeds the core count, you are going to get some slowdown.
You need to be aware that threading is not a "magic bullet" that will solve all your performance woes at a stroke - it needs to be carefully thought about and planned, or it can do two things:
1) Slow your machine to a crawl, and make your application considerably slower than it started out.
2) Crash or lock up your app completely.
The reasons why are simple:
1) Threads require two things to run: memory and a free core. The memory will be at the very least the size of a system stack in your language (usually around 1MB for Windows, 8MB for Linux) plus some overhead for the thread itself and yet more for any memory-based objects each thread creates; and a thread can only run when a core becomes available. If you generate more threads than you have cores, then most of them will spend a lot of time sitting and waiting for a core to become available.
The more threads you generate, the worse the problems become: more threads put more load on the system to switch threads more often, and that takes core time as well. All threads in the system, from all processes, share the cores in the machine, so other apps and system threads also need their time to run. Add too many, and the system will spend more and more of its time trying to work out which thread to run, and performance degrades. Generate enough threads to exceed the physical memory in your computer and performance suddenly takes an enormous hit as the virtual memory system kicks in and starts thrashing memory pages to the HDD.
2) Multiple threads within a process have to be thread safe because they share memory and other resources - which means that several things can happen:
2a) If two threads need the same resources then you can easily end up in a situation where thread A has locked resource X and wants Y, while thread B has locked resource Y and wants X. At this point a "deadly embrace" (deadlock) has occurred and neither thread (nor any other that needs X or Y) can ever run again.
2b) If your code isn't thread safe, then different threads can try to read and/or alter the same memory at the same time: this often happens when trying to add or remove items from a collection. At this point strange things start to happen, up to and including your app crashing.
2c) If resources have a finite capacity - like the bandwidth on an internet connection for example - then bad threading can easily use it all - at either end of the link. If you run out of capacity, your threads will stall waiting for it (and everybody else using the connection will also suffer). If the other end runs out of capacity it may stutter, slow down, crash, or assume that you are a DDOS attack and take precautions.
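The 2a scenario has a classic preventive: make every thread acquire shared locks in the same global order, so no thread can ever hold one lock while waiting on a thread that holds the other. A minimal Python sketch (the lock names, thread count, and iteration count are invented for illustration):

```python
import threading

# Two resources that several threads need to hold at the same time.
lock_x = threading.Lock()
lock_y = threading.Lock()
counter = 0

def worker():
    global counter
    for _ in range(1000):
        # Always acquire in the same global order: X before Y.
        # If one thread took X-then-Y while another took Y-then-X, each
        # could grab one lock and wait forever on the other - the deadly
        # embrace described above.
        with lock_x:
            with lock_y:
                counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 4000  # every thread finished: no deadlock, no lost updates
```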
You can't just go "multithread this" and assume it will work: it's something that needs very, very careful planning.
Think about it this way: a very large bus is a slow way to get from A to B, but averaged over its large number of passengers it's pretty quick. If you put each passenger in a separate car, in theory they can all get there faster - except you are putting a lot more vehicles on the same roads, which means more chance of traffic jams, accidents, breakdowns, and so forth. Put too many on the same roads and they get blocked up with cars and nobody can move anywhere because there is a car in their way ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
OriginalGriff wrote: Oh yes, as soon as your thread count exceeds the core count, you are going to get some slowdown.
Didn't even do that.
I'm fully aware of where I went wrong. I posted it for the netizens of the lounge to have a laugh on my behalf.
In this case the specific problem is that each piece of work is smaller than the cost of creating a task.
And my error in the bigger picture is that one cannot simply convert a task running in sync to one running in async. It has to be purpose built.
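The usual remedy when each work item is smaller than the task-creation cost is to batch: give each task a whole chunk of records, so the fixed overhead is amortized over thousands of items. A hedged Python sketch (again not the poster's .NET code; `parse_record` and the sizes are made up):

```python
import concurrent.futures

def parse_record(raw):
    """Stand-in for the real per-record parsing (hypothetical)."""
    return len(raw)

def parse_chunk(chunk):
    # Each task handles a whole slice of records, so the fixed cost of
    # scheduling one task is spread across thousands of work items.
    return [parse_record(r) for r in chunk]

records = [f"record-{i}" for i in range(100_000)]
chunk_size = 10_000
chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

with concurrent.futures.ThreadPoolExecutor() as pool:
    # map() preserves order, so the flattened result lines up with the input.
    results = [r for chunk_result in pool.map(parse_chunk, chunks)
               for r in chunk_result]

assert len(results) == len(records)
```

Tuning `chunk_size` is the whole game: too small and the overhead returns, too large and the cores sit idle at the tail end of the job.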
|
|
|
|
|
Jörgen Andersson wrote: netizens of the lounge to have a laugh on my behalf.
We wouldn't do that!
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
OriginalGriff wrote: Jörgen Andersson wrote: netizens of the lounge to have a laugh on my behalf.
We wouldn't do that!
No. Programming is hard.
Recursion is for programmers who haven't blown enough stacks yet.
|
|
|
|
|
Jörgen Andersson wrote: my error in the bigger picture is that one cannot simply convert a task running in sync to one running in async. It has to be purpose built.
Amen brother - been there, seen that, still feel the pain...
|
|
|
|
|
OriginalGriff wrote: if you have a very large bus it is a slow way to get from A to B, but when you average it out over the large number of passengers it's pretty quick.
...something-something-bandwidth-of-a-station-wagon-carrying-storage-medium...
OriginalGriff wrote: except you are putting a lot more vehicles on the same roads which means more chance of traffic jams, accidents, breakdowns, and so forth. Put too many on the same roads and they get blocked up with cars and nobody can move anywhere because there is a car in their way ...
There's a meme for that...
|
|
|
|
|
Good analysis. The best way to think about it is this: you yourself cannot really multitask. You can time-slice (we used to call this time-sharing), or you can delegate. Everything done internally is really just time slicing, partitioned according to the rules and privileges you assign to processes, and threads within those processes.
ThisOldTony has it right; I am just an echo.
|
|
|
|
|
Maybe you only have a single CPU to work with?
(In a place where data transfer is done in 80 GB XML files, that seems entirely feasible.)
"The only place where Success comes before Work is in the dictionary." Vidal Sassoon, 1928 - 2012
|
|
|
|
|
Kornfeld Eliyahu Peter wrote: a place where data transfer is done in 80GB XML files it seems to be feasible
Well, that's governments for you.
|
|
|
|
|
No way! I work with gov and they're just bright and smooth... crème de la crème...
(today marks the 5th week I've been waiting for a version update - it's still waiting on personnel to sign it off)
"The only place where Success comes before Work is in the dictionary." Vidal Sassoon, 1928 - 2012
|
|
|
|
|
In my case I actually understand them, we're not the only customer on this data, so for them it's just easier to upload a weekly XML file to an ftp-server.
And it's not even my own government in this case.
I don't understand Danish, and Danes take offence if I speak English to them. (Quite rightly so, I might add.) So if I want support I need to employ Johnny.
|
|
|
|
|
|
I'm using an XmlReader to chop the file stream into an XDocument for every record.
Using an XmlReader all the way became too much work, handling null nodes and such.
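For readers outside .NET, Python's closest analogue to that XmlReader-plus-XDocument hybrid is `xml.etree.ElementTree.iterparse`: it streams the file, but hands you one fully built subtree per record that you can query like a mini-document. A small sketch (the `<record>` schema here is invented; the real feed is of course far larger):

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for the 80 GB feed; the <record> element name is made up.
xml_data = io.BytesIO(
    b"<feed><record id='1'><name>a</name></record>"
    b"<record id='2'><name>b</name></record></feed>"
)

names = []
for event, elem in ET.iterparse(xml_data, events=("end",)):
    if elem.tag == "record":
        # The subtree for one record is complete here, so it can be
        # queried with the full tree API...
        names.append(elem.findtext("name"))
        # ...and must then be cleared, or the whole file slowly
        # accumulates in memory anyway.
        elem.clear()

assert names == ["a", "b"]
```

The key property in both worlds is the same: only one record's subtree lives in memory at a time, so file size stops mattering.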
|
|
|
|
|
I wrote a command-line app that imports a NESSUS security-scan XML data file - the largest I've seen to date is about 8 GB. We import the data into a SQL Server database. It's not multi-threaded at all, as I recall. I do remember that the file was too big for XDocument to work.
I feel your pain.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
If the parsing can be partitioned into n subproblems, where n is the number of cores, then I would consider creating n daemons and locking each one into its own core. If any of them block, offloading the blocking operations to thread pools might help.
Partitioning the problem will help to reduce semaphore contention and cache collisions.
But I haven't had to populate a large database this way, so I could be full of shite.
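The partition-per-core idea can be sketched like this in Python (hedged: a thread pool stands in for the daemons, and `parse_partition` is a made-up stand-in; in a real daemon-per-core design each disjoint slice would go to its own process pinned to a core):

```python
import os
import concurrent.futures

def parse_partition(partition):
    """Each worker owns one disjoint slice of the data, so no locks or
    thread-safe collections are needed between workers (hypothetical parsing)."""
    return sum(len(rec) for rec in partition)

records = [f"record-{i}" for i in range(100_000)]

# One partition per core. Striding (i::n) gives n disjoint slices that
# together cover every record exactly once.
n = os.cpu_count() or 1
partitions = [records[i::n] for i in range(n)]

with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
    total = sum(pool.map(parse_partition, partitions))

assert total == sum(len(rec) for rec in records)
```

Because the partitions share nothing, contention disappears by construction rather than being managed with semaphores.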
|
|
|
|
|
This is exactly what I didn't want to have to learn.
At least all proper databases already handle parallel execution properly.
|
|
|
|
|
Yup - learnt the hard way: first identify where the program uses its resources.
|
|
|
|
|
More proof that some people have real problems.
So stop complaining people, you could be Jörgen today.
|
|
|
|
|
Ron Anders wrote: So stop complaining people, you could be Jörgen today.
...and have no toilet paper.
Jeremy Falcon
|
|
|
|
|
Isn't it enough if I'm being me?
|
|
|
|
|
I was vaguely reminded of an episode of Home Improvement, where one of the kids got himself in some trouble, so one of the brothers says "I wouldn't wanna be you right now", and the other responds with "I wouldn't wanna be you, ever".
|
|
|
|