|
Yup - learnt the hard way. First, identify where the program uses its resources.
|
|
|
|
|
More proof that some people have real problems.
So stop complaining people, you could be Jörgen today.
|
|
|
|
|
Ron Anders wrote: So stop complaining people, you could be Jörgen today.
...and have no toilet paper.
Jeremy Falcon
|
|
|
|
|
Isn't it enough if I'm being me?
|
|
|
|
|
I was vaguely reminded of an episode of Home Improvement, where one of the kids got himself in some trouble, so one of the brothers says "I wouldn't wanna be you right now", and the other responds with "I wouldn't wanna be you, ever".
|
|
|
|
|
I was going to say - size of the work done in each task is key...
But the underlying technology can also have an effect, by reducing the cost of task creation. If you're using a work queue on top of a thread pool, you're not creating a thread for each task, you're pushing/popping tasks on and off a queue.
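The work-queue-on-a-thread-pool idea can be sketched in a few lines; Python's `concurrent.futures` is used here purely for illustration (the original poster is presumably on .NET, where the TPL works the same way):

```python
from concurrent.futures import ThreadPoolExecutor

# A fixed pool of worker threads; submitting work pushes a task onto the
# pool's internal queue instead of spawning a new thread per task.
def process(item):
    return item * item  # stand-in for real per-task work

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() queues one task per item and collects results in order.
    results = list(pool.map(process, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The pool amortises thread-creation cost across all tasks, which is why small tasks become affordable.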
I created a little tool to detect duplicate files using that sort of parallelism. It contains two main areas of parallelism:
- The file search library that I use adds a new task for each directory it sees. Each task processes just the files that are immediate children of the directory the task was created for.
- The detection of duplicates is split so that each task hashes a group of files that have the same size. This is performed using a data parallelism library, which makes parallelising things very easy.
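The second stage described above - group files by size, then hash each same-size group as its own task - might look like this sketch (Python for illustration; function names are mine, not from the tool):

```python
import hashlib
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def group_by_size(paths):
    """Only files sharing a size can possibly be duplicates."""
    groups = defaultdict(list)
    for p in paths:
        groups[os.path.getsize(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]

def find_duplicates_in_group(paths):
    """Hash every file in one same-size group; return lists of identical files."""
    hashes = defaultdict(list)
    for p in paths:
        with open(p, 'rb') as f:
            hashes[hashlib.sha256(f.read()).hexdigest()].append(p)
    return [ps for ps in hashes.values() if len(ps) > 1]

def find_duplicates(paths):
    dupes = []
    # One task per size group -- the data-parallel step from the post.
    with ThreadPoolExecutor() as pool:
        for result in pool.map(find_duplicates_in_group, group_by_size(paths)):
            dupes.extend(result)
    return dupes
```

Each group is independent, so the data-parallel library can schedule them freely; the serialisation the post mentions comes from all tasks contending for the same disk.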
The amount of speedup I get isn't anywhere near the number of processor cores in use (I get a factor of just over two speedup on an eight core machine), but I think that the amount of IO being done serialises the processing to a certain degree. Benchmarking ripgrep, another tool that uses similar parallelism, shows that running with 8 threads (on 8 logical/4 physical cores) is just over 3x faster than using 1.
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
Why are you even parsing XML files, and 80 GB ones at that?! And then saving them to the database?! You could try the SQL Server bulk import tools to do this and avoid programming such stuff altogether...
Caveat Emptor.
"Progress doesn't come from early risers – progress is made by lazy men looking for easier ways to do things." Lazarus Long
|
|
|
|
|
Because I want to have the data extracted into normalized tables.
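For a file that size, a streaming parser is the only practical route, since a DOM would never fit in memory. A minimal sketch using Python's `xml.etree.ElementTree.iterparse` (the `record` element name is a hypothetical stand-in for whatever the actual schema uses):

```python
import xml.etree.ElementTree as ET

def stream_records(xml_path, tag='record'):
    """Yield one dict per element, without loading the whole file."""
    for event, elem in ET.iterparse(xml_path, events=('end',)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # free the memory held by the finished subtree
```

Each yielded dict can then be batched into parameterised INSERTs (or a bulk-copy API) against the normalized tables.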
|
|
|
|
|
|
I've missed out on that possibility completely.
A bit late now, but I'll take a look at it anyway.
|
|
|
|
|
I think I see the reason why I missed out on that possibility, it does not seem to exist on SQL Server 2012.
|
|
|
|
|
If it's the dev environment you have, you can run the SQL Server setup and select the components needed to get the SSIS services and the VS-based client tools - it's on the ISO or DVD etc. Also, there is OPENROWSET - a simple way to import XML data into SQL Server with T-SQL...
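The OPENROWSET route looks roughly like this; the file path, table, and element names are hypothetical, and note that SINGLE_BLOB pulls the whole file into one XML value, so this pattern suits files that fit in memory, not an 80 GB monster (that's where SSIS or a streaming parser comes in):

```sql
-- Read the whole file as one XML value (hypothetical path).
DECLARE @xml XML;
SELECT @xml = CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK 'C:\data\input.xml', SINGLE_BLOB) AS src;

-- Shred hypothetical <record> elements into rows with .nodes()/.value().
INSERT INTO dbo.Records (Id, Name)
SELECT r.value('(id)[1]', 'int'),
       r.value('(name)[1]', 'nvarchar(100)')
FROM @xml.nodes('/root/record') AS t(r);
```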
Caveat Emptor.
"Progress doesn't come from early risers – progress is made by lazy men looking for easier ways to do things." Lazarus Long
|
|
|
|
|
I think I see the reason why I missed out on that possibility, XMLSource does not seem to exist on SQL Server 2012.
|
|
|
|
|
Welcome to the cool club though. Ladies can't resist an async coder. #science
Jeremy Falcon
|
|
|
|
|
That's seriously the best answer today.
|
|
|
|
|
Jeremy Falcon
|
|
|
|
|
Jeremy Falcon wrote: Ladies can't resist an async coder.
What for...certainly not her and her sister...
|
|
|
|
|
I can't help but read "parse" in the body of the message...
This is clearly a case for... HONEY THE @CODE-WITCH tatatataaaaaaa
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
Are you sure the bottleneck isn't disk I/O?
Real programmers use butterflies
|
|
|
|
|
Yes. Just to make sure, I've made test runs that only read an ID from every record, which go twice as fast - and that's on a slow HDD here at home.
And when I move this to a server, the disks will be considerably faster.
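That kind of check - comparing raw read throughput against read-plus-parse throughput - can be sketched like this (Python for illustration; the file name is a placeholder):

```python
import time

def throughput(path, work, chunk=1 << 20):
    """Pass over the file in chunks, feeding each to `work`; return bytes/sec."""
    start = time.perf_counter()
    total = 0
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
            work(data)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed else float('inf')

# Raw read only (the I/O-bound ceiling) vs. read plus a stand-in for parsing:
# read_rate  = throughput('big.xml', lambda d: None)
# parse_rate = throughput('big.xml', lambda d: d.count(b'<'))
```

If the two rates are close, the disk is the bottleneck; if parsing is much slower, the CPU-side work dominates and parallelism can help.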
|
|
|
|
|
40GB of your 80GB XML file are tags. So much for the overhead.
The suggestion is to drop all markup languages worldwide (XML, JSON and similar shiite).
|
|
|
|
|
But compact binary formats are almost impossible to patch up using vi. Linux guys will feel completely lost!
|
|
|
|
|
|
Lots of Linux guys work in Windows environments. Lots of them hate it; they do it just to earn money so they can pay for their home computers and contribute to Linux-based open source projects in their spare time. And they spend a lot of energy bitching about things not being exactly as they are used to in the Linux world.
My comment was "based on a true story". I made one application storing a fairly complex persistent data structure in a binary format. This was met with heavy criticism: What if that data structure becomes inconsistent - how can we fix up the inconsistencies when it is not in a readable format? I guess I wasn't too polite when answering them that one major reason for not using a readable format was to prevent them from poking into the file with vi, introducing inconsistencies.
In the system I am working on now: It is a Windows desktop application, but there is a function for converting all file system paths to Unix-style forward-slash path separators, and a handful of utility functions that fail if you submit a DOS/Windows-style path with backslashes. Forward slashes are the only "correct" path format, they claim - DOS/Windows was simply wrong until it started accepting the correct format. So the (Windows) users of this program must simply accept that when using the conventions of their OS, they are simply wrong.
In an earlier project, the Linux mafia forced me to make special adaptations in my (very) Windows-specific utility: They insisted on running it, in their shell-based batch jobs, from a Linux-adapted command shell that enforced case-sensitive environment symbols. They make use of it, too: Their jobs started crashing, and it boiled down to my utility treating symbols differing only in case as synonyms, while they were distinct in their jobs.
In my current project, one of the first things I did was to replace case-sensitive file name comparisons with case-insensitive ones. It was argued, "But cmake always uses CMakeLists.txt, with exactly that casing! There is no need to do a case-insensitive comparison!" Well... Why did the program then barf? Someone wrote CmakeLists.txt, and the program just failed, because it didn't find the file.
I would have tolerated this a lot more if it wasn't for the constant bitching from the Linux mafia about Windows users refusing to learn anything new, clinging to Windows ways of doing things (when working under Windows) rather than learning how these wonderful command-line utilities - ported from the wonderful world of free and unsupported software - expect you to put everything in a loooong command line. This bitching about unwillingness to learn is nothing new: I have heard it constantly repeated for at least 15-20 years, in numerous different environments.
I recently discovered that the Compare plugin to Notepad++ was incapable of comparing two generated build jobs: The command lines invoking gcc were in excess of 3800 characters. Looking up the documentation for the generator, I found explicit warnings about Windows being incapable of handling command lines exceeding 8 Ki characters, which could cause problems (but Npp Compare obviously has a far lower limit). Hooray for command-line interfaces, where every detail is available at your fingertips, not in a silly screen form!
Now, a lot of newer Linux-born utilities do use binary formats - but the Linux mafia always have explanations for this very special case where it is justified. For us who have lived through several of these wars, it is interesting to see how a lot of arguments that were boasted as super-essential are laid silently down a few years after the war is won, and more or less replaced by what the losing side was promoting - although usually with a twist, so it will not be recognized.
Let me give one example of this: Packet routing. One of the fundamental strengths of IP as compared to e.g. ATM/FR/OSI-NP is that if a link is broken or congested, packets can select a different route: Each packet contains the full address and is, in principle, routed independently, and can follow any route to the destination. Connection-oriented protocols assume that all packets follow the same route; a link/physical failure requires a full connection re-establishment. ... Yeah, right. In today's Internet, every IP packet finds its own path. Right. You chop off an international trunk fiber, and no router anywhere in the world requires manual intervention to have its routing tables changed; that goes by itself, automatically. Believe the old myths, if you like.
There are several other examples, like the Internet and US phone guys insisting on in-band signaling (reducing a 64 kbps line to 56 kbps data capacity), while Europeans favored OOB signaling, both in phone systems and data networks. Then, when the SIP protocol for establishing IP phone connections (or other kinds of connection) was defined, all that heavy criticism of OOB signaling was kept low - SIP is OOB signaling in a nutshell.
There is no way to completely escape poor solutions promoted by the Linux and Internet communities (which have a large degree of overlap); we have to live with them, in spite of extremely poor tools, poor user interfaces and high overhead (especially when it comes to space requirements). But when I make Windows-specific tools, aimed at a different target audience than Linux hackers, I prefer to do things in better ways.
|
|
|
|
|
This should be published to everlasting memory. My congrats.
|
|
|
|