I just got done doing broad profiling. I haven't instrumented my code for specific profiling yet because I hadn't identified the bottleneck until I wrote that post.
I've been putting it off until I got some other features implemented (I needed them in there so I could benchmark them as well), but it's my next task now that I've just finished adding those features.
Real programmers use butterflies
Switching on characters is killing performance.
switch(ch) {
case '\t': m_column += TabWidth; break;
case '\n': ++m_line;
           /* fall through: a newline also resets the column */
case '\r': m_column = 0;
           break;
default:   ++m_column;
           break;
}
This consistently loses me 7-8 MB/s of throughput on my machine. The problem is that I switch on characters everywhere, as this is a JSON parser. I can reduce some of my comparisons, but not many of them, because of the way my code is structured. The only other thing I can think of right now is building my own jump table scheme, but I really don't want to do that, so I'm trying to come up with something else.
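One such scheme, as a sketch (the names and the TabWidth value here are my own assumptions, not the project's code): classify each byte through a 256-entry table so the hot switch ranges over a small dense set, which compilers reliably turn into a jump table.

```c
#include <stddef.h>

enum { TabWidth = 4 };               /* assumed tab width */

typedef struct { size_t line, column; } Pos;

/* Map every byte to a small dense class; a switch over 0..3 is
   trivially a jump table, and most bytes take the default path. */
static void track(Pos *p, const char *s, size_t n) {
    static const unsigned char cls[256] = {
        ['\t'] = 1, ['\n'] = 2, ['\r'] = 3,
    };
    for (size_t i = 0; i < n; ++i) {
        switch (cls[(unsigned char)s[i]]) {
        case 1: p->column += TabWidth; break;
        case 2: ++p->line;  /* fall through: newline resets column */
        case 3: p->column = 0; break;
        default: ++p->column; break;
        }
    }
}
```

Whether this beats the original switch depends on what the compiler already emitted; the table load trades a comparison chain for one indexed read per byte.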
Real programmers use butterflies
If switching on a char takes an inordinate amount of time, I'd sure be curious to know how your compiler does it. It ought to be the most efficient way of switching there is: A jump table indexed by the switch variable, loading the program counter. In the old days, when CPUs were slow, that was the only way to do it. (The first Pascal compiler I used could only take alternatives spanning a 256-value range, because that was the largest jump table it could generate.)
Modern languages are far more flexible in their case expressions, often requiring the compiler to generate code like an "if - elseif - elseif ... else" sequence. Maybe that is what your compiler has done here, maybe even generating "elseif"s for every char value rather than collecting the "default" into an "else". If the compiler is scared of big jump tables and therefore uses elseif constructions, it should realize that this makes the code size grow far more than the size of a jump table!
I am just guessing! But it sounds crazy that an indexed jump would kill your performance; it just doesn't sound right. I would look at the generated code to see what happens. If you can't make the compiler do an indexed jump, maybe you are better off writing it in longhand, building the jump table from labels.
I guess that writing it explicitly as "if - elseif..." would be better. Then you could also test for the most common case first, so that only a single comparison is needed: "if (ch > '\r') { ++m_column; }".
I hate it when compilers force me to do the job that should be theirs, but maybe you have to, in this case!
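For the record, the "longhand" jump-table-from-labels version might look roughly like this (a sketch only; it relies on the labels-as-values extension, so it is gcc/clang-specific and not standard C, and all names here are made up):

```c
#include <stddef.h>

/* gcc/clang extension: &&label yields the address of a label,
   and "goto *ptr" jumps through it. */
static void count_lines(const char *s, size_t n,
                        size_t *line, size_t *column) {
    enum { TabWidth = 4 };           /* assumed tab width */
    static void *table[256];
    if (!table['x']) {               /* build the dispatch table once */
        for (int i = 0; i < 256; ++i) table[i] = &&other;
        table['\t'] = &&tab;
        table['\n'] = &&lf;
        table['\r'] = &&cr;
    }
    for (size_t i = 0; i < n; ++i) {
        goto *table[(unsigned char)s[i]];
    tab:   *column += TabWidth; continue;
    lf:    ++*line;             /* fall through: newline resets column */
    cr:    *column = 0;         continue;
    other: ++*column;           continue;
    }
}
```

This is exactly the indexed jump a good compiler should emit for the switch anyway, which is why it's worth checking the generated assembly first.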
That's pretty much where I'm at. I'm using gcc, which should be pretty good about optimizing. What gets me is that it's no faster whether I compile with no switches at all or with -g.
Real programmers use butterflies
Don't I feel stupid.
Approx stack size of local JSON stuff is 160 bytes
Read 1290495 nodes and 20383269 characters in 416.631000 ms at 45.603904MB/s
Skipped 1290495 nodes and 20383269 characters in 184.131000 ms at 103.187405MB/s
utf8 scanned 20383269 characters in 146.422000 ms at 129.761921MB/s
raw ascii i/o 20383269 characters in 58.902000 ms at 322.569692MB/s
raw ascii block i/o 19 blocks in 3.183000 ms at 5969.211436MB/s
Much better.
I was using the wrong gcc options; I'm used to MSVC.
Real programmers use butterflies
Are you familiar with Compiler Explorer? It's a very useful tool for looking at the assembly generated by gcc and other compilers.
I like to do broad, algorithmic optimizations before I try to outsmart the compiler.
I've gotten at least a 3 times speed improvement by changing my parsing to use strpbrk() over a memory mapped file.
Approx stack size of local JSON stuff is 176 bytes
Read 1231370 nodes and 20383269 characters in 268.944000 ms at 70.646677MB/s
Skipped 1231370 nodes and 20383269 characters in 35.784000 ms at 530.963559MB/s
utf8 scanned 20383269 characters in 78.679000 ms at 241.487563MB/s
raw ascii i/o 20383269 characters in 58.141000 ms at 326.791765MB/s
raw ascii block i/o 19 blocks in 3.369000 ms at 5639.655684MB/s
The "Skipped" line is the relevant one here. That's doing a parse of the bones of the document (looking for {}[]") in order to skip over it in a structured way. That style of parsing is used for searching, for example when you're trying to find all the ids in a document. It's using the mmap technique I mentioned.
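The strpbrk trick, as a sketch (this is an illustration of the technique described, not the actual JsonReader code; a real skipper also has to track whether it is inside a string so escaped quotes don't fool it):

```c
#include <string.h>
#include <stddef.h>

/* Jump between JSON structural characters with strpbrk's optimized
   multi-character search instead of testing every byte ourselves.
   Assumes a NUL-terminated buffer, as a memory-mapped file can be
   arranged to provide. */
static size_t count_structural(const char *json) {
    size_t count = 0;
    for (const char *p = json; (p = strpbrk(p, "{}[]\"")) != NULL; ++p)
        ++count;
    return count;
}
```

The win comes from strpbrk's inner loop being heavily optimized in libc, so the per-byte cost of the skip scan drops well below a hand-rolled character switch.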
Here's snagging all "id" fields out of a 20MB file and reading their values.
Approx stack size of local JSON stuff is 152 bytes
Found 40008 fields and scanned 20383269 characters in 34.664000 ms at 548.119086MB/s
The bytes used stuff is roughly how much memory the query takes - including the sizes of the JsonReader and LexSource member variables.
Real programmers use butterflies
UTF-8?
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
Yeah, it's a Unicode encoding. Most characters are one byte, so it's ASCII-ish except for the extended character range.
However, it's a bit involved to decode.
Implementing the JSON spec requires UTF-8 support.
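For the curious, the decoding is roughly this (a minimal sketch I'm including for illustration; it does not reject overlong encodings, surrogates, or truncated input, which a compliant parser must):

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp and return the number of
   bytes consumed (0 for a malformed lead byte). */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp) {
    if (s[0] < 0x80) {                       /* 1 byte: plain ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {             /* 2 bytes: U+0080..U+07FF */
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {             /* 3 bytes: U+0800..U+FFFF */
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0) {             /* 4 bytes: U+10000..U+10FFFF */
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;                                /* malformed lead byte */
}
```

The branch on the lead byte is what makes a naive per-character loop slow relative to raw ASCII I/O.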
Real programmers use butterflies
Well, it triples the time needed.
I would make it an option to choose ANSI or ASCII for the cases where performance is an issue but encoding isn't.
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
The only issue with that is I'm trying to make it spec compliant, but I considered making it an option. I may yet, as it's quite a bit faster, but first I want to see how quick I can get the UTF-8 support.
It won't triple the time needed if I can process 4 bytes at a time using SIMD.
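Even without real SIMD intrinsics, a word-at-a-time (SWAR) fast path illustrates the idea: check several bytes per step for any high bit, and only fall back to full decoding when one is set. This is a sketch of the technique, not the project's actual routine:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Return the length of the leading pure-ASCII run, checking 8 bytes
   per iteration for any byte with the high bit set. */
static size_t ascii_prefix(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i + 8 <= n) {
        uint64_t w;
        memcpy(&w, s + i, 8);             /* safe unaligned load */
        if (w & UINT64_C(0x8080808080808080))
            break;                        /* some byte is non-ASCII */
        i += 8;
    }
    while (i < n && s[i] < 0x80) ++i;     /* finish byte by byte */
    return i;
}
```

Since JSON text is overwhelmingly ASCII, most of the input can be consumed on this path and only the occasional multi-byte sequence pays the full decode cost.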
Real programmers use butterflies
honey the codewitch wrote: but first i want to see how quick i can get the utf8 support.
Obviously!
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
Did that to myself just the other day; a process that ran in about 70 ms started clocking at almost a minute. Async; no sync; didn't matter.
Forgot to disable debug output in a critical routine: the overhead was "huge" (in debug mode).
Had me going for a while; the (VS) diagnostics showed the cpu profile was not as expected.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
What's frustrating is that this simple case statement loses me about 7-8 MB/s of throughput on something that currently tops out in the low 60s on a good day.
switch(ch) {
case '\t': m_column += TabWidth; break;
case '\n': ++m_line;
           /* fall through: a newline also resets the column */
case '\r': m_column = 0;
           break;
default:   ++m_column;
           break;
}
Real programmers use butterflies
Take a look at the Compiler Explorer and see if the assembler output makes sense. Maybe an if statement would produce better results, though it would be less aesthetically pleasing, IMHO.
Keep Calm and Carry On
I'm a dunce. I had my compiler options set wrong.
This is my latest, somewhat improved output, but it could be a lot faster. I want UTF-8 in the GB/s range, or at least within spitting distance of it, on my machine.
Approx stack size of local JSON stuff is 176 bytes
Read 1290495 nodes and 20383269 characters in 272.591000 ms at 69.701494MB/s
Skipped 1290495 nodes and 20383269 characters in 118.066000 ms at 160.926939MB/s
utf8 scanned 20383269 characters in 91.398000 ms at 207.882011MB/s
raw ascii i/o 20383269 characters in 57.443000 ms at 330.762669MB/s
raw ascii block i/o 19 blocks in 3.024000 ms at 6283.068783MB/s
I just tried a branchless UTF-8 decoding routine, but it proved to be slower than my original version. However, it's closer to something that could be converted to SIMD instructions, so I'm exploring that more.
Real programmers use butterflies
I suspect that the UTF-8 scanning uses fgetc underneath to return one character at a time, since that would greatly simplify the implementation of the scanner.
What I use under the covers depends on what kind of LexSource you use. Mainly I use memory mapped files now, for speed, but I'm implementing one using fread and buffered access and we'll see how that stacks up.
I'm very nearly breaking 600MB/s of JSON searching on my machine.
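A buffered source along those lines might look roughly like this (a sketch under assumptions; LexSource is the author's class, while the struct and names below are purely illustrative):

```c
#include <stdio.h>
#include <stddef.h>

/* Buffered byte source: refill a block at a time with fread and hand
   out one byte per call, amortizing the per-call overhead that a
   plain fgetc-style interface pays on every character. */
typedef struct {
    FILE *f;
    unsigned char buf[8192];
    size_t len, pos;
} BufSource;

static int bs_next(BufSource *s) {
    if (s->pos == s->len) {          /* buffer exhausted: refill */
        s->len = fread(s->buf, 1, sizeof s->buf, s->f);
        s->pos = 0;
        if (s->len == 0)
            return -1;               /* EOF or read error */
    }
    return s->buf[s->pos++];
}
```

It won't match mmap (which avoids the copy entirely), but it should land far closer to it than unbuffered per-character reads.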
Real programmers use butterflies
For the past 4 months or so, I have tried to keep my brain from atrophying by working on complex problems. I am approaching 80, and dementia in all its forms is one of the few things that really scares me. When my daughter was struggling with a course in elementary Java, I had to quickly learn elementary Java to help her.
Well, Java hooked me, and now I am well into JavaFX with FXML and far, far beyond her elementary course. I found Java to be quite similar to C#, but at the same time it has significant differences.
One difference that drove me nuts several times is the simple act of comparing strings. In C# I must have done the comparison like if (stringA == stringB) { } thousands of times. But try to do that in Java, ooh boy! The code compiles and runs without complaint, but == compares object references rather than contents, so the test usually just comes out false. No exception, no error, no nothing. The results can be chaotic, with absolutely no indication of where the error occurred! The proper syntax in Java is: if (stringA.equals(stringB)) { }
And if any of you readers tell me I deserve to suffer for trying to work in Java: I will scream!
One other thing: I started out using the Eclipse IDE for Java, but soon found I preferred the IntelliJ IDE. In fact, IntelliJ is better than Visual Studio in several respects. But VS admittedly has its moments in the sun.
Get me coffee and no one gets hurt!
modified 26-Dec-20 8:46am.
I'm with you on that. I had to learn Java during my last years of working for a living, and got to like it. I also used Eclipse extensively and, once I understood its quirks, found it easy to work with. I do very little Java these days, so I just use Visual Studio Code as an editor.
One more way of preventing dementia and other related issues is composing poetry in a language known to you.
I have attempted composing poems in Sanskrit and Kannada (my most familiar languages), and it really occupies the mind and brain, especially hunting for words which fit into the rhyme and rhythm of the poem. On one occasion, I spent close to three hours at a stretch, and composed a poem with 15 verses, two lines per verse, in Sanskrit.
Before the pandemic I volunteered at the local VA hospital and would go to the wards to collect patients; the people with dementia broke my heart.
I'm not sure how many cookies it makes to be happy, but so far it's not 27.
JaxCoder.com
My wife's grandmother and grandfather both passed away after long bouts with dementia. They lost their dignity and then themselves. It made me resolve that if I were ever diagnosed with it, I would check out on my own power before I ever put my child through that.
Software Zen: delete this;
I agree, it's a hell of a thing to live with and to put others through.
I'm not sure how many cookies it makes to be happy, but so far it's not 27.
JaxCoder.com
Oz is just toying with the idea of assisted death; hopefully they bring it in before I need it. We don't have access to weaponry (the only good reason to be able to get a gun, IMHO).
Never underestimate the power of human stupidity -
RAH
I'm old. I know stuff - JSOP