|
Wow, that is crazy it took 3 tries. You'd think Dell would investigate. I guess it's just a write-off.
|
|
|
|
|
Two rants in an hour. Geez
Turns out (from the looks of things, unless I messed something up) strpbrk() is poorly optimized in Microsoft's VCLib.
I'm only getting <500MB/s in JSON(C++) on a Windows machine, compiled with cl.exe:
cl.exe /Zi /EHsc /nologo /GL /D "NDEBUG" /O2 /Fe:main.exe JSON-CPP\src\main.cpp
(I never use cl.exe from the command line so maybe it's wrong?)
On Linux, on my old machine (an old i5 that was 20 times slower than this one), I was getting almost 600MB/s from a standard HDD. But that was Linux. This machine is a Ryzen 7 with an NVMe drive. On Windows.
And yet? WTH!
Does anyone know if GCC will work on Windows without some virtual env like MinGW installed?
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: Does anyone know if GCC will work on Windows without some virtual env like MinGW installed?
I thought they brought up WSL for such things?
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
That's the opposite thing, I think?
This would be running a linux based app on a windows machine, not the other way around.
Real programmers use butterflies
|
|
|
|
|
You asked about GCC, not about the executable app you are programming.
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
I meant GCC. My app should compile just about anywhere, even on 8-bit machines with no real operating system to speak of.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: I'm only getting <500MB/s in JSON(C++) on a windows machine compiled with cl.exe:
I'm gonna play dumb:
What are you getting as JSON files, that is so large that processing half a GB of it per second is a problem?
I'm old-school, so I'm all for optimization. However to put things into perspective, 500MB is something that would take my internet connection a significant amount of time to download. Having some C++ code chew on a JSON file that large per second isn't necessarily what I'd be terribly concerned about.
|
|
|
|
|
Searching and uploading bulk data, from JSON files, often in line delimited JSON form (each json doc on its own line in one big file)
Real programmers use butterflies
|
|
|
|
|
Still playing dumb:
And JSON is the correct mechanism for this?
|
|
|
|
|
Ask the people that produce these files.
Do you plan on insisting that they should move to a binary format?
Good luck!
Real programmers use butterflies
|
|
|
|
|
Like I said, I was just playing dumb. I fully understand that some of these things are probably out of your control.
|
|
|
|
|
Fair enough. Yeah, JSON is used for big data for better or worse, like XML was. There are *some* advantages to a lexical format when it comes to transmitting numbers across platforms, but nowadays the binary representations are so standardized anyway that aside from byte order it doesn't matter. But people will do what they will do.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: Does anyone know if GCC will work on Windows without some virtual env like MinGW installed?
It's probably not the compiler so much as the supplied libc. You could peruse the source for glibc and try to write your own strpbrk() based on that. Though the odd occasion that I've tried to spelunk through the glibc sources, about 50% of the time it becomes a bit like trying to solve a maze in Zork. And, of course, it raises possible GPL issues, so maybe looking at the BSD sources might be a better choice.
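For anyone who wants a starting point that avoids the licensing question entirely, here is a minimal sketch of a hand-rolled replacement. It is not taken from glibc or the BSD sources; it is just the common lookup-table approach, and the name my_strpbrk is made up for illustration. Whether it beats any particular libc is something you would have to measure.

```cpp
#include <bitset>
#include <climits>

// Hypothetical stand-in for strpbrk(): builds a 256-entry membership table
// from `accept`, then scans `str` once, testing each byte against the table.
const char* my_strpbrk(const char* str, const char* accept) {
    std::bitset<UCHAR_MAX + 1> wanted;
    for (; *accept != '\0'; ++accept)
        wanted.set(static_cast<unsigned char>(*accept));

    for (; *str != '\0'; ++str)
        if (wanted.test(static_cast<unsigned char>(*str)))
            return str;   // first byte of str that appears in accept

    return nullptr;       // reached the terminator without a match
}
```

The table costs a fixed setup per call, so the per-byte work no longer depends on how many characters are in the accept set.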
Wasn't there a post a few weeks ago about SSE and string ops? Maybe that's a direction to consider if you want to write your own.
On a related note, it might be interesting to see how a WSL instance compares in performance to a Windows native instance. The results of that might be skewed though, if WSL uses virtual disks, so maybe a better comparison would be Linux and Windows, both in a VM on the same host. At least then both instances would have the same virtual disk drivers, so, presumably, the difference would be down to the strpbrk() implementation.
Keep Calm and Carry On
|
|
|
|
|
I was the one that brought up the SIMD string processing. I may have to create my own SIMD-optimized strpbrk() function for my lib, just for the Windows build.
Oddly enough (and I don't know if this is still true) Apple's standard libraries and OS calls were heckin fast compared to other major offerings. It was about the only nice thing I could say about them.
I'm pretty sure it's MS's standard libraries that are the problem in this case, specifically strpbrk.
I just wonder why they're not better optimized? I haven't disassembled them yet, but from what I've seen of GCC's (which I *have* disassembled), it's using SIMD pretty much entirely. I doubt Microsoft's is, based on the performance alone; a SIMD version should be orders of magnitude faster.
Real programmers use butterflies
|
|
|
|
|
Just a really dumb question. You're sure you're looking at Release build, and not a Debug build? I'm not sure that that would even matter, unless MS provides an unoptimized libc for debugging?
Keep Calm and Carry On
|
|
|
|
|
I've tried building it in release under several different configurations (different architectures and optimizations) and I'm not getting much difference, leading me to believe strpbrk() is not optimized using SIMD, unlike GCC's stdlib implementation.
Real programmers use butterflies
|
|
|
|
|
You're not wrong...
Here's some code that scans through a 1GB string (finding a character at the very end of it) with the four equivalent but different ways I could think of (std::string::find_first_of, std::string_view::find_first_of, std::find_first_of and strpbrk):
#include <algorithm>
#include <chrono>
#include <cstring>
#include <iostream>
#include <iterator>
#include <string>
#include <string_view>

int main()
{
    // 1 GiB of spaces with the only matching character at the very end.
    std::string s(size_t(1024) * 1024 * 1024, ' ');
    s.back() = 'c';

    // 1. std::string::find_first_of
    auto start = std::chrono::steady_clock::now();
    auto x = s.find_first_of("abc");
    auto end = std::chrono::steady_clock::now();
    std::cout << "std::string::find_first_of -> " << x << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    // 2. std::string_view::find_first_of
    start = std::chrono::steady_clock::now();
    std::string_view s_as_view{s.c_str(), s.size()};
    auto x1 = s_as_view.find_first_of("abc");
    end = std::chrono::steady_clock::now();
    std::cout << "std::string_view::find_first_of -> " << x1 << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    // 3. std::find_first_of (generic algorithm over iterators)
    start = std::chrono::steady_clock::now();
    std::string needle{"abc"};
    auto x2 = std::distance(std::begin(s), std::find_first_of(std::begin(s), std::end(s),
                                                              std::begin(needle), std::end(needle)));
    end = std::chrono::steady_clock::now();
    std::cout << "std::find_first_of -> " << x2 << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;

    // 4. strpbrk from the C library
    start = std::chrono::steady_clock::now();
    auto y = std::distance(s.c_str(), std::strpbrk(s.c_str(), "abc"));
    end = std::chrono::steady_clock::now();
    std::cout << "strpbrk -> " << y << " in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;
}
And here's the output when compiled with cl.exe -std:c++17 -Ob2 -O2 -Os -EHsc a.cpp and run on the i7-6820HQ in my work laptop:
std::string::find_first_of -> 1073741823 in 552 ms
std::string_view::find_first_of -> 1073741823 in 557 ms
std::find_first_of -> 1073741823 in 2741 ms
strpbrk -> 1073741823 in 2359 ms
That's about 1.8GB/s for the first two, and around 423MB/s for strpbrk. However, when compiled with gcc-10 (with the command g++-10 -o ./a a.cpp -O3 -std=c++17) on Ubuntu 18.04 (same laptop - I'm using WSL), I get this:
std::string::find_first_of -> 1073741823 in 3341 ms
std::string_view::find_first_of -> 1073741823 in 3563 ms
std::find_first_of -> 1073741823 in 715 ms
strpbrk -> 1073741823 in 122 ms
That ranges from 300MB/s for the first two to about 8.2GB/s for strpbrk...
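If you want to turn the millisecond timings into throughput yourself, a small helper along these lines would do it. The name mib_per_second is made up for this sketch, and it reports binary megabytes (2^20 bytes), so the results can differ a little from rounded figures that treat the buffer as a flat 1000MB.

```cpp
#include <cstdint>

// Hypothetical helper: throughput in MiB/s for `bytes` scanned in `millis` ms.
double mib_per_second(std::uint64_t bytes, double millis) {
    const double mib = static_cast<double>(bytes) / (1024.0 * 1024.0);
    return mib / (millis / 1000.0);
}
```

For the 1GiB buffer, the 2359 ms strpbrk run works out to roughly 434 MiB/s and the 122 ms run to roughly 8.2 GiB/s.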
honey the codewitch wrote:
Does anyone know if GCC will work on Windows without some virtual env like MinGW installed?
MinGW is actually OK - Cygwin is the 'gcc on Windows' that introduces nastiness. As this site says, "MinGW is a port of GCC to Windows. ... It produces standalone Windows executables which may be distributed in any manner." I'd use the distro from that site, or maybe one from this site
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
|
|
|
|
|
Naming things is hard, but really? strpbrk?
string pointer...? b? return? k? Really, what?
|
|
|
|
|
const char * strpbrk ( const char * str1, const char * str2 );
char * strpbrk ( char * str1, const char * str2 );
Locate characters in string
Returns a pointer to the first occurrence in str1 of any of the characters that are part of str2, or a null pointer if there are no matches.
The search does not include the terminating null-characters of either strings, but ends there.
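As a quick illustration of that description, here is a small sketch that wraps std::strpbrk to report where the first delimiter sits. The helper name first_delimiter_offset and the -1 "not found" convention are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: offset of the first character in `text` that is also
// in `delims`, or -1 when strpbrk() returns a null pointer (no match).
std::ptrdiff_t first_delimiter_offset(const char* text, const char* delims) {
    const char* hit = std::strpbrk(text, delims);
    return hit != nullptr ? hit - text : -1;
}
```

Calling first_delimiter_offset("key: value", ":,") gives 3, the position of the colon.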
"string pointer break" seems closest. The person that named it was probably drunk at the time.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: "string pointer break" seems closest
Hah, amateur! I'd have gone for "spb"
|
|
|
|
|
Knowing the C stdlib it was probably already used for something.
Real programmers use butterflies
|
|
|
|
|
String Pointer BReaK. But you probably knew that. What you may not know is that this goes back to the dawn of Unix on a PDP with actual, real teletypes as I/O devices. Punching the keys on them was hard, so anything that could be abbreviated was. Thus cp, mv and ls rather than copy, move and list. Sure, only 2 chars each (abbrev, again!), but at the end of a day stabbing at the keys, it would make a difference ... if only it meant you could pick up that beer without wincing.
Keep Calm and Carry On
|
|
|
|
|
|
Sander Rossel wrote: I thought all that old stuff was abbreviated to save memory.
Sort of. I seem to recall that early linkers had only 8 (or maybe 16) character limit for external identifiers, so that too played a part in the name of system functions.
Keep Calm and Carry On
|
|
|
|
|
I am sure that you are right.
My next question is how much time your typical application spends inside strpbrk(). I can imagine that you can set up testbeds where it exceeds one percent of the total CPU load. That is for a testbed.
Can you set up a true, user level, application solving a true user problem, where more than a single percent of the CPU time is spent inside strpbrk()? At a single percent, doubling the speed of strpbrk() might speed up the application by a whopping half percent. Woooah!
Sure: I see that thirty or seventy-five such optimizations together might be significant, taken as a whole. So go ahead with the twenty-nine, or seventy-four, other optimizations. Then serve the pudding.
The proof of the pudding is the pudding you serve to the end user.
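The half-percent estimate is just Amdahl's law, which can be sketched in a couple of lines. The function name overall_speedup is made up for this example; p is the fraction of run time spent in the code being optimized and s is how many times faster that code becomes.

```cpp
// Amdahl's law: overall speedup when a fraction `p` of the run time
// is made `s` times faster and the remaining (1 - p) is unchanged.
double overall_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.01 and s = 2 this gives about 1.005, i.e. the half percent mentioned above. The same formula also shows the flip side: if a scanner really does spend most of its time in one function, p is large and the payoff is too.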
|
|
|
|