|
Ctrl+A is faster though, for the last one. The first two are useful indeed.
|
|
|
|
|
But Ctrl+A selects everything; Ctrl+Shift+Space selects only the current table or pivot table. I use this a lot!
|
|
|
|
|
You all know those cats will eat you if they get the chance
|
|
|
|
|
My cat, Bear, uses the computer for an entirely different purpose. He loves a bit of Death Metal: here he is checking out some Obituary[^].
|
|
|
|
|
Edit: To be clear, the reason the stuff below makes me happy is not that I want to be the best at something, but that one of the goals of my project was to rely mostly on cross-platform features and degrade gracefully. The only platform-specific magic I use is memory-mapped files, for which my code currently supports Windows and Linux. I set out to prove that you could largely rely on the stdlib and just design algorithmically better ways to process JSON, and I feel I did that. I also think that, given the portability and memory usage of my library, it pays for itself even in areas where it slightly underperforms simdjson in raw speed. In certain cases (many, in fact) simdjson will slay my library, particularly with random access, which mine can't even do. But for the scenario I designed it for, I designed it to be competitive with fast offerings, and it has exceeded my expectations so far. I need more benchmarking, though.
I'm looking at possibly integrating my JSON(C++) bulk loader with simdjson. I've been talking with some of the contributors, but we've been running into a wall because of their reliance on upfront indexing of the entire document.
I just ran my first benchmark of the two performing the same operation - retrieving the name field off of a 20MB json object.
simdjson took 46.231 ms to find Burn Notice
JSON(C++) took 0.055 ms to find Burn Notice
The reason mine is so much faster in this case is I don't index anything. That's a linear search.
I need to run a full suite of benchmarks but the above doesn't surprise me. Simdjson will outperform my library if you need most of the data in the document. If you don't, then mine may win out.
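To illustrate the trade-off, here is a rough sketch in C++. It is not code from JSON(C++) or simdjson. `findTopLevelString` is a hypothetical helper that scans forward, builds no index, does no validation, and stops at the first matching top-level field whose value is a string. If the field sits near the start of a large document, most of the document is never read.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Advance i past JSON whitespace.
static void skipSpace(std::string_view json, std::size_t& i) {
    while (i < json.size() && (json[i] == ' ' || json[i] == '\t' ||
                               json[i] == '\n' || json[i] == '\r'))
        ++i;
}

// Read the string whose opening quote is at json[i]. On success, `out` holds
// the raw (still escaped) contents and i sits just past the closing quote.
static bool readString(std::string_view json, std::size_t& i,
                       std::string_view& out) {
    std::size_t start = ++i;
    while (i < json.size() && json[i] != '"') {
        if (json[i] == '\\') ++i;  // skip the escaped character
        ++i;
    }
    if (i >= json.size()) return false;  // unterminated string
    out = json.substr(start, i - start);
    ++i;
    return true;
}

// Hypothetical helper: return the string value of the first field named
// `key` directly inside the root object, reading no further than needed.
std::optional<std::string> findTopLevelString(std::string_view json,
                                              std::string_view key) {
    assert(!key.empty());
    int depth = 0;
    std::size_t i = 0;
    while (i < json.size()) {
        char c = json[i];
        if (c == '{' || c == '[') { ++depth; ++i; }
        else if (c == '}' || c == ']') { --depth; ++i; }
        else if (c == '"') {
            std::string_view token;
            if (!readString(json, i, token)) return std::nullopt;
            std::size_t j = i;
            skipSpace(json, j);
            // A string followed by ':' is a field name, not a value.
            bool isKey = j < json.size() && json[j] == ':';
            if (isKey && depth == 1 && token == key) {
                ++j;
                skipSpace(json, j);
                std::string_view value;
                if (j < json.size() && json[j] == '"' &&
                    readString(json, j, value))
                    return std::string(value);
                return std::nullopt;  // field found, but not a string value
            }
        }
        else ++i;
    }
    return std::nullopt;
}
```

Called as `findTopLevelString(doc, "name")`, this returns as soon as the root-level `name` is seen and ignores any `name` fields nested deeper. An indexing parser pays for the whole document before answering the same question, which is one plausible reading of the gap in the numbers above.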
simdjson claims to be the fastest json processor in the world.
the above wobbles that claim, even if it's not exactly a real world benchmark
Simdjson runs on 64bit only.
Mine runs on 8bit all the way up to 64 (and beyond)
Mine is a lot smaller too.
I'm not usually competitive and I didn't design this to beat simdjson, but it's nevertheless satisfying that i'm getting results like this.
Edit: The more benchmarks i create, the happier I get.
Benchmark presently, extracting 20000 episodes from the document.
JSONPath equiv would be $..episodes[*].season_number,episode_number,name
I had to use a different path for simdjson because I couldn't figure out how to use a different search axis:
JSONPath for simdjson was $.seasons[*].episodes[*].season_number,episode_number,name
simdjson took 49.669 ms to extract episode data
JSON(C++) took 55.921 ms to extract episode data
That's the second benchmark, which I'm actually thrilled by. The first one just gets the "status" field off the root object; it's toward the end of the document:
simdjson took 45.685 ms to find Canceled
JSON(C++) took 30.12 ms to find Canceled
Each time I'm closing and reopening the file and doing a fresh read. The reason is that my stuff was designed for forward-only bulk loading: you get everything you need in a single pass, period. It just wasn't designed to work otherwise, so I wanted to compare apples to apples.
simdjson took over 100MB of RAM to get those numbers, and generally requires 4-5x the amount of RAM as the document size on disk.
Mine did it with under 1kB not counting platform specific incidentals like the mapped vmem page(s) i'm using.
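For anyone wanting to reproduce this kind of measurement, a minimal timing sketch follows. It is not the benchmark code from this post. `timeMs` is a hypothetical helper, and the assumption is that the callable passed in does its own open, parse and close, so every run is a fresh read.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` once and return the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
double timeMs(const std::function<void()>& work) {
    assert(work);  // an empty std::function cannot be called
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A single run is noisy, so repeating each measurement and reporting the minimum or median is usually more trustworthy than one number. Disk caching also matters: after the first run the file is likely served from the OS page cache.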
Real programmers use butterflies
modified 3-Jan-21 22:55pm.
|
|
|
|
|
It appears to me that simdjson and JSON(C++) are optimized for different use cases - speed vs memory usage. It does not surprise me that optimizing for memory usage can also help the speed, especially when taking cache effects into account...
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
They absolutely are optimized for different use cases, which is one of the reasons for the integration discussions i'm having with some of the simdjson contributors.
Real programmers use butterflies
|
|
|
|
|
Hmmm,
I am not sure why you would make this claim in the codeproject forum. If you really believe that you have built a better parser then wire your library up to one of the popular JSON benchmarks and file an issue into the simdjson list... boldly making your claim public. Thousands of people are monitoring that list and you will either get a quick confirmation or fierce rebuttal.
Here is my take:
You are most likely testing against the fully-validating DOM parsing version of simdjson. Check to see if you are faster than the ondemand::[^] namespace. After reviewing your code I can say that the ondemand::parser version of simdjson is closer to your implementation.
For what it's worth... I do think your parser will be slightly faster. But keep in mind your lib has no fuzzing/testing and is non-validating.
Best Wishes,
-David Delaune
|
|
|
|
|
A) make what claim, specifically? That my project met its goals? I stand by that.
B) I never said it was better than simdjson. I said it was competitive. I stand by that. I did mention algorithmically better ways to process JSON, but I'm talking about designing better software generally there, not designing a better processor than simdjson, and I think it's clear from the context. But that's the only time I use the word "better" in my post.
C) I haven't said anything here that I haven't said to the simdjson contributors themselves, by way of an #issue, and pull request, and discussion.
D) I am only testing against the ondemand api
void getEpisodeData(ondemand::object& root) {
    auto seasons = root["seasons"].get_array();
    auto sit = seasons.begin();
    while(sit!=seasons.end()) {
        auto season = *sit;
        auto episodes = season["episodes"].get_array();
        auto eit = episodes.begin();
        while(eit!=episodes.end()) {
            auto episode = *eit;
            auto season_number = episode["season_number"].get_int64();
            auto episode_number = episode["episode_number"].get_int64();
            auto name = episode["name"];
            ++eit;
        }
        ++sit;
    }
}
void extractEpisodes(LexSource &fls)
{
    StaticMemoryPool<256> pool;
    JsonElement seasonNumber;
    JsonElement episodeNumber;
    JsonElement name;
    const char *fields[] = {"season_number", "episode_number", "name"};
    JsonExtractor children[] = {JsonExtractor(&seasonNumber), JsonExtractor(&episodeNumber), JsonExtractor(&name)};
    JsonExtractor extraction(fields, 3, children);
    JsonReader jr(fls);
    unsigned long long maxUsedPool = 0;
    int episodes = 0;
    while (jr.skipToFieldValue("episodes", JsonReader::Forward)) {
        if (!jr.read())
            break;
        while (!jr.hasError() && JsonReader::EndArray != jr.nodeType()) {
            if (pool.used() > maxUsedPool)
                maxUsedPool = pool.used();
            pool.freeAll();
            ++episodes;
            if (!jr.extract(pool, extraction)) {
                printf("\t\t%d. (extraction failed!)\r\n",episodes);
                break;
            }
        }
    }
    if (jr.hasError())
    {
        printf("\tError (%d): %s\r\n",(int)jr.error(),jr.value());
        return;
    }
}
The reason mine is competitive is simple. As you say, it does not validate; or rather, it can, but the way I'm using it, it doesn't. It also uses far fewer instructions per operation, generally, and examines each character fewer times than simdjson (usually). Currently my code is harder to use, but that will change once I finish building JSONPath on top of my query system.
Real programmers use butterflies
modified 4-Jan-21 8:10am.
|
|
|
|
|
|
As Daniel Lemire initially suggested, I started a pull request.
added jsoncpp folder for comparison and possible adaptation/integration by codewitch-honey-crisis · Pull Request #1367 · simdjson/simdjson · GitHub[^]
I did this in order to continue the discussion. That way others could access my code. I haven't integrated in that request, because our apis are so different, and mine needs a feature add before I start anyway.
Anyway, it's all under the jsoncpp folder, including the very beginnings of a comparison benchmark in main.cpp under that folder, and a small build/run script for linux as ./run.sh
I'll add more comparisons once I figure out how to use ondemand to search across each different axis (forward, descendants, siblings). Right now I can only get it to do siblings, but it's not fair to make it walk the tree in the code if it doesn't have to, when mine doesn't. Nor do I want to write skewed benchmarks just because I am still learning the API.
Real programmers use butterflies
|
|
|
|
|
yeah,
It will be interesting to see how your lib performs against the others. I added a reminder in my calendar to check up on it next week.
|
|
|
|
|
Eventually I found out that there's an odd compiler switch "-march=native" that makes simdjson scream.
Once I enabled that, it was basically like simdjson vs rapidjson.
I had everyone going there for a little while though, including myself.
Nobody caught the compiler settings until way late in the game. I've never used that switch myself.
I'm still working with simdjson because even after all that, my code is still apparently very fast when compared to their competitors, according to them anyway.
And they still want streaming support; my code is apparently interesting enough that they haven't kicked me off their GitHub yet.
I'm still happy with the results. I haven't spent a year optimizing my code and I'm keeping up with their competitors.
And I can still make it faster.
Real programmers use butterflies
|
|
|
|
|
The -march=native option tells GCC that you want to optimize for the machine you are compiling on. I never even thought to question your compiler/linker options. I just assumed that you were experienced enough to have that perfected. Come to think of it, I don't even remember seeing your Makefile. Let me look to see if I have it.
Update:
Looks like you were using a Makefile loaded from your environment. You never distributed it.
|
|
|
|
|
Actually I was just using a small shell script that just ran the compiler, and then the program.
I posted it on the site along with my benchmarks.
I'm used to msvc, still newish to gcc and g++.
I thought -m64 covered it for my machine.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: I thought -m64 covered it for my machine.
No, that would mean it was generating x86-64 code from circa 1999. In fact you were probably running simdjson without any of the modern SIMD instructions.
Don't beat yourself up, your lib has great benchmark numbers.
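One way to check what a given set of flags actually enabled is to look at the macros GCC and Clang predefine for each instruction-set extension. The sketch below is only an illustration. `enabledSimd` is a hypothetical helper, it covers just a handful of extensions, and it reports what the compiler was allowed to emit for this translation unit, not what any particular library dispatches to at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: list the SIMD extensions the compiler was permitted
// to use, based on macros GCC and Clang predefine from the -m / -march flags.
std::string enabledSimd() {
    std::string s;
#if defined(__SSE2__)
    s += "sse2 ";
#endif
#if defined(__SSE4_2__)
    s += "sse4.2 ";
#endif
#if defined(__AVX2__)
    s += "avx2 ";
#endif
#if defined(__ARM_NEON)
    s += "neon ";
#endif
    if (s.empty())
        s = "none";
    else
        s.pop_back();  // drop the trailing space
    assert(!s.empty());
    return s;
}
```

Built for x86-64 with only `-m64`, this should report just the baseline (SSE2 is part of x86-64 itself). Built with `-march=native` on a recent desktop CPU, it would normally list the newer extensions as well. `g++ -march=native -dM -E - < /dev/null` dumps the same macros without writing any code.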
|
|
|
|
|
Yeah, I eventually found that out. My benchmarks are now where I expected them to be to begin with: competitive with most offerings. It was when I started getting those numbers against simdjson that I was giddy.
My first goal was RAM use, because of extreme portability. My third goal, after those two, was speed.
What I'd like to do is figure out how to get simdjson's stage 1 analysis to stream like my lib does, even at the cost of some branch misprediction.
But before, when I thought I was neck and neck, I thought simdjson's stage 1 was the wrong answer for bulk loading. It still might be, if I can't stream it and get good performance.
Real programmers use butterflies
|
|
|
|
|
Good that I have none at the moment. In the last two weeks I made some progress with the Zwölf and now all bits and pieces are falling into place.
I wanted to boot the Zwölf with a microcontroller, but was not able to get one anywhere. Now what?
Why not take the simple breadboard Zwölf and add an IDE controller? Then I only need some interface and I get a smart disk controller.
What kind of interface? Serial? Parallel? Both would be kindof slow. I need something faster. Something that can be used for bootloading as well as at runtime. DMA in both directions, that's it.
The CDP1802 never gives up control of its bus, not even for DMA. It keeps running and inserts a DMA bus cycle when DMA is requested. It even acts as a DMA controller and does the memory addressing itself, leaving only putting its data on the bus at the right time to the requesting device. Super easy, only three cheap logic ICs on each processor board. Even a 'you've got mail' interrupt is included to be used when all bytes of a request have been sent. The coolest part is that the processor can even do this kind of DMA when it's not running. The interface can be used to load code into memory before booting. That's going to be a smart disk controller.
Coming to think of it, why not even add some sort of network adapter. A smart disk and network adapter. Let's boot the Zwölf from the internet.
The software side is also coming along. I'm going to be stuck with bit banged software RS232 for a while, so I have written a little library for VT100 terminals and pimped the software from 9600 baud to 19200 baud. No comparison to the VT52 and 300 baud we had in the old days. Just fill the screen when you wanted to take a break.
And I now have a heap. Yes, that's memory management. That may be an alien and revolutionary concept, but there is no hand waving when you start with an empty computer. I even had to use something like a global variable, because it's hard to allocate memory on the heap when the computer does not remember where it put the heap.
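For readers who have never had to write one, here is about the smallest heap there is, sketched in C++ rather than CDP1802 code. It is not the Zwölf's allocator. The arena size and the names `g_arena`, `g_heapTop`, `heapAlloc` and `heapReset` are all made up for illustration. It does show why some global state is hard to avoid: the allocator has to remember where the heap is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed arena standing in for the machine's free memory.
alignas(std::max_align_t) static std::uint8_t g_arena[1024];

// The one global the allocator cannot do without: the offset of the first
// free byte in the arena.
static std::size_t g_heapTop = 0;

// Bump allocation: hand out the next `size` bytes, aligned to `align`
// (a power of two no larger than alignof(std::max_align_t)).
// Returns nullptr when the arena is exhausted.
void* heapAlloc(std::size_t size,
                std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align <= alignof(std::max_align_t));
    std::size_t start = (g_heapTop + align - 1) & ~(align - 1);
    if (start > sizeof(g_arena) || size > sizeof(g_arena) - start)
        return nullptr;
    g_heapTop = start + size;
    return g_arena + start;
}

// A bump allocator cannot free individual blocks, only everything at once.
void heapReset() { g_heapTop = 0; }
```

A real heap also needs a way to free single blocks, typically a free list threaded through the unused memory. The starting point is the same, though: one known location that says where the heap lives.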
I have lived with several Zen masters - all of them were cats.
His last invention was an evil Lasagna. It didn't kill anyone, and it actually tasted pretty good.
|
|
|
|
|
What MCU are you looking for? I can't imagine nothing being available.
I'm not sure how many cookies it makes to be happy, but so far it's not 27.
JaxCoder.com
|
|
|
|
|
I would have to dig through countless datasheets, but I think it was one of the PIC16F18000 series.
I have lived with several Zen masters - all of them were cats.
His last invention was an evil Lasagna. It didn't kill anyone, and it actually tasted pretty good.
|
|
|
|
|
I'm looking at Mouser's site, at a through-hole PIC16F18325-E/JQ with 14k of flash and 1k of ram.
They're listed at $1.62 ea, packed in a tube. (and not delivered)
A few months ago, I bought 5 BluePill boards. They've got an STM32F103C8T6 onboard that's got 64k of flash, 20k of ram and run at 72mhz. They turned up on my door for $13.66
They're a 32 bit Arm Cortex-M3. Wish I'd bought em years ago.
2/5/10PCS STM32F103C8T6 ARM 32 Cortex-M3 Minimum System Development Board Module | eBay
EDIT: Honey has been using some ESP32 boards. They're cheap, faaast and have large memories. Built-in wireless too, if that's important. You're talking about 600 mhz, a meg of memory and still under half a dozen dollars. Can be prone to overheating.
modified 3-Jan-21 22:42pm.
|
|
|
|
|
The ESP32s have 520kB of ram, some reserved, but expandable via SPI PSRAM.
They run at a CPU clock of 240MHz (not 160MHz, as I first wrote), but the SPI clock is locked to 40MHz for external devices, although it's possible to get a bit faster than that with 4-bit SD readers in SD MMC mode because the ESP32 is aware of them. SPI RAM also has special provisions to make it faster (80MHz, I think?). That's my biggest bottleneck there: SD and display over SPI. It makes me want to get out and push. JSON(C++) runs at about 74kBps on my little ESP32. I *might* be able to speed it up with buffered reads, but that didn't help on my desktop, so I took that feature out. It's possible that you might get better results with fread() than fgetc(); my PC didn't (surprising, I know). Anyway, SPI I/O is not great, so it's a chip to avoid if you want to do, say, full-speed writes on microSD cards.
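To make the fread() versus fgetc() comparison concrete, here is a small sketch of the two read styles doing the same job. It is not code from JSON(C++). The function names and the 4096-byte buffer are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Count newline bytes by pulling one character at a time with fgetc().
std::size_t countNewlinesFgetc(std::FILE* f) {
    assert(f != nullptr);
    std::size_t n = 0;
    int c;
    while ((c = std::fgetc(f)) != EOF)
        if (c == '\n') ++n;
    return n;
}

// Same job, but pulling blocks into a local buffer with fread().
std::size_t countNewlinesFread(std::FILE* f) {
    assert(f != nullptr);
    char buf[4096];
    std::size_t n = 0;
    std::size_t got;
    while ((got = std::fread(buf, 1, sizeof buf, f)) > 0)
        for (std::size_t i = 0; i < got; ++i)
            if (buf[i] == '\n') ++n;
    return n;
}
```

On a desktop C library, a `FILE` opened on a regular file is normally buffered already, so fgetc() mostly reads from memory. That may be why the two performed alike there. On a microcontroller, where the stream sits on top of an SPI-attached SD card, the result could differ, so it is worth measuring both.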
They're immensely connectable - bluetooth, wifi, OTA (though bluetooth libs and OTA are each huge so you often have to pick one), and ESP-NOW radio comms. You really have nothing you can't interface with.
The arduino framework is somewhat limiting though. Once I broke out of it and started using FreeRTOS and the espidf directly it opened some doors. multithreading and taking advantage of the second core for example. it's possible to do under the arduino framework but none of the libs are thread safe so why bother?
I think I want to move to the arm platform for my next IoT widgets.
I will probably always love these little chips though.
Real programmers use butterflies
modified 4-Jan-21 10:04am.
|
|
|
|
|
|
I was thinking about getting some of those. I just don't have a use for it yet.
Oh and I said 160Mhz but I should have said 240Mhz. For some reason 160Mhz came up on the ESP forums and now I can't remember why but it stuck in my head.
Real programmers use butterflies
|
|
|
|
|
I don't s'pose you've tried one of the ESP32Cam packages around have you?
I've been thinking about slapping one into some flying r/c models and am curious about any gotchas.
|
|
|
|
|