|
I would allow ignoring potentially invalid documents.
You are not trying to validate the document, but to read it. The robustness principle applies here: be conservative in what you do, be liberal in what you accept from others.
BTW, thanks for sharing these projects you are working on; they are fascinating.
|
|
|
|
|
The problem is that '{' and '[' introduce different types of entities. '{' should be followed by <propertyname> ':' <value> [',' <propertyname> ':' <value>]... '}' whereas '[' should be followed by <value> [',' <value>]... ']' (where 'x' represents a literal x). To disambiguate, you'd have to check whether the first two tokens after the '{' or '[' are <propertyname> ':', rather than parsing only the first token, which can be ambiguous (JSON accepts a quoted string both as a <value> and as a <propertyname>).
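A minimal sketch of that two-token lookahead, in C++ to match the Arduino code later in the thread. The helper names (skipSpace, skipString, startsWithProperty) are made up for illustration and do not come from any library mentioned here; the sketch only checks for a quoted string followed by ':', nothing more.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return the index of the first non-whitespace character at or after i.
static std::size_t skipSpace(const std::string& s, std::size_t i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    return i;
}

// If s[i] opens a quoted string, return the index just past its closing quote;
// otherwise return std::string::npos. Backslash escapes are stepped over.
static std::size_t skipString(const std::string& s, std::size_t i) {
    if (i >= s.size() || s[i] != '"') return std::string::npos;
    for (++i; i < s.size(); ++i) {
        if (s[i] == '\\') { ++i; continue; }  // skip the escaped character
        if (s[i] == '"') return i + 1;
    }
    return std::string::npos;  // unterminated string
}

// Hypothetical helper: true when the two tokens after the opening bracket at
// index `open` are a quoted string and then ':', i.e. <propertyname> ':'.
bool startsWithProperty(const std::string& s, std::size_t open) {
    std::size_t i = skipSpace(s, open + 1);
    std::size_t afterName = skipString(s, i);
    if (afterName == std::string::npos) return false;
    i = skipSpace(s, afterName);
    return i < s.size() && s[i] == ':';
}
```

For { "a": 1 } this returns true, and for [ "a", 1 ] it returns false, even though both start with the same quoted string. An empty object {} also returns false, so a real reader would need to handle that case separately.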
|
|
|
|
|
My JSON reader (in C# of course) is not recursive, it keeps a Stack of incomplete Objects as it reads.
It Pushes a new Object onto the Stack when it finds a { or [ .
It Pops the top Object off the Stack when it finds a } or ] .
But, looking at the code now, I see that it doesn't verify that they match -- only that they balance.
I wrote it in a hurry.
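To make the balance-versus-match distinction concrete, here is a rough sketch in C++ (not the C# of the reader described above). Both function names are invented for the example, and the code looks only at brackets and string literals; it does not validate anything else about the JSON.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Balance only: counts openers and closers without caring which kind they are.
bool bracketsBalance(const std::string& json) {
    int depth = 0;
    bool inString = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        char c = json[i];
        if (inString) {
            if (c == '\\') ++i;                 // step over the escaped character
            else if (c == '"') inString = false;
        } else if (c == '"') {
            inString = true;
        } else if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth < 0) return false;      // closer with nothing open
        }
    }
    return depth == 0 && !inString;
}

// Match: every closer must agree with the most recent unclosed opener.
bool bracketsMatch(const std::string& json) {
    std::stack<char> open;
    bool inString = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        char c = json[i];
        if (inString) {
            if (c == '\\') ++i;                 // step over the escaped character
            else if (c == '"') inString = false;
        } else if (c == '"') {
            inString = true;
        } else if (c == '{' || c == '[') {
            open.push(c);
        } else if (c == '}' || c == ']') {
            if (open.empty()) return false;
            char expected = (open.top() == '{') ? '}' : ']';
            if (c != expected) return false;    // e.g. '[' closed by '}'
            open.pop();
        }
    }
    return open.empty() && !inString;
}
```

An input such as {"a":[1,2}] passes bracketsBalance but fails bracketsMatch, which is the gap described above. A reader that already keeps a Stack of incomplete Objects has the information it needs for the stricter check.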
|
|
|
|
|
Underneath, I use a pull parser, which doesn't keep a stack either - it uses a state machine and forces the user of it to keep track of where they are. However, it recurses when it skips over (partial parsing). I could have made that use a stack, but that would actually have been a bit harder to port to my Arduino stuff. In the Arduino version I use neither - I just keep a depth count, and I don't check that { [ ] } match, only that they balance, like you do.
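As an illustration only, skipping with nothing but a depth count might look like the sketch below. It is not the actual library code: skipSubtree here is a hypothetical free function working on an in-memory string, whereas a real pull parser would read from a stream.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the index of a '{' or '[', return the index just
// past the closer that brings the depth back to zero, or std::string::npos if
// the input ends first. Only the depth is tracked, so a '[' closed by '}' is
// accepted: the brackets must balance, not match.
std::size_t skipSubtree(const std::string& json, std::size_t start) {
    int depth = 0;
    bool inString = false;
    for (std::size_t i = start; i < json.size(); ++i) {
        char c = json[i];
        if (inString) {
            if (c == '\\') ++i;                  // step over the escaped character
            else if (c == '"') inString = false;
        } else if (c == '"') {
            inString = true;
        } else if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return i + 1;      // subtree fully consumed
            if (depth < 0) return std::string::npos;  // did not start on an opener
        }
    }
    return std::string::npos;                    // ran out of input
}
```

No recursion and no stack are needed, which is what keeps this approach cheap on a small device; the cost is that mismatched brackets go unnoticed.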
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: forces the user of it to keep track of where they are
Yes, that's at the next lower layer, an iterator which simply returns each piece of the JSON -- { , } , [ , ] , or a named value.
honey the codewitch wrote: it recurses when it skips over (partial parsing)
Mine copies it all into the Object. Then, when the Object is complete, it can "filter out" any parts which the application doesn't want (if requested) before returning it.
|
|
|
|
|
Ah, see, there we have a fundamentally different design.
My C# library has a low level pull based parser. It examines small windows of documents of nearly unbounded size. You move through it an element at a time in a loop, like Microsoft's XmlTextReader.** The pull parser is fast. Calling Read() is fast, but calling SkipSubtree() is significantly faster than reading through the subtree, because I do a partial parse, just enough to make sure the document is valid, and I don't normalize.
That parser also has ParseSubtree(), which takes the section of the document you are on and puts it into an in-memory tree for you. You can then do things like JSONPath filtering, navigation expressions, or in-place modification on that tree.
Usually, you'll just parse an entire document into a tree and work with that, but for huge documents that's not practical, so you use the pull parser to navigate/skip to where you need to be, and then just load that subset you need rather than the whole document.
In my port to Arduino, I don't have the in memory trees, just the pull parser, but I may add the ability to do small in memory trees later. Everything else is pretty much the same except the functions are camelCase.
** Pull parsers are actually fantastic for embedded stuff because they allow you to process very large documents a bit at a time.
|
|
|
|
|
honey the codewitch wrote: a low level pull based parser
I'm unsure what you mean by that, but it's likely to be about what I mean. Maybe yours is more generalized, while mine is JSON-specific.
Mine was designed primarily to quickly iterate the members of arrays within some large JSON files (20GB?) so the data contained can be written to SQL Server.
It also reads an element at a time, and builds up an Object as it goes, then passes each complete Object on for processing (writing to SQL Server) individually, so I have only a few Objects in memory at a time (determined by how many threads I'm using). So I don't load the whole file into memory at one time.
So, I may have a file which contains a number of arrays. I tell my utility to Read() the file a piece at a time and "when you find an array named 'Foo', load the Foo table with its contents; when you find an array named 'Bar', load the Bar table; etc." and then each requested array has its contents iterated object-by-object by some number of threads, processed, and written to SQL Server. I assume this is similar to your ParseSubtree().
Non-requested arrays and objects don't become full Objects. This is likely similar to your SkipSubtree(), except that it's really just Read()ing until it finds something it was told to look for -- it won't skip a subtree within an object it has been told to read.
Most files I'm reading contain only one big array each. And some JSON files I generate I segment into several parts to ease handling of them.
Filtering out parts of an Object happens after each Object has been fully populated, which allows the process to support more complex filtering requests.
|
|
|
|
|
It's probably easiest just to show you:
jsonReader.begin(file);
while (jsonReader.read()) {
  switch (jsonReader.nodeType()) {
    case JsonReader2k::Value:
      Serial.print("Value ");
      switch (jsonReader.valueType()) {
        case JsonReader2k::String:
          Serial.print("String: ");
          jsonReader.undecorate();
          Serial.println(jsonReader.value());
          break;
        case JsonReader2k::Number:
          Serial.print("Number: ");
          Serial.println(jsonReader.numericValue());
          break;
        case JsonReader2k::Boolean:
          Serial.print("Boolean: ");
          Serial.println(jsonReader.booleanValue());
          break;
        case JsonReader2k::Null:
          Serial.print("Null: ");
          Serial.println("null");
          break;
      }
      break;
    case JsonReader2k::Key:
      Serial.print("Key ");
      Serial.println(jsonReader.value());
      break;
    case JsonReader2k::Object:
      Serial.println("Object");
      break;
    case JsonReader2k::EndObject:
      Serial.println("End Object");
      break;
    case JsonReader2k::Array:
      Serial.println("Array");
      break;
    case JsonReader2k::EndArray:
      Serial.println("End Array");
      break;
    case JsonReader2k::Error:
      Serial.println("Error!");
      break;
  }
}
file.close();
}
Which emits this:
Object
Key "backdrop_path"
Value String: /lgTB0XOd4UFixecZgwWrsR69AxY.jpg
Key "created_by"
Array
Object
Key "id"
Value Number: 1233032.00
Key "credit_id"
Value String: 525749f819c29531db09b231
Key "name"
Value String: Matt Nix
Key "profile_path"
Value String: /qvfbD7kc7nU3RklhFZDx9owIyrY.jpg
End Object
End Array
Key "episode_run_time"
Array
Value Number: 45.00
End Array
Key "first_air_date"
Value String: 2007-06-28
Key "genres"
For this:
{
  "backdrop_path": "/lgTB0XOd4UFixecZgwWrsR69AxY.jpg",
  "created_by": [
    {
      "id": 1233032,
      "credit_id": "525749f819c29531db09b231",
      "name": "Matt Nix",
      "profile_path": "/qvfbD7kc7nU3RklhFZDx9owIyrY.jpg"
    }
  ],
  "episode_run_time": [
    45
  ],
  "first_air_date": "2007-06-28",
  "genres":
...
It sounds like you're doing something similar, except that you're dumping to a database whereas I'm just spewing debug output to the serial port for demonstration. (This is Arduino C++ code.)
modified 9-Dec-20 12:58pm.
|
|
|
|
|
Home - /dev/null as a Service
I... what.
What do you get when you cross a joke with a rhetorical question?
The metaphorical solid rear-end expulsions have impacted the metaphorical motorized bladed rotating air movement mechanism.
Do questions with multiple question marks annoy you???
|
|
|
|
|
That puts all of the <X>aaS in their place. Brilliant!
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
I wonder if they take spontaneous job applications? I'm quite adept at that myself...
Anything that is unrelated to elephants is irrelephant Anonymous
- The problem with quotes on the internet is that you can never tell if they're genuine Winston Churchill, 1944
- Never argue with a fool. Onlookers may not be able to tell the difference. Mark Twain
|
|
|
|
|
Johnny J. wrote: I wonder if they take spontaneous job applications?
I guess job applications all go straight in the bin!
Johnny J. wrote: I'm quite adept at that myself...
rm -r * we've all been there. Well I have!
|
|
|
|
|
5teveH wrote: rm -r * we've all been there. Well I have!
Strictly for amateurs. A professional uses rm -rf .*
N.B. I don't think this works any longer, but many years ago I had a new-to-unix DBA with root permissions do this. He wanted to get rid of all the hidden directories ... He managed that and a great deal more, besides!
Keep Calm and Carry On
|
|
|
|
|
|
Johnny J. wrote: I wonder if they take spontaneous job applications?
Since you haven't done it yet, can it really be spontaneous anymore?
|
|
|
|
|
Maybe they have a service for that.
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
|
|
|
|
|
|
It can be challenged in court, but you need the money to do it. I recall reading that there was a patent on saving part of a display before it was overlaid, so that it could later be restored without having to render the overwritten images again.
All of this nonsense is why patents should be eliminated and replaced solely by something much closer to copyright. Patents are also awarded to the first inventor, even if others invented the same thing independently. Nothing but revenue for lawyers.
|
|
|
|
|
Greg Utas wrote: All of this nonsense is why patents should be eliminated and replaced solely by something much closer to copyright.
Totally agree.
Greg Utas wrote: Patents are also awarded to the first inventor, even if others invented the same thing independently.
Nope. They are usually awarded to the first who goes to the patent office with a formally compliant application.
And they should at least not be awarded until you present a half-working prototype, rather than for a vague concept alone, which can later slow the real development of the idea.
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
Nelek wrote: They are usually awarded to the first who goes to the patent office
I had to look it up, and you're right. It used to be as I described in Canada and the US, but they changed their laws (in 1989 and 2013, respectively), probably to align with other countries.
|
|
|
|
|
My (least) favorite is Amazon's patent for "one-click ordering." Software patents are, more often than not, ridiculous. I wonder how many I have violated. I can think of several, most relating to JIT compilation.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
|
|
|
|
|
-----------------------------------------              ------------------
| Does a patent for this exist already? |---- No ----> | patent granted |
-----------------------------------------              ------------------
       |                      ^
      Yes                     |
       |              repeat indefinitely
       \/                     |
---------------------------------------
|  Make revisions to the application  |
---------------------------------------
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
|
|
|
|
|
Third time's the charm, I take it.
|
|
|
|
|
Yeah, I'd never taken it upon myself to figure out the steps for your Chud. But finally... Thank you, Notepad++!
And thanks for putting it on your board!
|
|
|
|