|
Of course, but the built-in methods of System.Char (.NET) don't suit my needs, so I rolled my own, as is my wont.
I actually have a number of places where I have to roll my own character (or byte) classing solution to meet my requirements.
The "delimiter" class in particular depends on the type of file being read -- for JSON, the delimiters are { } [ ] : , ; for CSV they are , \r \n .
|
|
|
|
|
I was talking about the method, not the functions themselves. The basic idea is that you can assign up to 8 distinct classes to each character. Fleshing out my idea a bit more, something like:
tab['{'] = CLASS_DELIM_JSON;
tab[','] = CLASS_DELIM_JSON | CLASS_DELIM_CSV;
The table would of course be statically allocated and initialized:
bool is_a (int x, char what)
{
  // one bit per class, so up to 8 classes per character
  static char tab[128] = { };
  // only 0..127 are in the table, so guard the index
  return x >= 0 && x < 128 && (tab[x] & what) != 0;
}
Mircea
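For anyone who wants to try the table idea, here is a minimal sketch in C++. The class names and bit values are made up for illustration, and the delimiter sets are the JSON and CSV ones mentioned earlier in the thread. It is one possible shape, not the code anyone here is actually running.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class bits: one bit per class, so a uint8_t holds up to 8.
enum : std::uint8_t {
    CLASS_DELIM_JSON = 1u << 0,
    CLASS_DELIM_CSV  = 1u << 1,
};

// 128-entry ASCII table, filled in once by the constructor.
struct ClassTable {
    std::uint8_t tab[128];
    ClassTable() : tab() {  // tab() zero-initializes every entry
        for (const char* p = "{}[]:,"; *p; ++p)
            tab[static_cast<unsigned char>(*p)] |= CLASS_DELIM_JSON;
        for (const char* p = ",\r\n"; *p; ++p)
            tab[static_cast<unsigned char>(*p)] |= CLASS_DELIM_CSV;
    }
};

// True if x belongs to any class in `what`.
// Values outside 0..127 are treated as "in no class" instead of
// being used as an out-of-range index.
inline bool is_a(int x, std::uint8_t what) {
    static const ClassTable t;
    return x >= 0 && x < 128 && (t.tab[x] & what) != 0;
}

// Usage: a few spot checks.
inline void demo() {
    assert(is_a('{', CLASS_DELIM_JSON));
    assert(!is_a('{', CLASS_DELIM_CSV));
    assert(is_a(',', CLASS_DELIM_JSON) && is_a(',', CLASS_DELIM_CSV));
    assert(!is_a(0x201C, CLASS_DELIM_JSON | CLASS_DELIM_CSV));  // a "smart quote"
}
```

The range test in is_a costs one comparison per call; whether that matters depends on the reader's hot path.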
|
|
|
|
|
Yes, and that's what I'm doing here -- but C#ishly, using an enum to define the constants and the map.
|
|
|
|
|
PIEBALDconsult wrote: BUT once in a while we receive a corrupt file...but firing a million times in rapid succession is decidedly not
Why does it continue after the first one? If firing once means it is corrupt and then you expect the rest to be corrupt then why not stop?
Or say stop after 10 or so?
PIEBALDconsult wrote: to add a test for i > 127 and avoid the exception (which I am loath to do on principle).
Not sure I understand that statement. Unicode does not fit in that range, and you are getting Unicode. So testing for it, rather than assuming it is correct, is what should be happening.
|
|
|
|
|
The reader has no clue whether what it's reading is corrupt or not, that's determined at a higher level. The reader's job is to simply read the characters and return them.
jschell wrote: Or say stop after 10 or so?
That's the issue I was running into: I had it throw an Exception after ten seconds, and I needed to keep that from happening.
jschell wrote: you are getting unicode
Yes, but the reader was taking too long to read the Unicode characters which are outside the ASCII range. It now reads them almost as quickly as Unicode characters which are within the ASCII range.
But to do so, it's flipping between two implementations of a method -- one for ASCII and another for non-ASCII -- depending on which characters it encounters.
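The actual C# code isn't shown in this thread, so the following is only a guess at the general shape, sketched in C++. The Classifier class, its member names, and the single delimiter entry are all hypothetical. The optimistic path does an unchecked-by-hand lookup and lets std::array::at throw on a non-ASCII value; the first throw swaps in a checked path, which swaps back on the next ASCII character.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>

class Classifier {
public:
    Classifier() : tab_(), impl_(&Classifier::ascii_impl), catches_(0) {
        tab_[','] = 1;  // hypothetical: bit 1 marks a delimiter
    }

    // Dispatch through whichever implementation is current.
    std::uint8_t classify(std::uint32_t c) { return (this->*impl_)(c); }

    // How many times the catch block has fired.
    int catches() const { return catches_; }

private:
    // Optimistic path: no explicit range test. A non-ASCII value costs
    // one exception, after which the checked path takes over.
    std::uint8_t ascii_impl(std::uint32_t c) {
        try {
            return tab_.at(c);  // throws std::out_of_range when c >= 128
        } catch (const std::out_of_range&) {
            ++catches_;
            impl_ = &Classifier::checked_impl;
            return 0;
        }
    }

    // Checked path: tests the range up front, and flips back to the
    // optimistic path as soon as an ASCII character shows up.
    std::uint8_t checked_impl(std::uint32_t c) {
        if (c > 127) return 0;
        impl_ = &Classifier::ascii_impl;
        return tab_[c];
    }

    std::array<std::uint8_t, 128> tab_;
    std::uint8_t (Classifier::*impl_)(std::uint32_t);
    int catches_;
};
```

With this arrangement a run of a million non-ASCII characters fires the catch once rather than a million times, while mostly-ASCII input with the odd smart quote still pays one exception per isolated non-ASCII run.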
|
|
|
|
|
PIEBALDconsult wrote: what it's reading is corrupt or not, that's determined at a higher level.
That seems idealistic or odd. If it is or was throwing an exception then it seems like it did know it was corrupted.
Obviously the caller continued since you said it was failing a million times. Was the caller getting usable information? If so then the architecture indicates that it is not in general corrupted, but rather that some of the data is not usable, so that is how it should be treated.
Conversely, to me, if I was seeing a million actual failures then I would question the need to attempt to retrieve 'valid' data. Corrupted generally would refer to some random process. That random process might create a character that passes your checks (regardless of how you check) but still represent bad data. Matter of fact the more data like this that exists the more likely that becomes. So "millions" would suggest that some other bad data would end up being accepted as good.
|
|
|
|
|
jschell wrote: seems like it did know it was corrupted.
No, the code had a bug whereby it "took too long" to read a value/token. I have fixed the bug so now the value/token gets read and returned and then the caller can determine whether or not the value is reasonable.
jschell wrote: failing a million times
No, not failing a million times, failing once.
jschell wrote: the architecture indicates that it is not in general corrupted
In general, yes, non-ASCII characters cannot be considered invalid by the reader, only the caller can determine that.
A large text value may contain a few non-ASCII characters, such as "smart quotes", and that's fine. The issue was that when a value contained many non-ASCII characters, it took too long to read -- it also happens that the value was corrupt in this case.
jschell wrote: Corrupted generally would refer to some random process.
Yes, and we still don't know what is causing the corruption.
jschell wrote: random process might create a character that passes your checks (regardless of how you check) but still represent bad data
As is always the case, and again the reader can't determine that.
jschell wrote: some other bad data would end up being accepted as good.
As is always the case, and again the reader can't determine that.
In this particular case of corruption, the process will fail when it tries to stuff a million-plus characters into a database column which allows only two-hundred. But yes, even now there may be a file on the way which is corrupt in such a way that the bad value fits and the load won't fail. The reader won't care, the loader won't care, but some other part of the process will (probably) freak out.
|
|
|
|
|
PIEBALDconsult wrote: No, not failing a million times, failing once.
You did say in the OP the following which seemed to me to suggest that it would have failed a million times.
"Apparently having the catch fire occasionally is OK, but firing a million times in rapid succession is decidedly not."
PIEBALDconsult wrote: In general, yes, non-ASCII characters cannot be considered invalid by the reader, only the caller can determine that
Back to being too idealistic to me in terms of your original point of how the code should handle it.
The method is only part of this process. It is not part of a multi-use library (presumably). And the problem being solved does involve characters that are not ASCII. So there is nothing wrong with this method specifically dealing with that by using the 'if' solution.
You could, idealistically, make the caller deal with it before calling the method in the first place. To me even spending time on that consideration seems like overkill. It would not be worth my time for consideration.
|
|
|
|
|
Having the catch fire is not a failure.
jschell wrote: It is not part of a multi-use library
Yes, it is, of course it is.
jschell wrote: deal with it before calling the method
Uh, what? You can't handle characters before they've been read.
|
|
|
|
|
I think Chris will have to update the Lounge rules - no posts about what ChatGPT says!
Me: Write me a poem about programming
Chat GPT:
Programming, so complex and so hard
It can make your head spin, like a spinning top
It can be difficult to understand
But once you do, it can never be topped
From the basics of HTML to the depths of Java
Programming is a skill that's hard to master
But with enough dedication and hard work
You can be a coding master
The possibilities are endless with programming
You can create something that's truly unique
From a simple website to a complex game
You can make something that's truly one-of-a-kind
It can be frustrating and time-consuming
But the rewards can be great
You can make something that others can use
And that's a feeling that can't be beat
So don't give up when it gets tough
And don't let the obstacles stand in your way
With enough dedication and hard work
You too can be a coding master one day!
|
|
|
|
|
It bothers me that this would work as a reasonably structured song if you made a chorus about being a coding master.
Thanks,
Sean Ewington
CodeProject
|
|
|
|
|
Sean Ewington wrote: this would work as a reasonably structured song if you made a chorus about being a coding master.
Reminds me of the song: Juke Box Hero - YouTube[^]
|
|
|
|
|
One of my ultimate bands and songs!
|
|
|
|
|
Not anywhere near as good as this :
DR. SEUSS EXPLAINS COMPUTERS
(Original author unknown) Extracted from A Grandchild's Guide to Using Grandpa's Computer by Gene Zeigler
If a packet hits a pocket on a socket on a port,
And the bus is interrupted as a very last resort,
And the address of the memory makes your floppy disc abort,
Then the socket packet pocket has an error to report.
If your cursor finds a menu item followed by a dash,
And the double-clicking icon puts your window in the trash,
And your data is corrupted 'cause the index does not flash,
Then your situation’s hopeless and your system’s gonna crash!
If the label on the cable on the table at your house,
Says the network is connected to the button on your mouse,
But your packets want to tunnel on another protocol,
That’s repeatedly rejected by the printer down the hall,
And your screen is all distorted by the side effects of gauss,
So your icons in the window are as wavy as a souse,
Then you may as well reboot and go out with a bang,
'cause as sure as I’m a poet, the sucker’s gonna hang.
When the copy of your floppy’s getting sloppy on the disc,
And the micro code instructions cause unnecessary risk,
Then you have to flash your memory, and you’ll want to RAM your ROM…
Quickly turn off your computer and go and tell your Mom.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
We are no longer allowed to appreciate Dr. Seuss in America as he has been deemed a racist. Just by reading this post I could technically, literally, and figuratively be canceled.
|
|
|
|
|
That is entirely USians' loss.
I followed the controversy as it developed in the US. All I can say is that if an author can be deemed racist for a couple of elements in one drawing out of N illustrated books, then something is very wrong in the US. I remember reading the controversial book (And to think I saw it on Mulberry Street, IIRC) as a child, and all I thought about the drawings of "foreigners" was that they certainly dress funnily in "foreign parts".
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
I'm a redneck American through and through. My blood bleeds red, white, and blue. That means I can talk about it.
100% agree with you there on it being our loss. The US is in a state of decline and, unless something drastic changes, our course is headed towards being nothing more than a story of what used to be. Which is very, very sad to see happen. But it's also the reality of the situation.
Can't really say much else without getting in the weeds... but the US is hurting as we watch it be intentionally destroyed from the inside out.
Jeremy Falcon
|
|
|
|
|
Nice lyrics indeed ... I was not that lucky to talk to ChatGPT, as he usually greets me with "I am busy, come back later ..."
|
|
|
|
|
Sucks, Keep your day job.
|
|
|
|
|
Roses are #ff0000,
Violets are blue,
zeros and ones,
but never a two
|
|
|
|
|
Taking a leaf from JavaScript's clarity of definition for equality, in Span<T>.Inequality(Span<T>, Span<T>) Operator (System) | Microsoft Learn, the following definition of equality seems a bit suspect...
"Two Span<T> objects are equal if they have different lengths or if the corresponding elements of left and right do not point to the same memory."
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
Alan Kay.
|
|
|
|
|
Someone forgot to add "not" when they copied the remarks from the equality operator.
Two Span<T> objects are equal if they have the same length and the corresponding elements of left and right point to the same memory.
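To make the corrected wording concrete, here is a small C++ sketch of that kind of equality. View is a hypothetical stand-in for a span (a pointer plus a length), not the .NET type, and the operators below are written for illustration only. The point is that element values are never compared, only the length and the memory location.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a span: a pointer plus a length, no ownership.
template <typename T>
struct View {
    const T* ptr;
    std::size_t len;
};

// Equal only when both views have the same length AND start at the
// same memory. The contents are not inspected.
template <typename T>
bool operator==(View<T> a, View<T> b) {
    return a.len == b.len && a.ptr == b.ptr;
}

// Inequality is simply the negation, which is where the "not" belongs.
template <typename T>
bool operator!=(View<T> a, View<T> b) {
    return !(a == b);
}
```

So two views over different arrays holding identical values compare as not equal, and two views over the same array with different lengths also compare as not equal.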
IIRC, there used to be a way to report such problems directly from the page, but it looks like they've abandoned that idea.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Managed to find the option to edit the page by creating a pull request on GitHub[^]. The documentation should be corrected shortly.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
I opened up my laptop and noticed that it was responding super slowly.
It's an AMD R7 8-core (4 cores with threading tech) with 16GB RAM running Win10, and it always performs very well.
Internet?? or was it the laptop itself??
8 Cores All At Same Level of Utilization? Unlikely!
I checked the CPU Utilization and saw this (snapshot)[^].
Isn't that interesting?? I've never seen all 8 cores report that they are exactly utilized the same.
I closed CPU monitor and opened it again and saw the same thing.
After Reboot: Looks Normal Again
Here's a snapshot that seems more realistic[^].
Then I Noticed...
Hey, check out the first graphic and you'll see that the Speed was at .38GHz -- even though these chips are really at 2.0GHz
Not sure what happened there, so we are filing it under Weird!!
modified 25-Jan-23 14:37pm.
|
|
|
|
|
Fake processor!
|
|
|
|
|