|
So I'm word-wrapping this:
The quick brown fox jumped over the lazy dog
And I've dealt with all the nastiness of Unicode's non-breaking spaces, invisible breaks, and all of that.
I get this:
The quick
brown
fox jumped
over the
lazy dog
Look at the 2nd line.
"Oh, just remove the whitespace around the line," they said.
"It will be easy," they said.
"You can call char.IsWhiteSpace()," except wait.
This isn't .NET. It's an IoT device running C++.
C++ doesn't do 32-bit codepoints out of the box.
Do you have any idea how hard it is to determine if a character is whitespace in Unicode?
I need a massive table.
*headdesk*
To err is human. Fortune favors the monsters.
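A minimal sketch of the "remove the whitespace around the line" step in C++, assuming a C++11 toolchain (for `char32_t` and `std::u32string`) and UTF-8 input. `utf8_to_utf32` and `trim` are hypothetical helper names, and the whitespace test is passed in as a predicate so whichever definition you settle on can be plugged in.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-32 codepoints. Invalid lead bytes, broken or
// truncated sequences, overlong forms, surrogates, and values above U+10FFFF
// are replaced with U+FFFD instead of aborting.
std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest codepoint that may legally use a sequence of each length.
    static const char32_t kMinForLength[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray continuation or invalid lead byte

        bool ok = i + len <= in.size();                  // truncated at end of input?
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            ok = (c & 0xC0) == 0x80;                     // continuation bytes are 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }  // resync at the next byte

        if (cp < kMinForLength[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;                                 // overlong, beyond Unicode, or surrogate
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Strip leading and trailing codepoints for which is_space returns true.
template <class IsSpace>
std::u32string trim(const std::u32string& s, IsSpace is_space) {
    std::size_t begin = 0, end = s.size();
    while (begin < end && is_space(s[begin])) ++begin;
    while (end > begin && is_space(s[end - 1])) --end;
    return s.substr(begin, end - begin);
}
```

Decoding once per wrapped line and trimming on codepoints keeps the byte-level UTF-8 handling out of the wrapping logic; the predicate is the only place the "which characters count as whitespace" decision lives.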
|
I have a simple(r) solution.
Print the character to an image (in-memory, obviously).
Now connect to some OCR service and let it scan the image.
If the result is null, "", " " or "\n", it's whitespace.
Just thinking outside the box.
|
*angeryface*
To err is human. Fortune favors the monsters.
|
Hmmm, can you go into the technical details of the difficulties? I can't figure out why you need 'a massive table'.
|
I guess they match.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
The table isn't as big as I thought. A long time ago I wrote something to spit out character class tables, and I thought I remembered the whitespace one being huge. It's not, now that I've looked it up.
Still, it's larger than I'd like.
To err is human. Fortune favors the monsters.
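If a generator emits the table as sorted, non-overlapping inclusive ranges, membership is a binary search and the data stays small. A sketch assuming C++11; `CodepointRange`, `kWhitespaceRanges`, and `in_ranges` are hypothetical names, and the eight ranges encode the Java/ICU-style whitespace rule (separators minus the non-breaking spaces, plus U+0009 to U+000D and U+001C to U+001F), not output from the tool mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

struct CodepointRange { char32_t first, last; };  // inclusive bounds

// Hypothetical generated table, sorted by `first` with no overlaps.
// U+00A0, U+2007 and U+202F (non-breaking spaces) are deliberately absent.
constexpr CodepointRange kWhitespaceRanges[] = {
    {0x0009, 0x000D}, {0x001C, 0x0020}, {0x1680, 0x1680}, {0x2000, 0x2006},
    {0x2008, 0x200A}, {0x2028, 0x2029}, {0x205F, 0x205F}, {0x3000, 0x3000},
};

// True if cp falls inside any range of the sorted table.
template <std::size_t N>
bool in_ranges(const CodepointRange (&table)[N], char32_t cp) {
    // First range starting after cp; the only candidate is the one before it.
    const CodepointRange* it = std::upper_bound(
        std::begin(table), std::end(table), cp,
        [](char32_t value, const CodepointRange& r) { return value < r.first; });
    if (it == std::begin(table)) return false;  // cp is below every range
    --it;
    return cp <= it->last;
}
```

With eight entries a linear scan would do just as well; the binary search only pays off when the same lookup is reused for the larger tables a generator produces for other character classes.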
|
honey the codewitch wrote: Still, it's larger than I'd like.
I don't think you need to check for all of them. I think you can get away with just 12.
isWhitespace
Btw, now that .NET is using ICU, this should match the C# behavior. (I just checked the ICU docs to confirm.)
|
Yeah, but I'm not using .NET, and my platform doesn't understand Unicode beyond wchar_t, which I can't even use.
To err is human. Fortune favors the monsters.
|
honey the codewitch wrote: Yeah, but I'm not using .NET, and my platform doesn't understand unicode
Nobody in this thread thinks you are using .NET.
I'm saying that you can write a C function for your IoT device that will duplicate Java and C# whitespace behavior simply by checking for those 12 values. They are all doing the same thing as ICU.
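One way to write that check as a C++ sketch, assuming the rule documented for Java's `Character.isWhitespace` (ICU's `u_isWhitespace` documents the same rule): Unicode space, line, and paragraph separators except the non-breaking ones (U+00A0, U+2007, U+202F), plus the C0 controls U+0009 to U+000D and U+001C to U+001F. The function name is hypothetical, the codepoint list reflects current Unicode data and should be rechecked against the version you target, and it is not necessarily the exact list behind the link above. Note that .NET's `char.IsWhiteSpace` is looser: it also accepts U+00A0 and U+0085.

```cpp
// Hypothetical helper: Java/ICU-style whitespace test with no lookup table.
inline bool is_unicode_whitespace(char32_t cp) {
    switch (cp) {
        case 0x0009: case 0x000A: case 0x000B: case 0x000C: case 0x000D:  // TAB, LF, VT, FF, CR
        case 0x001C: case 0x001D: case 0x001E: case 0x001F:              // file/group/record/unit separators
        case 0x0020:                                                      // SPACE
        case 0x1680:                                                      // OGHAM SPACE MARK
        case 0x2028: case 0x2029:                                         // LINE / PARAGRAPH SEPARATOR
        case 0x205F:                                                      // MEDIUM MATHEMATICAL SPACE
        case 0x3000:                                                      // IDEOGRAPHIC SPACE
            return true;
        default:
            // U+2000..U+200A are space separators; U+2007 FIGURE SPACE is
            // non-breaking, so it is excluded (as are U+00A0 and U+202F,
            // which are simply not listed above).
            return cp >= 0x2000 && cp <= 0x200A && cp != 0x2007;
    }
}
```

A `switch` over constants usually compiles to a couple of range checks and a small jump table, which suits a microcontroller better than pulling in a general character database.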
|
Ohhhh thanks. Sorry, it was early and I was still a bit slow. I'll do that.
To err is human. Fortune favors the monsters.
|
Does the OS you're using not have ICU support built in?
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
Many IoT devices don't have an OS.
"In testa che avete, Signor di Ceprano?"
-- Rigoletto
|
What OS? FreeRTOS? No. And it wouldn't know what to do with TrueType anyway. It has no concept of graphics.
To err is human. Fortune favors the monsters.
|
Message Closed
modified 20-Jun-22 9:31am.
|
Mine was accurate, just truncated at the s
To err is human. Fortune favors the monsters.
|
No idea, but shouldn't it be "The quick brown fox jumps over the lazy dog"?
|
TheRealSteveJudge wrote: jumps
I think you must've solved it!
"If we don't change direction, we'll end up where we're going"
|
For part of my UTF-8 decoder, I use a sparse array, but of course I'm using C#, so I can catch index exceptions.
I guess I now have to add more characters to it though.
|
Fleeced toboggan catches air. (8)
|
Fleeced = definition
toboggan = S LED
catches = (around)
air = WIND
S(WIND)LED = SWINDLED
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
It's a good job you're not busy today, otherwise that might have taken you more than two minutes to solve.
|
Shouldn't you be busy writing code to strip metadata from JPG files with mogrify?
Correct answer, you are up tomorrow!