|
Gonna guess this is the "(almost)". Heh.
|
|
|
|
|
Over a decade ago, I wrote a SQL Server backup/rotation manager. Backups are zipped and password protected using IonicZip, then copied to a local repository on another disk, and optionally pushed out to an FTP resource. This is handy for when I find myself working on the laptop away from the home/office. This system has been used on multiple servers without issues.
A couple of weeks ago, I started using it on a newish Azure VM for one of our latest projects to manage 2 customer databases. It appeared to be working fine...backups/zips/copies all getting created with no errors...or so I thought. Yesterday, I was away from the office and decided to grab the previous day's backups and restore them on my laptop. Using 7-zip, the zips extracted, but with a CRC error detected. Of course, the backups were useless. The native Windows zip handler refused to extract anything, failing with a generic 'unrecognized error' message. Every backup from that system was corrupted!
Backups from the other 2 systems are/were fine...one of the other systems is also an Azure VM with practically identical setups and has been working fine for years.
I'll skip the troubleshooting details and get to the fix. IonicZip has a property that I had never heard of before, and which until now had not been important: ParallelDeflateThreshold, which needed to be set to -1. Doing so fixed the problem. If I understand correctly, IonicZip has a known problem with files that compress to < 4MB. In my case, mine were slightly over 2MB compressed. All of my other backups are much larger, which is perhaps the reason why I've never seen this problem before.
At any rate, I wanted to post this here in case someone else here is using this component and not aware of this issue. The bottom line is, test your backups! Have a great weekend!
"Go forth into the source" - Neal Morse
"Hope is contagious"
|
|
|
|
|
So, just how long were you "puckered"?
I’ve given up trying to be calm. However, I am open to feeling slightly less agitated.
I’m begging you for the benefit of everyone, don’t be STUPID.
|
|
|
|
|
MarkTJohnson wrote: how long were you "puckered"?
I would have been puckered if I had truly needed those backups. As it was, I went through a few emotions:
0: Surprise! Your false sense of security has just been shattered...You are not as clever as you thought you were, and your backups are shite!
1: Doubt...Hmmm what about the other 20 daily backups? Are they all shite?
2: Relief...Whew! The other backups are fine. Just these two from this server are crap.
3: Annoyance. I just want to get on with work. Now I have to log on Azure, allow myself to RDP into that box, get raw unzipped backups, and start troubleshooting the problem.
4: Sleuth Mode...the problem seems to be with the zip lib...maybe a bug...maybe fixed? Go get the latest version to find that it's being deprecated, and the last version is 6 y/o. Whatever, I'll try it.
5: Disappointment. Nope that didn't work, time to open the project and debug with one of the dbs having the issues.
6: Excitement. Yay! I was able to replicate the issue...now on to understanding.
7: Discovery: A well-phrased search put me on the right track...a known issue with an easy fix.
8: Humility: I'm sure I would have discovered this eventually, but I put a lot of faith in an automated process without actually verifying the outputs, which was the only way to detect the problem. Lesson learned!
|
|
|
|
|
... it occurred to me: What if CP itself, or any subsystem employed, has chosen to use the top 8 bits e.g. to flag privileges or other user properties, leaving only 24 bits for the member number?
Religious freedom is the freedom to say that two plus two make five.
|
|
|
|
|
If anyone did it, they did it unsigned!
|
|
|
|
|
This would be very poor database design. If this is actually the case, Chris should demand a refund from the database designer.
In any case, the change in schema should be quite simple, and, assuming that the user ID and privileges are accessed by two separate methods, the change in the interface code should also be small.
For that matter, encoding the various privileges in an integer is also problematic. What happens when the number of privileges grows (as it inevitably will), and exceeds 32?
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
I agree. Giving a single field/value multiple meanings and uses has always led to trouble down the road for me.
There are no solutions, only trade-offs. - Thomas Sowell
A day can really slip by when you're deliberately avoiding what you're supposed to do. - Calvin (Bill Watterson, Calvin & Hobbes)
|
|
|
|
|
Yet it is certainly not uncommon. I guess the most common use is to let the MSB indicate whether the remaining bits are an error code or a valid index. This is frequently used for function return values: A non-negative value is a valid function result, a negative value is an error code.
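To make that convention concrete, here is a minimal sketch in C. The function name find_index and the two error codes are made up for illustration; they are not taken from any particular library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes: any negative return value means "error". */
enum { ERR_NOT_FOUND = -1, ERR_BAD_ARG = -2 };

/* Returns the index of key in names[0..count-1] (a non-negative value),
   or one of the negative error codes above. */
int find_index(const char *const *names, int count, const char *key)
{
    if (names == NULL || key == NULL || count < 0)
        return ERR_BAD_ARG;
    for (int i = 0; i < count; ++i) {
        if (strcmp(names[i], key) == 0)
            return i;               /* valid result: sign bit clear */
    }
    return ERR_NOT_FOUND;           /* error: sign bit set */
}
```

A caller tests for a negative result before using the value as an index. The price is that half of the value range is reserved for errors.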
My gut feeling is that this is more common with functions defined 10 or 20 years ago than with functions defined this year. But lots of old libraries are still used. Also, coding habits die slowly.
I think there is a dividing line between "Giving a single field/value multiple meanings and uses" and "storing multiple distinct fields in a word in order to save space". The compiler can give full support for it, so that the fields are addressed by distinct names, are of distinct types and have no overlap. They may be declared e.g. as a byte and as a 24-bit unsigned value. Maybe the original code designers never ever would dream of 16 million not being enough for everyone. Like those who set aside a 32 bit value to represent the number of seconds since 01-01-1970 00:00:00. (There is no difference in principle between an unplanned 24 bit value overflow and an unplanned 32 bit overflow.)
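As a rough sketch of such a packed word, assuming the purely hypothetical layout of 8 flag bits on top of a 24-bit member number (nothing here is known about how CP actually stores its IDs). It uses shifts and masks rather than C bit-fields, because bit-field layout is implementation-defined:

```c
#include <stdbool.h>
#include <stdint.h>

#define MEMBER_BITS 24u
#define MEMBER_MASK ((UINT32_C(1) << MEMBER_BITS) - 1u)   /* 0x00FFFFFF */

/* Pack 8 flag bits and a 24-bit member number into one 32-bit word.
   Returns false if the member number does not fit in 24 bits. */
bool pack_member(uint8_t flags, uint32_t member, uint32_t *out)
{
    if (member > MEMBER_MASK)
        return false;   /* member numbers above 16,777,215 do not fit */
    *out = ((uint32_t)flags << MEMBER_BITS) | member;
    return true;
}

/* The two fields never overlap, so each can be read back independently. */
uint8_t  unpack_flags(uint32_t word)  { return (uint8_t)(word >> MEMBER_BITS); }
uint32_t unpack_member(uint32_t word) { return word & MEMBER_MASK; }
```

The explicit range check is the part that matters: without it, member number 16,777,216 would silently spill into the flag bits.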
You may argue that a programmer should always make ample headroom for all values. I have seen programmers doing that, using 64 bit values everywhere, without ever thinking. Non-thinking programmers are no good. Around Y2K there also arose a "2038 panic", and I saw programmers argue in dead earnest that now that we are approaching a 32 bit overflow, to make sure that it doesn't happen again, we should not expand the value to 64 bits but to 128 bits.
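For anyone who wants to check where the 2038 limit comes from, the arithmetic is short. This sketch assumes a signed 32-bit second counter starting at 01-01-1970 00:00:00 UTC; the function names are made up for illustration.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400

/* Whole days that fit in a signed 32-bit second counter. */
int32_t days_until_overflow(void)
{
    return INT32_MAX / SECONDS_PER_DAY;
}

/* Seconds into the final, partial day at which the counter reaches its maximum. */
int32_t seconds_into_last_day(void)
{
    return INT32_MAX % SECONDS_PER_DAY;
}
```

That gives 24855 whole days plus 11647 seconds, i.e. 19 January 2038 at 03:14:07 UTC.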
I guess that most readers are familiar with the quote from a Xerox Fortran manual, using a DATA statement to give the value of pi as 3.141592653589793, with the additional remark. "This also simplifies modifying the program, should the value of pi change". While that statement is most certainly true, the situation is not very likely to happen.
Making common sense assumptions makes sense even in programming, even if common sense fails on rare occasions. I could mention quite a few examples of lists of persons, names or IDs where I would never consider the situation that e.g. a database system with a maximum of 16 Mi tuples would have insufficient capacity.
I would be curious to know how many of the 15.9 million CP members have made one or more contributions to the forum in the last 12 months, writing anything anywhere on the site, and are still members (not counting spammers who are thrown out). I suspect that the count would suggest that a 24 bit number should be enough for everybody.
|
|
|
|
|
trønderen wrote: I suspect that the count would suggest that a 24-bit number should be enough for everybody.
The way that user IDs are allocated seems to indicate that they are defined as an "auto-incremented" integer. This means that user IDs are never reused. So, while there are less than 16M users on the site, the user ID values will grow without bound.
I expect that message IDs are allocated in a similar way (presumably with a 64-bit ID).
|
|
|
|
|
Daniel Pfeffer wrote: The way that user IDs are allocated seems to indicate that they are defined as an "auto-incremented" integer. This means that user IDs are never reused.
Oh, sure. But if there never are more than, say, 200 active users, it seems a little overkill to use a 64 bit member user ID for the purpose of preventing overflow. Similar to increasing the variable holding the second count since 01-01-1970 to a 128 bit value "to be on the safe side". That is a "... should the value of pi change" kind of rationale.
To make one thing very clear: There is nothing wrong about 64 bit values - certainly not if you have got unlimited space and unlimited processing capacity available, and most certainly not for one 64 bit value. If the cost of using 64 bit values is zero or practically zero, then you may of course use 64 bit, even to store bools. Then you can simply ignore space considerations. You need to care for space requirements only if you do not have unlimited space.
Whenever a (hard or soft) limit is broken, it is time to sit down and think, whether the limit is 8-bit, 16-bit, 32-bit or something else. Don't forget that 64 bits is not 32 bits more than 32 bits; it is nearly 10 orders of magnitude more (a factor of 2^32, about 4.3 billion)! 32 bits is nearly 5 orders of magnitude more than 16 bits, not 16 bits more. "A billion here, a billion there - you know, pretty soon it grows into real money" ... Whether he actually phrased it that way or not (never trust quotes to be exact!), the message is clear: Keep the magnitudes straight. When someone states something like "It was either millions or billions, I am not sure", you shake your head, even though that is only three orders of magnitude. So shouldn't raising the upper limit by five, or even ten, orders of magnitude be handled as something rather significant?
Raymond Chen's last blog entry, posted yesterday, addresses 8 bit counters: Why does GlobalLock max out at 255 locks?[^]. Note his final paragraph: "This hasn’t caused any problems for 30 years, so I think we dodged a bullet there."
(And for those unfamiliar with "The Old New Thing": That is among the most readworthy, and enjoyable, IT blogs on the entire internet. Bookmark it!)
|
|
|
|
|
trønderen wrote: But if there never are more than, say, 200 active users, it seems a little overkill to use a 64 bit member user ID for the purpose of preventing overflow.
I agree with you regarding appropriate choice of value sizes.
The number of users is unlikely to exceed 2^32 (4 billion - half the population of the planet!), so I agree that a 32-bit autoincremented value is sufficient for that. However, the number of messages has likely already exceeded 2^32 - with roughly 2^24 (16M) users, this would require that the average number of messages per user be only 256.
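The 256 figure is just 2^32 divided by 2^24. A small sketch of that division, with a made-up function name and the user counts taken from this thread:

```c
#include <stdint.h>

/* Average number of messages per user at which the total message count
   reaches 2^32. Returns 0 for a user count of 0 to avoid dividing by zero. */
uint64_t messages_per_user_to_reach_2_32(uint64_t users)
{
    const uint64_t limit = UINT64_C(1) << 32;
    return users == 0 ? 0 : limit / users;
}
```

With exactly 2^24 users the result is 256. With the 15.9 million members mentioned earlier, integer division gives 270.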
trønderen wrote: "The Old New Thing": That is among the most readworthy, and enjoyable, IT blogs on the entire internet.
Agreed. I've learnt quite a bit, both about Windows internals and about general programming from this blog.
|
|
|
|
|
Some small part of such things definitely manifests in places where DBAs rule with iron fists.
It's simply 'worth' the tradeoff of uhm... 'repurposing' a field than dealing with bureaucratic sadists.
|
|
|
|
|
If the number of privileges exceeds 32, then just use a bigint!
The difficult we do right away...
...the impossible takes slightly longer.
|
|
|
|
|
Almost all embedded MCUs are little endian.
Almost all display controllers that can connect to them are big endian.
My graphics library builds pixels in big endian order on little endian machines as a consequence. It's just more efficient.
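For readers who haven't run into this, a minimal sketch of the idea, assuming RGB565 pixels. This is an illustration only, not the actual code of any library mentioned here, and the function names are made up.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into a 16-bit RGB565 value (5 red, 6 green, 5 blue bits). */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Store a pixel in the transfer buffer with the most significant byte first,
   regardless of the host's byte order, so no swap pass is needed later. */
void store_pixel_be(uint8_t *dst, uint16_t px)
{
    dst[0] = (uint8_t)(px >> 8);
    dst[1] = (uint8_t)(px & 0xFFu);
}
```

Writing the bytes in display order at draw time means the buffer can go to the controller as-is.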
LVGL is another embedded graphics library - one I contribute to - and in version 9+ they removed a feature where you could set it to swap the bytes on a 16 bit color value. That feature was there to compensate for the endian issues.
Not swapping during the draw operation means you need to scan and rearrange the transfer buffer before you send it to the display.
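void lcd_flush_display(lv_display_t *disp, const lv_area_t *area, uint8_t *px_map) {
    // number of 16-bit pixels in the area (x2 and y2 are inclusive)
    size_t count = (size_t)(area->x2 - area->x1 + 1) * (size_t)(area->y2 - area->y1 + 1);
    uint16_t *pixels = (uint16_t *)px_map;
    // swap the two bytes of every pixel so the display gets them in big endian order
    for (size_t i = 0; i < count; ++i) {
        uint16_t p = pixels[i];
        pixels[i] = (uint16_t)((p << 8) | (p >> 8));
    }
    // the end coordinates passed here are exclusive, hence the +1
    esp_lcd_panel_draw_bitmap(lcd_handle, area->x1, area->y1, area->x2 + 1, area->y2 + 1, px_map);
    LV_UNUSED(disp);
}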
This is less efficient. It's also ugly. This is what's for dinner now in LVGL. This is "progress".
The worst part is I understand and even sort of agree with why they did it.
The issue is multiple displays, and the fact that some displays may not need the swap, perhaps because they're monochrome or something. Prior to 8, LVGL simply didn't support that scenario, because the swap option was a #define (LVGL is C, not C++) and consequently applied to all displays.
But to remove it entirely seems like it was a decision guided by expediency more than anything. It's unfortunate.
And all of it reminds me of why I hate the fact that humans couldn't universally agree on endian order.
Check out my IoT graphics library here:
https://honeythecodewitch.com/gfx
And my IoT UI/User Experience library here:
https://honeythecodewitch.com/uix
modified 1-Jun-24 13:36pm.
|
|
|
|
|
One little two little three lit....
|
|
|
|
|
My wife is little-endian...
|
|
|
|
|
So which way would you want it?
The little endian way we write email addresses and domain names? Or the big endian way we write IP addresses?
The little endian way we write snail mail addresses on an envelope? Or the big endian way we dial a phone number?
The little endian way we sign a document with our full name? Or the big endian way we are listed in the telephone book? (Iceland is an exception - the phone book is little endian!)
The big endian way we write multi-digit Arabic numerals? Or the big endian (?) way Arabs write multi-digit numerals? They read right-to-left, so to them, it is little endian.
The big endian way Americans write month/date, or the little endian way they write month/date - year?
The little endian way street addresses are written by American standards (17 Main street) or the big endian way used e.g. in Norway (Storgata 17)?
The big endian way when adding an entrance (17 Main Street, Entrance B) or stick to a consistent little endian way (Entrance B 17 Main Street)?
Four-and-twenty blackbirds baked in a pie, or Twenty four blackbirds baked in a pie?
A quarter past nine, or nine fifteen?
If you want the entire world to agree on a single endianness, you probably have to work yourself up to a position of an almighty ruler of the world. Even if you got in a position where you could turn IP addresses the other way, and put the area code at the end of the phone number, and reorder the phone book on first names rather than last, and teach schoolkids to write the tens after the ones and then the hundreds and the thousands, and make all Americans write the date before the month, and ... What would you do with IBM mainframes? With systems based on MC68, 8051 or OpenRISC, older Power and SPARC systems? Would you make them illegal?
There are some small, insignificant embedded (and other) architectures that allow memory access in either endianness, such as ARM, newer Power or SPARC, Alpha, MIPS, i860 and PA-RISC. You may consider them all to be so unimportant that you will ignore them. You may also find unimportant CPU instructions for reversing the byte order in halfwords, words or doublewords, provided by several architectures of one (or one preferred) endianness.
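In portable C, without relying on any of those instructions, checking the host byte order and reversing a word might be sketched like this (the function names are made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a multi-byte integer is the
   least significant one (little endian), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte stored first in memory */
    return first == 1;
}

/* Reverse the byte order of a 32-bit word. */
uint32_t reverse_bytes32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & UINT32_C(0x0000FF00)) |
           ((v << 8) & UINT32_C(0x00FF0000)) |
           (v << 24);
}
```

Compilers commonly recognize this shift-and-mask pattern and emit a single byte-reversal instruction where the CPU has one, though that is an optimization you should not count on.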
I guess you will be following up here at WeirdAndWonderful/Feature/com.codeproject.www//:http
|
|
|
|
|
Frankly, little endian everything, if I had my way, but it's too late. The internet is big endian for example.
I don't know what you're talking about in terms of writing email addresses and domain names.
I'm talking about byte order of machine words. That's all.
|
|
|
|
|
honey the codewitch wrote: I don't know what you're talking about in terms of writing email addresses and domain names.
What goes first, to the left: The smallest unit, i.e. the individual recipient, or the larger unit comprising a huge number of them, the mail server name?
Which goes first: The smaller subdomain 'codeproject' or the TLD 'com'?
Are you really sure that the internet is consistently big endian? IP addresses certainly are, but is that all that there is? (Then let's keep dancing ...)
Curious memory: On the very first map I saw of the internet, the nodes were labeled with a number, mostly 3-digit. Several of the nodes had the same number. I had to have these labels explained to me: It was the IBM 360 mainframe model provided at that site as the main computing resource. IBM mainframes always were big endian. That might explain why IP addresses were selected to be big endian.
If you go for consistent little endian format: Do you consider the last character in a name to be the most significant? If memory contains the bytes, at increasing addresses, 'J', 'O', 'H', 'N', would you then consider 'J' the least significant one, e.g. if you were to sort a list of names? Would you rather choose to store 'JOHN' as 'N', 'H', 'O', 'J'?
Would you then require all names (or other strings) to be of a fixed length, with the characters to be compared for sorting at the same offset from the start of the string? Or would you construct a descriptor for each string, each with an index (offset) starting at the last character, decrementing it as the sorting progressed to less significant characters?
Or would you store numeric values as little endian, but strings as big endian?
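The questions above can be tried out directly. This sketch reads the same four bytes as an integer both ways and compares the resulting order with the usual byte-by-byte string comparison; the helper names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Read 4 bytes as an integer with the FIRST byte most significant (big endian). */
uint32_t load_be32(const char s[4])
{
    const unsigned char *b = (const unsigned char *)s;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Read 4 bytes as an integer with the FIRST byte least significant (little endian). */
uint32_t load_le32(const char s[4])
{
    const unsigned char *b = (const unsigned char *)s;
    return  (uint32_t)b[0]        | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Comparing "JOHN" and "KARL": memcmp puts JOHN first because 'J' < 'K', and so does the big endian reading. The little endian reading puts KARL first, because it treats the last letters as most significant and 'L' < 'N'.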
Bottom line: It isn't as simple as 'Choose one and always use that'. You can't define yourself away from endianness problems, even if you have omnipotent power. Unless, of course, if your power is so omnipotent that you can define two plus two to make something else than four.
|
|
|
|
|
Generally, the Internet is big endian, as in any binary protocols that exist on the internet expect "Network Byte Order", which is big endian. I'm happy to be proven wrong on that score, but it applies to everything I can think of, be it 32 or 128 bit IP addresses, 16 bit port numbers, NTP, etc.
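For illustration, a minimal sketch of writing and reading a 32-bit value in network byte order without depending on the host's own byte order. The function names are made up; on POSIX systems htonl and ntohl cover the same ground for values held in memory.

```c
#include <stdint.h>

/* Write a 32-bit value most significant byte first (network byte order). */
void put_u32_be(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read a 32-bit value that was stored most significant byte first. */
uint32_t get_u32_be(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

The address 192.168.1.10 is 0xC0A8010A as a 32-bit number and goes on the wire as the bytes C0 A8 01 0A, which is exactly the order the dotted form is written in.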
I generally sort asciibetically if I'm sorting with a machine. I don't care about text in this instance.
Like I said, I was only referring to machine word byte order.
|
|
|
|
|
Sure, we could select one tiny little speck of the endianness problem - byte ordering, say - and ignore the rest. Assuming that it was possible to select one single byte ordering and rule out any others, we could declare: 'Hooray! Now the world is free of endianness problems!'
Maybe that would be the case inside your little IDE. At least until you have to handle a date. Or an IP address. Or a decimal multi-digit number.
You may argue: But those are not endianness problems! They are something different from the ordering of bytes within a word!
I'll accept what you say, and you have the full right to say that nothing other than byte ordering falls under 'endianness'. But the issues are about ordering smallest unit to biggest or biggest to smallest. We could use a different term for this wider problem area, e.g. 'big ordering' or 'small ordering' (and 'mixed ordering'), making 'endianness' a subset of 'ordering'.
In principle, we could then throw out all IBMs, Powers, MC68s, OpenRISC and a number of others, as they are more or less bound to the forbidden endianness. That would leave the tiny 'endianness' speck of the ordering problem area 'solved' (and a few manufacturers would get rid of some nasty competitors). But the rest of the 'ordering' problem domain would remain unsolved. I understand that you don't care about the rest. That is OK with me, as long as you accept that today is 2/6, 2024.
|
|
|
|
|
trønderen wrote: That is OK with me, as long as you accept that today is 2/6, 2024.
Heh. As an American, that actually took some getting used to when I'd date my articles here. We do things differently here. Dumber. See our football. Our "cheese". etc.
|
|
|
|
|
I like this approach: Let's decide, whether the whole world is little or big endian.
Then we can apply this decision in the computer world.
By the way, what programming language was used to write the whole world? I think this was C.
|
|
|
|
|
As the Universe is still expanding, going from small to big, I think the answer is obvious. At least on the scale of the Universe.
|
|
|
|
|