|
Eddy Vluggen wrote: And no, you don't go back questioning the design of the screws if you're building a car. You take the industry standard, take a brief glance at other screws, and try to realize there's a reason why it is the current standard. That is certainly true. Sometimes there are reasons for a component's design that you do not realize, and if you try to "improve" it, you may achieve the opposite. When part of the solution is given, it is given.
Textual encoding may be that way, in particular when you are exchanging data with others.
But when you are not bound to one specific solution, e.g. you are defining a storage format for the private data of an application, or you have several alternatives to choose from (say, 8-bit text is given, but you need to select an escape mechanism either for extended characters or for characters with special semantics), then you should know the pluses and minuses of the alternatives.
"Because we used it in that other product" is not an assessment. Yet I often have the feeling that we argue exactly like that. We should spend some of our effort on learning why these other alternatives were developed at all. There must be some reason why someone preferred to do it another way! Maybe those reasons will pop up in some future situation; then you should not select an inferior solution because "that is what we always do".
What I (optimistically) expect from my colleagues is that they are prepared to relate to the advantages and disadvantages of text and binary encoding. If they are network guys: that they know enough to explain the greatness of IP routing vs. virtual circuit routing, or the advantages of layer-3 routing over layer-1 switching. Application developers should relate to explicit heap management vs. automatic garbage collection, use of threads vs. processes, semaphores vs. critical regions. And so on.
Surprisingly often, developers know the solution they have chosen very well; but that is the only alternative they know well. They cannot give any well-qualified explanation of why the other alternatives were rejected. I think it is as important (in any field, engineering or otherwise) to be capable of defending the rejection of the other alternatives as it is to defend the selected one. If you cannot, then I get the impression that you have not really considered the alternatives, just ignored them. And that is what worries me.
For UTF-16: yes, that is given, as an internal working format. Yet you should consider what you will be using as an external format: UTF-8 is far more widespread for interchange of text. When is it more appropriate? If you go for UTF-16, will you be prepared to read both big- and little-endian variants, or will you assume that you exchange files only with other .NET-based applications? Will you be prepared to handle characters outside the Basic Multilingual Plane, i.e. with code points above 64Ki?
Even if your response is "we will assume little-endian, we will assume that we never need to handle non-BMP characters, we will assume that 640K is enough for everyone", these should be deliberate decisions, not defaults taken blindly.
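To make the endianness and non-BMP questions concrete, here is a minimal sketch in Python (the thread contains no code, so the language choice and the example string are mine):

```python
# A sketch of the two UTF-16 pitfalls raised above: byte order, and
# characters outside the Basic Multilingual Plane (code points > U+FFFF).

text = "A\U0001F600"  # 'A' plus a non-BMP character (U+1F600)

# The same text yields two incompatible byte streams:
big = text.encode("utf-16-be")
little = text.encode("utf-16-le")
assert big != little

# Outside the BMP, UTF-16 needs a surrogate pair: two 16-bit units.
assert len(little) // 2 == 3  # 1 unit for 'A', 2 units for U+1F600

# A BOM lets a reader detect the byte order; without one it must guess.
with_bom = text.encode("utf-16")  # Python prepends a BOM here
assert with_bom[:2] in (b"\xff\xfe", b"\xfe\xff")
```

A reader that claims UTF-16 support but handles none of these three points has made its decisions by default, which is exactly the issue above.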
When Bill Gates was confronted with the 640K quote, he didn't positively confirm it, but he certainly didn't deny it either: he might very well have made such a remark in a discussion of how to split the available 1 MB among the OS and user processes. Given that 1 MB limit, giving 384 KB to the OS and 640 KB to application code is a reasonable split; give the applications much more, and the OS is cramped into too little space. "640K is enough for everyone." In such a context, where the reasoning is explained, the quote suddenly makes a lot more sense. Actually, it is quite reasonable!
That is how I like it: knowing why you make the decisions you do, when there is a decision to make. Part of this is awareness of when there is a decision to make; do not ignore that you actually do have a choice between your default alternative and something else.
|
Member 7989122 wrote: We should spend some of our efforts on learning why these other alternatives were developed at all. 8-bit text was not developed as an alternative. ASCII is not an alternative to UTF-16.
Member 7989122 wrote: If you cannot, then I get the impression that you have not really considered the alternatives, just ignored them. And that is what worries me. What worries me is that you see improvements of the wheel (with a documented history) as alternatives to more modern standards.
Member 7989122 wrote: or assume that you will exchange files only with other .net-based applications? No, you don't assume; you define an exchange protocol with a specific text encoding. That should be part of the specs.
Member 7989122 wrote: the quote suddenly makes a lot more sense. The quote that's not his, you mean?
Member 7989122 wrote: That is how I like it. Knowing why you make the decisions you do, when there is a decision to make. Aw, can't argue with that. I assume all your databases are in BCNF?
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
Member 7989122 wrote: Isn't the whole bunch of them wheel reinventions? No, they're refinements of said wheel.
Member 7989122 wrote: yet I think that what humans should not mess up, should not be made available for messing up Reading and writing aren't the same thing; making human validation impossible does not help ensure a correct write. After all, your application might have a bug and write the wrong stuff. The only thing that making it unreadable does is prevent human validation.
|
A binary format certainly does not mean that the information and its structure cannot be inspected at all! There are tools for inspecting e.g. a binary ASN.1/BER format that let you navigate the structure, detect format errors (and the reader should support you in that!), etc.
As I mentioned in another post: I made an XML document example using tags in Northern Sami, making no sense to the audience (nor to me; I got the Sami terms from a colleague). Then there is very little value in the "textual" format, when all you know is that "something" is nested within "something else". I also used an example with a "p" tag, where "p" represented a person (in one part of the schema), ordering a product (in another part), and in the payment information "p" indicated a paragraph in the text. Understanding the XML record properly suffers from the use of seemingly readable, but highly ambiguous, tag names.
You may limit your application or data format to English, just to ensure that you as an English speaker can make sense of it. But then please state that explicitly as a limitation: "This data specification format should not be used in any non-English context". That could be valid for software development tools used by IT professionals only, but certainly not in a general document context: administration, business, home use, educational material... Be prepared for Chinese macro names. Russian XML tags. ÆØÅ in variable names. Dates in ISO format and a 24-hour clock. Those are more or less absolute requirements as soon as you move your application out of the computer lab.
For multilingual applications, binary formats give a lot of flexibility compared to text formats. Of course you can translate on the fly, but using a plain integer as an index into a language table is a lot easier than word-for-word translation. And you may supply extra info in that language table, e.g. indicating plural forms, gender, etc., giving a much better translation.
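A hypothetical sketch (in Python; the message numbers, strings, and function name are mine, not from the thread) of the "integer index into a language table" idea, with the extra grammatical info carried in the table:

```python
# Hypothetical message catalog: the stored record carries only a message
# number; each language entry supplies (singular, plural) forms so the
# renderer can pick the grammatically correct one.
MESSAGES = {
    7: {
        "en": ("{n} file deleted", "{n} files deleted"),
        "no": ("{n} fil slettet", "{n} filer slettet"),
    },
}

def render(msg_id: int, n: int, lang: str) -> str:
    singular, plural = MESSAGES[msg_id][lang]
    return (singular if n == 1 else plural).format(n=n)

print(render(7, 1, "en"))  # 1 file deleted
print(render(7, 3, "no"))  # 3 filer slettet
```

Real systems need more than two forms for some languages (Russian and Polish have several plural categories), which is exactly the kind of extra info such a table can carry.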
|
Member 7989122 wrote: Be prepared for Chinese macro names. Russian XML tags. ÆØÅ in variable names. We are, since we're no longer limited to ASCII.
Member 7989122 wrote: Dates in ISO format and 24 hour clock. Date formats are another topic; you should save in ISO, but display nicely in the format that the user has set as his preference in Windows. That's not a suggestion, nor is it up for discussion.
Member 7989122 wrote: For multi-lingual applications, binary formats give a lot of flexibilty compared to text formats. Ehr.. no. You could have ASCII in binary, with a completely useless date format.
Member 7989122 wrote: Of course you can translate on-the-fly, but using a plain integer as an index into a language table is a lot easier than word-to word translation. And you may supply extra info in that language table, e.g. indicated plural forms, gender etc. giving a much better translation. We use keys, not integers, and resource-files.
You started with a wheel; now you're also including a dashboard and brakes. I have no idea what you are trying to say.
|
Eddy Vluggen wrote: We are, since we're no longer limited to ASCII. I was primarily thinking of readability and comprehension, not representation. If you are receiving a support request or error report, and all the supporting documentation uses characters that make no sense to you, you may have great difficulties interpreting the bug report or support request.
And: the alternative to UTF-16 (which is hardly used at all in files) is UTF-8, not ASCII. In the Windows world you may still see some 8859-x (x given by the language version of 16-bit Windows), but to see 7-bit ASCII you must go to legacy *nix applications. Some old *nix-based software and old compilers may still be limited to ASCII; I have had .ini files that did not even allow 8859-1 in comments! But you must of course be prepared for 8859 when you read plain text files from an arbitrary source (and ASCII is the lower half of 8859).
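The "lower half" relationship can be shown in a few lines of Python (my illustration; the thread itself contains no code):

```python
# Pure-ASCII bytes decode identically under ASCII, 8859-1 and UTF-8:
b = b"plain text"
assert b.decode("ascii") == b.decode("iso-8859-1") == b.decode("utf-8")

# Above 0x7F they diverge: 'ø' is one byte in 8859-1, two in UTF-8.
assert "ø".encode("iso-8859-1") == b"\xf8"
assert "ø".encode("utf-8") == b"\xc3\xb8"

# So reading an 8859-1 file as UTF-8 fails outright:
try:
    b"\xf8".decode("utf-8")
except UnicodeDecodeError:
    pass  # a lone 0xF8 byte is not valid UTF-8
```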
you should save in ISO, but display nicely in the format that the user has set as his preference in Windows Then we are talking about not reading a text representation as a text file, but using an interpreter program to present the information. Just as you would do with a binary format file.
Ehr.. no. You could have ASCII in binary, with a completely useless date format. I am not getting this "ASCII in binary". Lots of *nix files with binary data use the Unix epoch to store date and time. If your data is primarily intended for the Windows market, you might choose to store it as 100 ns ticks since 1601-01-01T00:00:00Z; then you can use standard Windows functions to present it in any format. Conversion to the Unix epoch is one subtraction and one division. If you insist on the ISO 8601 character format, you may store it in any encoding you want, all the way down to 5-bit Baudot code.
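The "one subtraction, one division" claim, sketched in Python (the epoch-difference constant is the well-known 1601-to-1970 offset expressed in 100 ns ticks; the function name is mine):

```python
# Windows FILETIME: 100 ns ticks since 1601-01-01T00:00:00Z.
# Unix epoch: seconds since 1970-01-01T00:00:00Z.
EPOCH_DIFF_TICKS = 116_444_736_000_000_000  # the 369-year gap, in ticks
TICKS_PER_SECOND = 10_000_000

def filetime_to_unix(ticks: int) -> int:
    # one subtraction, one division
    return (ticks - EPOCH_DIFF_TICKS) // TICKS_PER_SECOND

assert filetime_to_unix(EPOCH_DIFF_TICKS) == 0                      # 1970-01-01T00:00:00Z
assert filetime_to_unix(EPOCH_DIFF_TICKS + TICKS_PER_SECOND) == 1   # one second later
```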
You started with a wheel, now you're also including a dashboard and breaks. Did you ever roll snowballs to make a snowman when you were a kid?
I have no idea what you are trying to say One major point is that binary data file formats, as opposed to character representations, are underestimated; most programmers are stuck in the *nix style of representing all sorts of data in a character format where a binary format would be more suitable. (The same goes for network protocols!) I am surprised that you haven't discovered that point.
|
Member 7989122 wrote: I was primarily thinking of readability and comprehension, not representation. Readability can't exist without representation.
Member 7989122 wrote: If you are receiving a support request or error report, and all supporting documentation uses characters that make no sense to you, you may have great difficulties in interpreting the bug report or error request. No, I mail the provider of said report and burn them for not documenting.
Member 7989122 wrote: And: The alternative to UTF-16 (which is hardly used at all in files) is UTF-8, not ASCII. That's not an alternative. One is a more limited version of the wheel than the other.
Member 7989122 wrote: But you must of course be prepared for 8859 No, in general I'm not; the specs specify what I should support, and outdated isn't supported.
Member 7989122 wrote: Then we are talking about not reading a text representation as a text file, but using an interpreter program to present the information. Just as you would do with a binary format file. Neither binary nor text needs an interpreter.
Member 7989122 wrote: I am not getting this "ASCII in binary". Lots of *nix files with binary data use Unix epoch to store date and time. ASCII is a text representation that is stored as bits. The Unix epoch has nothing to do with any discussion of text formats.
Member 7989122 wrote: Did you ever roll snowballs to make a snowman when you were a kid? No. What's the use of that?
Member 7989122 wrote: One major point is that binary data file formats, as opposed to a character representation, are underestimated A representation is not a format. They're all stored as bytes. Google for an ASCII table; it shows which byte is used for each character.
Member 7989122 wrote: I am surprised that you haven't discovered that point. I deduce you're not asking a question, but trying to make a point. Mixing text encodings and date encodings, trying to prove that non-human-readable binary is somehow superior.
You fail to give a simple example to prove so, and your explanation isn't helping me.
|
While the binary format you describe is interesting, it's not what I asked about.
I'll try creating one in the future nevertheless.
|
XML is very verbose and JSON doesn't have extendable types.
|
XML existed before JSON.
And data interchange formats benefit from being verbose, due to readability; it's not a binary format.
Come to the point please.
|
How does "XML existed before JSON" relate to either "XML is very verbose" or "JSON doesn't have extendable types"?
In which ways do "data interchange formats benefit from being verbose"?
Most users today do not read the raw data interchange format directly, as-is; they process it with software that e.g. highlights labels and closing tags, and allows collapsing of substructures. When you pass it through software anyway, what impact does the format of the input to this display processor have on readability? With semantically identical information, but binary coded, as input to the display processor, why would the readability be better with a character encoding of the information than with a binary encoding?
|
Semantic bullshit, aka wordsmithing. I've been on that train before.
You're trying to act as if binary is the solution to formats; it's not. Anything, text or date, is stored as bits, and is thus binary. ASCII is a representation of that, UTF is a better form of ASCII. Dates are stored as floats.
I don't care what university. You can either learn or be ridiculed. And damn right I will, at every opportunity.
And yes, being "kind"
|
If you really want me to explain to you the difference between storing an integer, say, as a 32-bit binary number vs. storing it as a series of digit characters, because "ASCII is bits, hence digital", then I give up. Sorry.
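For the record, the difference in question can be stated in a few lines of Python (my example; no code appears in the thread):

```python
import struct

n = 1234567890

# Fixed-size binary field: always 4 bytes, not meaningful in a text editor.
as_binary = struct.pack("<i", n)
# Digit characters: variable length, readable in any editor.
as_text = str(n).encode("ascii")

assert len(as_binary) == 4
assert len(as_text) == 10
assert struct.unpack("<i", as_binary)[0] == n
assert int(as_text) == n
```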
|
Member 7989122 wrote: If you really want me to explain to you the difference between storing an integer, say, as a 32 bit binary number vs. storing it as a series of digit characters I didn't say that; and I'm not going to explain either. I've no need to, nor any desire.
Member 7989122 wrote: then I give up. Sorry.
Good timing. And please do.
|
They are not good enough, so I won't use them.
|
They might not seem efficient to you; but lots of us use them, both, where appropriate.
Try to explain why XML isn't good enough, and how many floppy discs you're limited to that you need that optimization.
Do elaborate, please.
|
You have several times in this thread more or less insisted on relating to (7-bit) ASCII and floppy disks. No one else here cares about either of those. If they are your frame of reference, then relate your experience to them. I don't care to. And I don't think the effort to explain why not would be justified.
I am not (and I guess a few others agree) demanding that you critically assess your choice of data formats and other solutions. You may go on as you please, with the formats that please you, with or without any critical evaluation. You are welcome to.
|
Not with or without critical evaluation, but with an education.
One expects that a developer knows the different text formats (and encodings, which are the same to you), data formats, and date formats. One who mixes those up in a semantic bullshit argument gets called out.
So damn right I will. Either play your cards or fold.
|
I don't mean that I won't use XML/JSON. I think they are not good enough, so I still want to create my own data notation. I'm just saying that this is off topic (I have used Stack Exchange sites before) and I don't want to discuss it any further (as it doesn't add anything to my first question).
|
What does that have to do with anything? I merely pointed out that there are two existing, well-tried and widely supported systems for data interchange. You can use them or not, as you choose.
|
Well, pointing out XML/JSON was off topic as well.
|
nedzadarek wrote: I want to create data notation (like JSON is used). So your mention of JSON in your original question was off topic?
|
I don't want to waste time on your trolling.
|
How is that trolling? As I said, I made a couple of suggestions which you were free to ignore. I get the distinct impression (reading your other threads above) that you only came here for a fight.
|
Are you looking for something like protobuf?
Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|