Re: Help with a C language program - Algorithms Discussion Boards

Eddy Vluggen wrote:
Engrish is simple to learn

I wish you'd tell some Brits that!

(Many "native" English speakers ... don't. At least not half as well as most non-native speakers.)

"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer

trønderen wrote:
For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments,

I would say...no.

First, at least the last time I created a compiler, much less studied compilers, it was not "simple" to replace keywords.

Second, the latter part of that statement seems to suggest exactly the problem that compilers need to solve with the first part of what I said. Compilers (and interpreters) already convert keywords into tokens. Those that do not are very inefficient (as I know, since I had to work with one long ago).

trønderen wrote:
I think the global software culture would be enrichened if we could disengage from the absolute binding to the English-speaking culture.

No, as you already pointed out prior to that. Most discussion about what the software does happens in a natural language, and very likely, in the vast majority of cases, in a single natural language.

Requirements, Architecture, Design are all in that natural language. All of those have much more impact on the solution than the actual code itself.

Significant failures do not happen at the code level. They happen due to a failure in the above processes, such as the failure with the Mars Climate Orbiter. The specifications were exact and numerical (which is universal). The communication was not.

jschell wrote:
trønderen wrote: For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments,
I would say...no.

Or perhaps yes. Many years ago, when I was just starting university, the community college had a very old (even then) IBM mini. I don't recall the model number, but it was somewhat larger than an "executive" office desk, with a disc-pac to one side, and one of those 7 foot tall chain-driven line printers to the other. It might have been something from the 1400 series (IBM 1400 series - Wikipedia). I think the instructor said that he knew of one other example of that model of computer, but it was in a museum!
Anyway, we used that computer to learn Algol and Fortran. The Algol compiler was written somewhere like McGill university, in Quebec. As such, I seem to recall that the keywords were bilingual: you could use either English or French, so either if or si. But the error messages were all in French. So maybe at the time the requirement that we take first year French wasn't so pointless after all. Maybe it was written in Paris: https://dl.acm.org/doi/pdf/10.1145/872738.807150 (see the note about bilingual details on p. 113).

Keep Calm and Carry On

Algol68 was explicitly defined for adaptation to different languages: The syntax was defined using abstract tokens that could be mapped to various sets of concrete tokens.

This is no more difficult than having a functional API definition with mappings to C++, PHP, Fortran, Java, ... Obviously, to define these mappings, you should both thoroughly understand the API, and of course the language you are mapping to. It is not always a trivial thing to do.

When you choose concrete tokens for a programming language, it is not something that you do a Friday night over a few beers. It is professional work, where you must know the semantics of those abstract tokens, and you must know the natural language from which you select your keywords. You must be just as careful when selecting a term as the English-speaking language designers when they select their English terms. If the language defines some tokens as reserved, you must honor that even for your alternate concrete mapping.

In your French Algol version, I assume that the source code was maintained in a plain text file (probably in EBCDIC, for IBM in those days), handled by the editor of your choice. Switching between English and French would require a textual replacement. If the source code was rather stored as abstract tokens, maybe even as a syntax tree, it would require an editor specifically made for this format. (Note that you could still have a selection of editors for the same format!) The editor might choose to look up the concrete syntax only for that part of the tree that is at the moment displayed on screen. 'Translation' is done by redrawing the screen, using another table of concrete symbols.
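A minimal sketch of that idea, assuming a made-up token set and hypothetical display tables (nothing here is taken from the Algol compiler mentioned above): the stored program holds abstract tokens, and 'translation' is just rendering the same stored form with a different table.

```python
from enum import Enum, auto

class Tok(Enum):
    """Abstract tokens; the stored program never contains keyword text."""
    IF = auto()
    THEN = auto()

# Hypothetical display tables: one concrete spelling per abstract token.
DISPLAY = {
    "en": {Tok.IF: "if", Tok.THEN: "then"},
    "fr": {Tok.IF: "si", Tok.THEN: "alors"},
}

# Stored form: abstract tokens mixed with user-chosen names (plain strings).
program = [Tok.IF, "x", Tok.THEN, "y"]

def render(tokens, lang):
    """Redraw the stored tokens using the concrete symbols of one language."""
    table = DISPLAY[lang]
    return " ".join(table[t] if isinstance(t, Tok) else t for t in tokens)

print(render(program, "en"))  # if x then y
print(render(program, "fr"))  # si x alors y
```

The stored list never changes; only the table passed to the renderer does.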

This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we have always done before, refusing to change our ways even a tiny little bit. I sure can agree that it is fully possible to construct obstacles for preventing any sort of change in our ways of thinking. I am not hunting for that kind. Like you, k5054, I observe that 'It happens, so it must be possible'.

trønderen wrote:
The syntax was defined using abstract tokens that could be mapped to various sets of concrete tokens.

In Computer Science the area of Compiler Theory is very old and very well studied.

Your statement is describing something that well designed compilers (and interpreters) already do. The only time I have ever seen a 'compiler' not do that, it was coded by someone who had zero training in the science of compilers.

As I suggested before, the problem is not in creating tokens. The problem is in creating the language in the first place such that it is deterministic and second in creating a compiler that can report errors. That last part is the most substantial part of every modern compiler (even toy ones).

trønderen wrote:
If the source code was rather stored as abstract tokens,

Parsing text into tokens is the first part of what all compilers/interpreters do.

The following is one source describing the very well known process compilers already follow.

Compiler Design - Phases of Compiler
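As a rough illustration of that first phase, here is a toy tokenizer. It is only a sketch with an assumed keyword table and made-up token names, not something taken from the linked page:

```python
import re

# Illustrative keyword table: source spelling -> token kind.
KEYWORDS = {"if": "IF", "then": "THEN"}

def tokenize(source):
    """Split source text into (kind, text) pairs; keywords get their own kinds."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if word in KEYWORDS:
            tokens.append((KEYWORDS[word], word))
        elif word.isdigit():
            tokens.append(("NUMBER", word))
        elif word[0].isalpha() or word[0] == "_":
            tokens.append(("IDENT", word))
        else:
            tokens.append(("SYMBOL", word))
    return tokens

print(tokenize("if x then y"))
# [('IF', 'if'), ('IDENT', 'x'), ('THEN', 'then'), ('IDENT', 'y')]
```

Everything after this phase works on the token kinds, not on the original spelling.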

What you are describing does not have anything to do with the actual problem.

English version of a standard (very standard) part of programming languages

if x then y

Now the French version

si x alors y

So in the above for just two natural languages you now have 4 keywords in the language.

Let's add Swedish

om x så y

So for every language added it is reasonable to expect that the number of keywords would be duplicated. Keywords often cannot be used in code both because it makes it much harder for the compiler to figure it out and for it to correctly report on errors. Additionally even when the context allows the compiler to figure it out it does not make it ideal for human maintenance.

Consider the following statement. If one were using a different native language to drive the compiler, then the following should be legal. But in the English version, do you really want to see this code?

int if = 0;

So not only would the number of keywords increase but the programmer would still need to be aware of all of those keywords while coding.

Now besides the increasing number of keywords the following are some of the problems that I see.
1. Two programmers are working on the same file. The file MUST be syntactically correct before developer A (English) goes on vacation, because otherwise the mechanism (code) that must translate it back from the English form will not work when developer B is French.
2. Comments cannot be supported.
3. Third party APIs would still require whatever is supported by the 3rd party service (library, REST, TCP, whatever).
4. Adding new languages to the compiler after first release would mean that existing applications could break because existing code might use them. This also is a well known problem that exists right now when new functionality is added to an existing compiler. So all known languages would need to be supported on first release.

jschell wrote:
In Computer Science the area of Compiler Theory is very old and very well studied.

I sure wish that was true for everybody creating new languages! (Note that I did not refer explicitly to C and all the languages derived from it.)

The problem is in creating the language in the first place such that it is deterministic and second in creating a compiler that can report errors. That last part is the most substantial part of every modern compiler (even toy ones).

Reminds me of VAX/VMS: Every message delivered by system software (including compilers) was headed by a unique but language independent numeric code. Support people always asked you to supply the code; the message text could be in any language - they never read that anyway.

So for every language added it is reasonable to expect that the number of keywords would be duplicated.

You are missing my point completely. Neither if, then, si, alors, om or så, are reserved words in the language. The language would define non-text tokens, call them [if] and [then] if you like, but the representation is binary, independent of any text.

Keywords often cannot be used in code both because it makes it much harder for the compiler to figure it out and for it to correctly report on errors.

No one is suggesting that you are allowed to use the binary [if] token as a user defined symbol.

The display representation of the binary [if] token could be e.g. as (boldface) if, or as [if], si, [si], om, [om] or some other way to visually highlight that this is not a user identifier but a control statement token. For creation of new control structures, an IDE working directly on a parse tree representation could provide function keys for inserting complete control skeletons. I have been working with several systems working that way, both for data structures, graphic structures - and for program code, although the latter inserted textual keywords, not binary tokens the way I wish it to do. Once you get out of the habit of thinking of your program as a flat string of 7-bit-ASCII characters, it is actually quite convenient! (You can assign the common structures, like if/else, loops, methods etc. to F1-F13 keys so that you don't have to move your hand over to the mouse for selecting from a menu.)

So not only would the number of keywords increase but the programmer would still need to be aware of all of those keywords while coding.

Quite to the contrary! The programmer might very well define a variable named if, which is distinct from the binary token [if]. There would be no reserved words on the textual level.
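To make that concrete, here is a small sketch of a tree where the conditional is a node type and a variable just happens to be named if. The class names, the bracket display and the marker tables are all my own assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str      # user-chosen text, free to be any word

@dataclass
class If:
    cond: object   # structure node: stored as a node type, never as text
    then: object

# Hypothetical display tables for the structural markers only.
MARKERS = {"en": ("if", "then"), "fr": ("si", "alors")}

def show(node, lang):
    """Display a tree; structure markers are bracketed, identifiers are not."""
    if isinstance(node, Var):
        return node.name
    if_word, then_word = MARKERS[lang]
    return f"[{if_word}] {show(node.cond, lang)} [{then_word}] {show(node.then, lang)}"

tree = If(cond=Var("if"), then=Var("y"))  # a variable that is named "if"
print(show(tree, "en"))  # [if] if [then] y
print(show(tree, "fr"))  # [si] if [alors] y
```

The identifier keeps its spelling in both displays, while the structure markers follow the UI language.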

A not very well known fact: Classic FORTRAN actually managed without reserved words. I just posted an entry in 'The Weird and the Wonderful' - something from my student days that I found in a box in the basement - to illustrate the point. Note, however, that F77 philosophy is not what I am asking for: It did not represent control (and other) structures by binary tokens, but relied on semantic analysis of plain text source code.

1. Two programmers are working on the same file. The file MUST be syntactically correct before developer A (English) goes on vacation, because otherwise the mechanism (code) that must translate it back from the English form will not work when developer B is French.

I say again: You missed my point completely. If the IDE stores the code as a parse tree, it is syntactically correct, otherwise the IDE would not have accepted it. Of course developer B may define user variables and methods with French names, but so he can in any IDE environment.

2. Comments cannot be supported.

Why can't the parser define a binary 'comment' token, and store that in the parse tree? In one project I am currently working on (which is not a general programming language, but an application specific control language), we are doing exactly that. The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed. (Otherwise, when the English, say, comment is displayed, you can add a French translation of it.)

3. Third party APIs would still require whatever is supported by the 3rd party service (library, REST, TCP, whatever).

I have been working with third party APIs with French method and parameter names, in an otherwise English language environment; it was a nightmare ... If you define a language along the lines I am suggesting, a library would be delivered as a parse tree as well, along with one or more (i.e. different languages) symbol tables for use in the API. (This is how we do it in that application control language mentioned above). Otherwise, if the binary interface is given, the library comes in a compiled, linkable format with given entry point symbols, your parse tree interface to that library should include a mapping from a call token to the entry point symbol, unlinking that symbol from the external display. Establishing this mapping is a one-time operation that could follow the library file, similar to how a '.h' file follows a C library.

4. Adding new languages to the compiler after first release would mean that existing applications could break because existing code might use them.

Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes. Adding a new binary token, with its unique token ID, would not invalidate any program whatsoever. Of course there is the question of where the display mapping is done: If the IDE does it, and imports a new compiler with new binary tokens, it might not have a proper French or Swedish word to represent it. If the new and extended compiler is delivered with a token display mapping table for a number of languages, the problem is significantly reduced. (The user may have a language fallback list, both for comments and other binary tokens, so that something meaningful is displayed, although not in the primary language.)

As I wrote in my post,

trønderen wrote:
This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we have always done before, refusing to change our ways even a tiny little bit.

Almost all of your comments are fundamentally based on the idea that a source program really, as a matter of fact, is a string of 7-bit-ASCII characters, and this will always remain true. I am suggesting that it is not.

Compare an old style text formatter such as troff with, say, MS Word: You may argue that '\fI' is like a reserved word for italicizing text; you cannot use it as plain text (without quoting). Troff stores everything as plain text. MS Word does not - prior to .docx, the storage format was a true binary format, and even XML is just a storage encoding - internally, the working format is binary, just like before. In MS Word, '\fI' is freely available as document text without quoting. Furthermore, you can move a document from an English MS Word to a French one and then to a Swedish one: The menu texts, help texts etc. change language, yet an edit made in one language version is equally valid in other language versions of MS Word.

I certainly can imagine a programming language, and its parse tree storage format, being designed along the same principles.

trønderen wrote:
I sure wish that was true for everybody creating new languages! (Note that I did not refer explicitly to C and all the languages derived from it.)

C? Compiler theory applies to any language (including interpreters.)

trønderen wrote:
Neither if, then, si, alors, om or så, are reserved words in the language. The language would define non-text tokens, call them [if] and [then] if you like, but the representation is binary, independent of any text.

That is a non-starter.

The human needs to write the code. Using token representations that the user is responsible for memorizing would not work. If the user at any time uses something like 'if' and 'then' then those are keywords for the language. That is how it works. It works that way in native languages too. Changing the spelling (English or otherwise) does not alter what a system that eventually must run the code has to do: it still must convert the keywords into something else.

And defining keywords is necessary for any computer language because it is not deterministic otherwise.

trønderen wrote:
The display representation of the binary [if] token could be e.g. as (boldface) if, or as [if], si, [si], om, [om] or some other way to visually highlight that this is not a user identifier but a control statement token.

Errr...no idea what you are talking about.

The 'bold' just becomes part of the textual representation of the keyword. No different than requiring that the keyword is in lower case.

You seem to think that because you use bold on a keyword that it is no longer a keyword. It doesn't matter how you differentiate it; in the language specification it is still a keyword.

And no developer is going to work in a language where they need to make keywords by switching from bold and back.

trønderen wrote:
Why can't the parser define a binary 'comment' token,

Because the content of the comment is NOT the token that tells the compiler that it is a comment. The content of the comment is what is contained by the comment. So in the following, the value of the comment is in the text, not the '//'

// A comment in English is useless in French.

trønderen wrote:
I have been working with third party APIs with French method and parameter names

Only when named parameters are supported and used can the parameter names matter.

And when you use it in English exactly how are you going to use that method unless you have English that tells you how to use it?

trønderen wrote:
Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes

I have studied Compiler Theory formally and informally for a long time. That statement, by itself, is not possible.

As I said, tokenization itself, is not something that is new in Compiler Theory. It has been there for a very long time. That very word is the process of converting keywords to tokens. You seem to think you are going to be able to remove keywords from the definition of the language but failing to describe, in detail, how a user is then going to be able to do something without using keywords.

trønderen wrote:
Furthermore, you can move a document from an English MS Word to a French one and then to a Swedish one: The menu texts, help texts etc. change language, yet an edit made in one language version is equally valid in other language versions of MS Word.

You do understand that an MS Word doc is a binary file which has embedded symbols in it which define the format?

The text of the document is NOT the relevant part. The analogy to code for a Word doc is that all of the text that you see in MS Word is a 'comment'.

However, when you write code, most of what you write and what you debug is not comments. So you are proposing that the keywords of the language would be written using combination key presses. For every single thing that one wrote.

trønderen wrote:
I certainly can imagine a programming language, and its parse tree storage format, being designed along the same principles.

Knock yourself out. It is called BNF: Backus-Naur Form notation.

The Java example (however, it has bugs in it).

Chapter 18. Syntax

jschell wrote:
C? Compiler theory applies to any language (including interpreters.)

Well, of course. And it sure is a good idea to know at least fundamental compiler theory before you sit down to create a new language, if you want to make a good one. History has shown that not all language makers have had extensive compiler theory background. Hence my comment.

The human needs to write the code. Using token representations that the user is responsible for memorizing would not work. If the user at any time uses something like 'if' and 'then' then those are keywords for the language. That is how it works.

Once again: Try to liberate yourself from this fixation on a code file always and invariably maintained and stored as a flat string of ASCII characters.

Hopefully, you are able to do that in document processing systems: You create a new chapter level two by hitting a function key or making a menu selection, not by inserting e.g. the strings '< h2>' and '< /h2>' in the body text. Sorry about the extra space after the '< 's - it is required here, because this is not a proper document editor. In, say, MS Word, I could have written the markup without any such considerations. In a document processor, there are no reserved text body words, character sequences or characters.

There is no law of nature that says there must be keywords / reserved words just because that document is source code for a compiler / interpreter, or that structure must be represented textually - that is not 'how it works'. Any WYSIWYG document processor will prove you wrong.

And defining keywords is necessary for any computer language because it is not deterministic otherwise.

You certainly need to define a representation for structural elements, but try to understand that once you liberate yourself from the flat-sequence-of-characters mindset, those structure elements need not be alphabetic. In a document processor file, there are no 'keywords' to represent a hierarchical chapter / section structure; the structure is maintained in binary, non-textual format. You could do the same for a program code file. (I said this earlier; it appears necessary to repeat it.)

Errr...no idea what you are talking about.
The 'bold' just becomes part of the textual representation of the keyword. No different than requiring that the keyword is in lower case.

That is because you seem to be completely stuck in the mindset of a code file by definition being a flat sequence of printable characters, maintained using 'vi', or 'TECO' if you are old style ('emacs' if you are more up to date on tools).

And no developer is going to work in a language where they need to make keywords by switching from bold and back.

So you have not understood a word of what I am talking about. How does that work in a document processor? You do not create a new chapter by inserting some extra space, then switching to a larger, possibly bolder and different typeface, possibly entering the next higher chapter number before typing the chapter heading, adding some extra space before the first paragraph, and then resetting to the standard body text format.

No, that is not the way you work: You press e.g. Alt-1 (that is how my MSWord is set up), and the editor takes care of inserting a binary structure representing a level-1 chapter heading. It is displayed with extra space before and after, in a larger, bold typeface etc., not because I inserted space or changed the typography. I inserted a structure element that was displayed that way.

If my code editor lets me insert a structure element, say a conditional if, or a loop, or a method definition, in a similar way, the code editor may display those structure markers in one of several possible ways. I suggested that the display could be 'keyword-like', but typographically marked e.g. by being enclosed in brackets or boldfaced so that the programmer would not mistake them for being plain ASCII strings (such as user specified variable / method names). Displaying space before and after method headings (similar to how document chapters are highlighted by a document editor) would be another display indicator of structure. One obvious way of displaying code structure is to indent a loop body, a 'then clause' or 'else clause'.

The programmer will not switch to boldface, add brackets or blank lines to create a structure element, not even hit the tab key or space bar to indent a loop body, but e.g. press Alt-1 to create a namespace, Alt-2 to create a method, Alt-3 to create a conditional, Alt-4 to create a loop, and so on. The editor would display something to show the structure, but whatever it displays, it is not 'keywords' in the textual sense. And it is not editable in TECO or vi.

In a document editor, you may insert blank lines, select a larger and bolder typeface, type a number and a line of text, add a new blank line and after that revert to the typography of body text. That might look like whatever a chapter heading is displayed as, but it won't make it a chapter object, in the sense of the document editor data structure. Similarly, if your IDE represents your program as a parse tree, writing 'if' (rather than hitting the function key to create a conditional statement) will not create a conditional statement. If 'if' is not a known symbol, the IDE might ask you, as soon as you complete that token, "'if' is not a known symbol - do you want to (1) create a local variable named 'if', (2) create a static variable in this module, called 'if', (3) ...". If the IDE is English language UI, it might even suggest "(4) did you intend to insert a conditional statement in your code?", but in a Norwegian language UI, this would not be triggered for 'if', but maybe for 'hvis'. If the programmer selects this alternative, the 'keyword' is not inserted into the program; the binary conditional statement object is.
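A sketch of how such a prompt could be built. The function name and the hint table are hypothetical, and a real IDE would obviously offer more alternatives than these:

```python
# Hypothetical hints: the word a user of each UI language might type
# when they actually mean "insert a conditional statement".
CONDITIONAL_HINT = {"en": "if", "no": "hvis"}

def options_for_unknown_symbol(word, ui_lang):
    """List the choices offered when the user types an unknown symbol."""
    options = [
        f"create a local variable named '{word}'",
        f"create a static variable in this module, called '{word}'",
    ]
    # The extra suggestion depends on the UI language, not on the stored program.
    if CONDITIONAL_HINT.get(ui_lang) == word:
        options.append("insert a conditional statement")
    return options

print(options_for_unknown_symbol("if", "en"))    # three options
print(options_for_unknown_symbol("if", "no"))    # two options
print(options_for_unknown_symbol("hvis", "no"))  # three options
```

Whichever option is picked, what ends up in the tree is a variable object or a conditional object, never the typed word as a keyword.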

Because the content of the comment is NOT the token that tells the compiler that it is a comment. The content of the comment is what is contained by the comment. So in the following, the value of the comment is in the text, not the '//'

Are you completely unable to imagine a binary element that is displayed as, say, '// comment text'? You might want to display it in bold, or maybe italics, to show that this is a comment, it is not user code inserted by two presses of the '/' key. If you do that, the two slashes will not be displayed in bold / italics, and will not have comment semantics - similar to writing '< h1>' in a Word document does not create a new top level chapter.

// A comment in English is useless in French.

Did you notice the sentence in my previous post,

trønderen wrote:
The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed.

That is exactly what I am doing in my current project (which, as I mentioned earlier, is more like a scripting language than a programming language).

jschell wrote:
trønderen wrote: I have been working with third party APIs with French method and parameter names
Only when named parameters are supported and used can the parameter names matter.

Named parameters are quite standard in modern programming languages. But even in K&R C, you will see the parameter names in .h files, and often you have to deduce the semantics from the variable name.

If your program representation was a parse tree, as I suggest, even variables, types and methods would have internal text-independent representations; the display of them could be based on looking up that internal ID in a symbol table. This symbol table could exist in several language variants. (I said this before; you obviously overlooked it.)

And when you use it in English exactly how are you going to use that method unless you have English that tells you how to use it?

If you select English as your UI language, then you see the English identifiers and English language comments. As I wrote in my previous post:

trønderen wrote:
The comment token may have a value field with several alternate texts, each identified by a language code, so that if you select, say, French as your UI language and there is a French version of the comment, that is the one to be displayed.

As I have stated in other posts, this goes for all program elements as well, including user defined symbols and how binary structural elements are displayed.

In my current project, the user language preference is actually a preference list: If no name / comment is available in your preferred language, your second, third, ... choice is taken. It may of course happen that no one has translated the symbol table to any language that makes sense to you. That can be done at any later time, and the problem is significantly reduced from forcing every non-native-English-speaker to work in a foreign language. My project includes a search function for symbol table entries and comment texts that do not yet have a translation to the current UI language, so that you can easily find those terms that you have forgotten to translate to, say, Norwegian before presenting the code to a Norwegian speaker.
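Roughly, and only as a sketch: the table layout, the IDs and the function names below are invented for this post, not lifted from the project.

```python
# Hypothetical symbol table: internal ID -> display names by language code.
SYMBOLS = {
    101: {"en": "customer", "no": "kunde"},
    102: {"en": "invoice"},  # not yet translated to Norwegian
}

def display_name(symbol_id, preferences):
    """Return the name in the first preferred language that has one."""
    names = SYMBOLS[symbol_id]
    for lang in preferences:
        if lang in names:
            return names[lang]
    return f"#{symbol_id}"  # nothing usable: fall back to the internal ID

def missing_translations(lang):
    """Find the symbols that still lack a name in the given language."""
    return [sid for sid, names in SYMBOLS.items() if lang not in names]

print(display_name(101, ["no", "en"]))  # kunde
print(display_name(102, ["no", "en"]))  # invoice
print(missing_translations("no"))       # [102]
```

The same lookup would work for comment texts, with the comment object holding the per-language dictionary.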

You may very well choose to insist on always including an English symbol table, as a fallback when a translation is missing, but you should be prepared to accept that not all foreigners will agree with you that English is a better fallback in their native environments.

trønderen wrote: Assuming that you refer to language features, introducing new keywords. If there are no keywords, the problem you are pointing to, vanishes
I have studied Compiler Theory formally and informally for a long time. That statement, by itself, is not possible.

Obviously, you have limited your study of Compiler Theory to purely textual input. You are clearly incapable of comprehending how an English MSWord user can select 'Heading 1' and a Norwegian MSWord user select 'Overskrift 1', and both actions lead to the same result. If the Norwegian document is moved to an English MSWord, the 'Overskrift 1' style magically is identified as 'Heading 1' - believe it or not. (I understand that this is completely incompatible with your Compiler Theory knowledge, but it is a fact.)

Again, comparing to document processors: I don't know when 'hidden text' was introduced to MSWord as a new binary object (or maybe it was a new parameter or a new parameter value for an existing binary object definition - that makes no difference). No matter what any of my documents looked like at the time, they couldn't possibly be invalidated by the new possibility of hiding text.

Let me exemplify the same in a programming language context:

I was programming in a language where 'for' loops could be conditionally terminated prematurely by a 'while <condition>', comparable to C 'if !<condition> break': Sometimes, you want different treatment if the loop iterates to its end or if it is terminated prematurely. E.g. when you search a list or array, and find what you are looking for (exiting prematurely), or you reach the end without finding it, requires different handling. In this language, you could specify the two alternatives by adding to the loop an 'exitwhile' clause for the premature termination, and/or an 'exitfor' clause for loop completed termination. Both clauses were executed in the context of the loop body with access to e.g. loop local variables.
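For readers who know Python: its for loop has something close to the 'exitfor' half of this, since the else clause of a for loop runs only when the loop completes without break. A sketch of the search example, where the code just before break plays the 'exitwhile' role (the function is made up for illustration; it is not the language I was describing):

```python
def find(items, wanted):
    """Search a list, handling early exit and normal completion separately."""
    for index, item in enumerate(items):
        if item == wanted:
            # 'exitwhile' role: premature termination, loop variables in scope
            result = f"found at {index}"
            break
    else:
        # 'exitfor' role: the loop iterated to its end without break
        result = "not found"
    return result

print(find(["a", "b", "c"], "b"))  # found at 1
print(find(["a", "b", "c"], "z"))  # not found
```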

If 'exitwhile' and 'exitfor' clauses were added to a textual programming language, then programs using variable names 'exitwhile' and 'exitfor' would be invalidated. If a loop is rather represented by a binary object, and this object is augmented with two new fields: One pointer to an 'exitwhile' code block, another to an 'exitfor' code block, both initially null / nil / void, then no old program would be invalidated. The updated IDE would need to provide a way for the programmer to insert exitwhile/exitfor clauses, but not through any such keyword. They might be displayed in a similar way to the 'for' and 'endfor' markers (note: not as editable text, but typographically highlighted so that you would recognize it as structure indicators) with initially empty clauses. Until you start using this facility, you and your code are completely unaffected by the new fields in the binary loop object.

I am assuming that your old loop object missing these fields would still be valid: E.g. all objects should contain a size value, and any software handling the program file would know that a shorter loop object is a loop without the new clauses, not making any fuss about it. (The IDE could even store any loop object not making use of the extra fields in the short form.)
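To make the size-based compatibility concrete, here is a small sketch. Everything about the layout is an assumption invented for the example: a 2-byte size field, 4-byte block references, and 0 standing for null. The point is only that a reader of the new format accepts the old, shorter record unchanged.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical layouts: the size field comes first, then block references.
OLD_FORMAT = struct.Struct("<HI")      # size, body
NEW_FORMAT = struct.Struct("<HIII")    # size, body, exitwhile, exitfor
NULL = 0                               # assumed encoding of "no clause"


@dataclass
class LoopObject:
    body: int                          # reference to the loop body block
    exitwhile: Optional[int] = None    # new field, absent in old files
    exitfor: Optional[int] = None      # new field, absent in old files


def encode(loop: LoopObject) -> bytes:
    """Store loops without the new clauses in the short (old) form."""
    if loop.exitwhile is None and loop.exitfor is None:
        return OLD_FORMAT.pack(OLD_FORMAT.size, loop.body)
    return NEW_FORMAT.pack(NEW_FORMAT.size, loop.body,
                           loop.exitwhile or NULL, loop.exitfor or NULL)


def decode(data: bytes) -> LoopObject:
    """Read either form; a short record is simply a loop without clauses."""
    (size,) = struct.unpack_from("<H", data)
    if size == OLD_FORMAT.size:
        _, body = OLD_FORMAT.unpack(data[:size])
        return LoopObject(body)
    _, body, exitwhile, exitfor = NEW_FORMAT.unpack(data[:size])
    return LoopObject(body, exitwhile or None, exitfor or None)


old_record = OLD_FORMAT.pack(OLD_FORMAT.size, 7)   # written before the change
print(decode(old_record))
print(decode(encode(LoopObject(7, exitwhile=8))))
```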

You seem to think you are going to be able to remove keywords from the definition of the language, but you fail to describe, in detail, how a user is then going to be able to do something without using keywords.

Your mind seems to be completely fixed on 'keywords' being textual. When I press Alt-1 in MSWord, you may consider that a 'binary keyword' resembling the textual '<h1>'. In a programming language, Alt-1 might resemble the textual 'namespace', Alt-3 might resemble 'if ... then ... else ... endif'. One essential reason for saying that they 'resemble' textual keywords: The IDE's input processor will immediately process them, just like MSWord processes Alt-1 immediately to insert a top level chapter object, to create the binary structure objects. The file will never store the Alt-1 or Alt-3. You might assign that 'Create new namespace' or 'Create conditional statement' to any function key, menu selection etc., and different users may make different assignments - the binary objects are the same. So the assignments made by any one user are not any sort of 'reserved word'.

You do understand that a MS Word doc is a binary file which has embedded symbols in it which define the format?

Most certainly - but you obviously fail to understand that I am suggesting exactly the same for a programming language code file. The specification of the binary format might be treated as the formal language definition, just like the ISO/IEC standard is the formal definition of OOXML objects. (This is what I have been talking about all the time!)

You may consider OOXML to be a 'document programming language' - it is not defined in BNF, but as an XML schema. Functionally, those are roughly equivalent.

Knock yourself out. It is called BNF - Backus Naur Form notation.

BNF is certainly not limited to specification of the syntactical interpretation of flat sequences of printable characters.

One of my fellow students, in his first job, was set to identify various kinds of bacteria in microscopy photos. The various kinds of bacteria, i.e. the shapes of them in the images, were described in BNF format. The images were scanned and the scan lines 'compiled' according to the BNF defined syntax. The same image was 'compiled' according to different BNFs, each for a different bacterium, and the one(s) giving the fewest 'syntax errors' were considered primary candidates for the identification. (This was in the early 80s, when technology was less sophisticated than today, and they did not rely completely on automatic identification; they used the BNF analysis to rule out those hundreds of alternatives that most certainly did not match. A medic had to confirm the identification. Yet, this was a real work saver.)

If you create a new language to be stored as a parse tree rather than as a linear character sequence, you would most likely create even that definition in some BNF variant, or in a similar definition language. Plain BNF is semi-abstract; it uses character strings in the definition, but the only structure representation is the BNF itself. For a non-textual structure representation, it does not define a unique storage format.

So if you were to create a binary language representation, you should rather use something like ASN.1, which resembles BNF in that it defines abstract objects. Then you can select one of the defined 'encoding rules' for generating concrete object representations that can be stored or transmitted. If you go for ASN.1 for the abstract specification, but dislike all the existing encoding rules, you can even make up your own new encoding rules - that is usually caused by a 'Not Invented Here' attitude rather than a qualified professional evaluation of existing alternatives.

It is interesting to note that BNF was initially developed to describe natural languages, based on Chomsky's production rules and transformations (and even earlier linguistic studies). The metasymbols used by Backus and Naur were adapted to standard keyboard characters, but the principles are essentially those of Chomsky.

Curiously enough, the Java example you point to diverges strongly from 'classical' BNF, the way Backus and Naur defined it. They have even redefined the very basic '::=' symbol. If you want to refer to a BNF programming language definition, you should rather select Pascal, which was originally defined in 'classical' BNF (see Appendix D of Jensen & Wirth: Pascal User Manual and Report), although it is frequently presented in some revised BNF variant. You'll find one that is fairly close to the original at "Syntax von Pascal In Backus-Naur Form (BNF)".

BNF is still used today, but is considered somewhat outdated by quite a few people. So various groups have extended and augmented it significantly, and later replaced it by similar languages - which might be viewed as alternative derivatives of Chomsky, rather than derivatives of BNF. Some of the changes are cosmetic, or in the style of 'I want to save a couple keystrokes when typing!', such as reducing '::=' to ':'. Whether or not any one of these alternatives is "better" than classical BNF is a matter of personal taste.

Don't misunderstand my comments: All that you say is perfectly valid as long as we limit ourselves to program code represented as a linear sequence of printable characters, all editable by the programmer.

That is a very limiting context. From my very first post in this thread (almost two months ago), I have suggested that we extend the scope to other representation formats:

trønderen wrote:
My hope (but I am not very optimistic!) is that all application programming will move over to an abstract representation where the language form is merely a display phenomenon; the program itself is stored in an abstract, language independent form. For keywords, this is simple, but it would require a mechanism where code maintainers could assign alternate, language dependent tokens for programmer assigned names of variables, methods, constants, comments, ...

This is what I have pushed in all my following posts.

I also wrote, in a more recent post (well ... two days later, November 14):

This is certainly extremely difficult, probably across the borderline to the impossible, if we insist on thinking along exactly the same tracks as we have always done before, refusing to change our ways even a tiny little bit. I sure can agree that it is fully possible to construct obstacles for preventing any sort of change in our ways of thinking. I am not hunting for that kind.

I guess this point has been extensively highlighted by now.

trønderen wrote:
History has shown that not all language makers have had extensive compiler theory background.

Who exactly?

trønderen wrote:
there must be keywords / reserved words just because that document is source code for a compiler

You seem to be missing the point.

The compiler creates tokens from the key words. The key words exist because humans require them. You are not removing humans from your idea so key words are still required.

trønderen wrote:
You could do the same for a program code file

The key words and the rest of the language definition provide the structure that the compiler then creates. Doesn't matter how you wrap it up, the human must still provide the information.

trønderen wrote:
How is that in a document processor?

I have written multiple compilers/interpreters so I do understand how they work. I have also delved into the source code for other compilers and editors.

As I already said, you are equating the text in a Word document that seems important to humans with what is important in a programming language. That is simply not true. Your analogy is flawed. As I pointed out, the text that humans see in a Word document is equivalent to the text in a comment in code.

trønderen wrote:
I have suggested that we extend the scope to other representation formats:

So certainly no one else in 80 years has wondered if there is not a better way to provide for programming so obviously it is up to you to actually create what you are suggesting.

Good luck.

I know that I am late joining this conversation but ...

You refer to keywords in Algol 68. Algol 58 (which Algol 60, Coral 66, Algol 68 (R and S), Algol W etc were derived from) just had tokens, as you state. The characters or symbols used to create tokens were an implementation issue, not a design issue. The standards used letter sequences to indicate the uses of the tokens (e.g. begin, end and if) but that was purely for typographic reasons for the specification and did not define how they were to be entered. The version of Algol 60 that I used (ICL 1900) used quoted strings (e.g. 'BEGIN', 'END', 'IF'). The use of braces in C to represent begin and end would have been perfectly acceptable implementations. Some of the uses of (, ), ? and : in Algol 68 were valid actualisations of the begin, end, then and else keywords. I liked the Algol 68 mirror image brackets e.g. ( and ), [ and ], CASE and ESAC, IF and FI; especially as you could also use COMMENT and TNEMMOC.

You may have noticed that all of the keywords (not tokens) above are all in uppercase - that is because I worked on 6-bit character machines and lowercase did not exist.

I guess that source code files were stored as plain text, using the selected set of word symbols, right? So you couldn't take your source file to another machine, with other concrete mappings, and have it compiled there.

(I have never seen TNEMMOC, but I have seen ERUDECORP. I suspect that it was a macro definition, though, made by someone hating IF-FI and DO-OD. Btw: It must have been in Sigplan Notices around 1980 that one guy wrote an article "do-ob considered odder than do-od". The article went on to propose that a block be denoted by do ... ob in the top corners and po ... oq in the lower corners. Maybe it wasn't Sigplan Notices, but Journal of Irreproducible Results :-) )

I have come to the conclusion that a better solution is to store the parse tree, and do the mapping to word symbols only when presenting the code on the screen to the developer (with keywords indicating structure etc. read-only - you would have to create new structures by function keys or menu selections). This obviously requires a screen and 'graphic' style IDE, which wasn't available in the 1960s, and which required processing power that wasn't available in the 1960s. Today, both screens and CPU power come thirteen to the dozen.

One obvious advantage is that you can select the concrete symbols to suit your needs, mother tongue or whatever. A second advantage is that you never see any misleading indentation etc. - any such thing is handled by the IDE. This is of course closely connected to the third advantage: As all developer input is parsed and processed immediately, and rejected immediately if syntactically incorrect, there is no way to store program code with syntax errors.
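A toy sketch of the idea: the stored program is a tree of objects with no keywords in it, and the word symbols are looked up only when the tree is rendered for display. The node classes, the `render` function and both symbol tables (including the Norwegian words) are invented for this illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Assign:
    target: str
    value: str


@dataclass
class If:
    condition: str
    then_branch: List["Node"]
    else_branch: List["Node"]


Node = Union[Assign, If]

# Display-only symbol tables; nothing from these is ever stored in the tree.
SYMBOLS = {
    "en": {"if": "if", "then": "then", "else": "else", "endif": "endif"},
    "no": {"if": "hvis", "then": "så", "else": "ellers", "endif": "slutthvis"},
}


def render(nodes, symbols, depth=0):
    """Turn the tree into display lines; indentation comes from the tree."""
    pad = "    " * depth
    lines = []
    for node in nodes:
        if isinstance(node, Assign):
            lines.append(f"{pad}{node.target} := {node.value}")
        else:
            lines.append(f"{pad}{symbols['if']} {node.condition} {symbols['then']}")
            lines.extend(render(node.then_branch, symbols, depth + 1))
            if node.else_branch:
                lines.append(f"{pad}{symbols['else']}")
                lines.extend(render(node.else_branch, symbols, depth + 1))
            lines.append(f"{pad}{symbols['endif']}")
    return lines


program = [If("x > 0", [Assign("y", "1")], [Assign("y", "2")])]
print("\n".join(render(program, SYMBOLS["en"])))
print("\n".join(render(program, SYMBOLS["no"])))
```

Because the indentation is computed from the tree depth, the displayed layout cannot disagree with the actual structure.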

Of course the immediate parsing requires more power than simply inserting keystrokes into a line buffer, but it is distributed in time: Spending 10 ms CPU for keystrokes separated at least 100 ms apart is perfectly OK (and you do it not per keystroke, but per token).

And, you save significant time when you press F5 (that is in VS!) to compile and run your program in the debugger: Some steps that are known for being time consuming are already done. Lexing and parsing are complete. F5 can go directly to the tree hugging stage, doing its optimizations at that level, and on to code generation, and the program is running before your finger is off that F5 key.

In my (pre-URL) student days, I read a survey of how various compilers spent their time. One extreme case was a CDC mainframe that spent 60% of its time fetching the next character from the source input file! Most computers (even in those days) are better at reading text input; yet, lexing and parsing (where maybe 90% of the error checking takes place) is a resource hog. The dotNet jitter is speedy largely because it doesn't have to do much error checking (nor does it handle character input).

In other kinds of computer applications, such as document processing, CAD/CAM systems and lots of others, users have no problems accepting that their data are managed and stored in binary formats. Other branches in this discussion have most definitely shown that this is a tough one for software developers: Letting the 7-bit ASCII file go is like completely losing control of their code. So I certainly do not expect the next version of VS to provide an option for storing my C# file as a parse tree!

I do that for this one project I am developing, though. We are not talking about C#, but a simplified version where non-IT people specify what goes into selected action alternatives, repeated actions etc. All C# red tape is hidden from the user, but the 'code' they specify is used to generate C# code inserted into a skeleton, which is then compiled behind the back of the user. The user sees everything in his mother tongue, and in a simplified manner that hides lots of details of minimal interest to the common user. So ... I know that this is a workable solution, in particular when making UIs for non-technical users. I am 100% certain that it would be equally workable for a programming language.

trønderen wrote:
I guess that source code files were stored as plain text, using the selected set of word symbols, right? So you couldn't take your source file to another machine, with other concrete mappings, and have it compiled there.

Excuse me for putting my few pennies into your great discussion; however, I am afraid that in the times of ALGOL-60 the "source code files" existed (only) on punched cards! ;-P

My freshman class was the last one to hand in our 'Introductory Programming' exercises (in Fortran) on punched cards. Or rather: We wrote our code in special Fortran coding forms, and these were punched by secretaries, and the card decks put in the (physical!) job queue.

The Univac 1100 mainframe did have an option for punching binary files to cards. I believe that the dump was more or less direct binary, zero being no hole, 1 a hole, 4 columns per 36-bit word. Such cards were almost 50% holes, so you would have to handle them with care! (I never used binary card dumps myself.)

The advantage of punch cards is that you had unlimited storage capacity. When we the following year switched to three 16-bit minis, full-screen editor and Pascal, we had 3 * 37 Mbyte for about a thousand freshman students. When the OS and system software had taken its share, each student had access to less than 100 kbyte on the average, and no external storage option, so we had to do frequent disk cleanups (I was a TA at the time, and the TAs did much of the computer management).

[edit]
I had written the response below before I noticed that other folks had already replied with similar stories.
[/edit]

trønderen wrote:
I guess that source code files were stored as plain text, using the selected set of word symbols, right? So you couldn't take your source file to another machine, with other concrete mappings, and have it compiled there.

That is correct - the source code was hand punched onto cards (80 column). I got quite adept with the multi-fingering buttons for each character. You then put the box of punched cards into a holding area where someone would feed them to the card reader (hopefully without dropping them and random sorting them). Then the job was run and you got a line printer listing delivered to the same holding area (where, hopefully, your card deck was also returned to) - this is the first time that you can see what the texts were that you had written. At University, the turn round time was 1/2 a day; at my first full time job it was nearer a fortnight; so computer run times were an insignificant part of the round-trip time.

Before Uni, I had to do coding sheets which were posted to the computer centre (round trip one or two weeks). This added an extra layer of jeopardy - would the cards be punched with the texts written on the coding sheets? The answer was almost invariably 'No' for at least three iterations; so the first run (enabling debugging) could be six weeks later than the date that you wrote the program.

jschell wrote:
First, at least last time I created a compiler much less studied compilers it is not "simple" to replace keywords.

You certainly must know the semantics of the abstract token. And you must know the concrete language you want to use. This is not a job for Google Translate. But we are not talking about defining a new programming language; just modifying the symbol table used by the tokenizer. That is orders of magnitude simpler than making an all new compiler.
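As a rough illustration of "just modifying the symbol table": in the sketch below the tokenizer is identical for both languages, and only the keyword table passed to it differs. The token set, the `tokenize` function and the Norwegian keyword choices are assumptions made up for the example, not taken from any real compiler.

```python
import re
from enum import Enum, auto


class Tok(Enum):
    IF = auto()
    THEN = auto()
    ELSE = auto()
    END = auto()
    NAME = auto()
    NUMBER = auto()
    OP = auto()


# Two concrete symbol tables mapping onto the same abstract tokens.
ENGLISH = {"if": Tok.IF, "then": Tok.THEN, "else": Tok.ELSE, "end": Tok.END}
NORWEGIAN = {"hvis": Tok.IF, "så": Tok.THEN, "ellers": Tok.ELSE, "slutt": Tok.END}

# A number, a word, or a single non-space punctuation character.
LEXEME = re.compile(r"\d+|\w+|[^\s\w]")


def tokenize(source, keywords):
    """Convert source text to abstract tokens using the given keyword table."""
    tokens = []
    for lexeme in LEXEME.findall(source):
        if lexeme in keywords:
            tokens.append((keywords[lexeme], None))   # keyword text is dropped
        elif lexeme.isdigit():
            tokens.append((Tok.NUMBER, lexeme))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append((Tok.NAME, lexeme))
        else:
            tokens.append((Tok.OP, lexeme))
    return tokens


english = tokenize("if x > 1 then y = 2 end", ENGLISH)
norwegian = tokenize("hvis x > 1 så y = 2 slutt", NORWEGIAN)
print(english == norwegian)
```

Everything downstream of the tokenizer sees the same token stream, so the parser and code generator are untouched by the change of table.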

jschell wrote:
Second the latter part of that statement seems to suggest exactly the problem that compilers need to solve with the first part of what I said. Compilers (and interpreters) already convert key words into tokens. Those that do not are very inefficient (as I know since I had to work with one long ago.)

Quite to the contrary. If you store the source code as abstract tokens, i.e. the result of the tokenizer's job (and maybe even after a fundamental structure parsing), that heavy job you are referring to is done "once and for all". Or at least until the code is edited, but then a reparsing is required only for the substructure affected by the edit. Further compilation would be significantly faster, as much of the work is already done.

We have had precompiled header files for many years. This is along the same lines, except that the pre-parsed code is the primary source representation, with a mapping back to ASCII symbols done only for editing purposes, and only for that part of the code displayed to the programmer at the moment.

jschell wrote:
Requirements, Architecture, Design are all in that natural language. All of those have much more impact on the solution than the actual code itself.

In "a" natural language, yes. English. Far too often, the design language is English, even if the customer and end users have a different natural language.

The norm is that in the very first input steps from the customer, the native language is used. Then the developers retreat to their ivory tower to create The Solution, as they see it. First: Then it is back to English, and second: The ties to whatever was discussed with the customer are next to non-existent. The software structure is not a more detailed break-up of the boxes you showed the customer. The modules, functions, methods, data structure names are unrelated to the terms used in the discussion with the customer.

If the customer has a complaint, saying for example "We think that when handling this and that form, in this and that procedure, you are not calculating the field the way it should be done. Can you show us how you do it?", first: Going from the functional module and the procedure in question, as identified in the discussion with the customer, to the right software module and method in the code, may be far from straightforward. Developers do it their own way; the customer presentation is only for the customer. Second: If the developers are willing to let the customer see the code (most likely they refuse!), there is a very significant risk that the customer will have a hard time recognizing his problem: The programmers do not know the professional terminology, so they mislabel terms. They have never recognized the natural breakdown into partial tasks, so they solve the problem in what to the customer appear as convoluted ways. The programmers have never seen the real application environment, and do not know how different attributes really belong together.

If the developers did not switch from a native language to English as soon as the customer is kicked out the door, far more real domain knowledge could be preserved in the solution. In the development process, the customer could provide a lot more domain knowledge, correcting and aiding the developers as they progress with the more detailed design, and onto the code design. A customer who knows the logic of the solution, even if he doesn't know the specific code statements, is able to provide a lot more helpful feedback, both in bug finding and in discussions about future extensions and additions.

I know very well that customers without a clue about computers can give very valuable input. They can discuss solution logic. They can discuss data flows and functional groups. I have done that in several projects, serving as a messenger between the end users and the development group. (And in one library project, when I put forward the wishes from the future users, the project leader sneered back at me: "F*** those librarians! ..." That was his honest attitude towards the end users; they were a source of noise, nothing more.)

My attitude towards customers and end users is at the very opposite end of the scale.

I am working on ranking different social influencers based on a set of metrics.

Metrics collected: 
•   username
•   categories (the niche the influencer is in)
•   influencer_type
•   followers
•   follower grow, follower_growth_rate
•   highlightReelCount, igtvVideoCount, postsCount
•   avg_likes, avg_comments
•   likes_comments_ratio (comments per 100 likes, used as an authenticity indicator)
•   engagement_rate
•   authentic_engagement (the number of likes and comments that come from real people)
•   post_per_week
•   1/post_like, 1/post_comment (total 12 latest posts)
•   1/igtv_likes, 1/igtv_comment (total 12 latest igtvs)

Here's what the data looks like:
Sample_data - https://drive.google.com/file/d/15obMah9pGI3CutOZMJNqfr3O95rLz2JS/view?usp=sharing

Objective: Rank the social influencers according to their influential power with the use of the metrics collected above.

There are a few ranking algorithms to choose from, which are:
a) Compute the score for influential power with Multi-Criteria Decision Making (MCDM) and rank it with regression
b) Create classification model and rank them through probability
c) Compute the score for influential power with Multi-Criteria Decision Making (MCDM) and rank it with machine learning model like SVM, Decision Tree and Deep Neural Network
d) Learning to rank algorithm like CatBoost 
e) Trending algorithm

I would like to ask which algorithm above will be more suitable in this project and could you compare and provide the reasons for it? Any ideas will be much appreciated!

External links for algorithms:
1. [MCDM](https://towardsdatascience.com/ranking-algorithms-know-your-multi-criteria-decision-solving-techniques-20949198f23e) 
2. [Catboost](https://catboost.ai/en/docs/concepts/python-quickstart) 
3. [Trending algorithm](https://www.evanmiller.org/deriving-the-reddit-formula.html)
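To make the scoring step in options (a) and (c) concrete, here is a minimal sketch of the simplest MCDM method, a weighted sum over min-max normalized metrics. The three rows, the choice of metrics and the weights are placeholders invented for the illustration; they are not taken from the linked sample data, and choosing the weights is the real modelling decision.

```python
def min_max(values):
    """Scale a list of numbers to the range 0..1 (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_weighted_sum(rows, weights):
    """Return (username, score) pairs, highest score first."""
    scores = [0.0] * len(rows)
    for metric, weight in weights.items():
        column = min_max([row[metric] for row in rows])
        for i, value in enumerate(column):
            scores[i] += weight * value
    names = [row["username"] for row in rows]
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical rows and weights, for illustration only.
rows = [
    {"username": "a", "followers": 1000, "engagement_rate": 0.05},
    {"username": "b", "followers": 5000, "engagement_rate": 0.01},
    {"username": "c", "followers": 3000, "engagement_rate": 0.03},
]
weights = {"followers": 0.4, "engagement_rate": 0.6}

for name, score in rank_weighted_sum(rows, weights):
    print(name, round(score, 3))
```

A score like this gives a ranking directly by sorting; a regression or other model on top is only needed if the score must be predicted for influencers whose metrics are incomplete.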

Ask each one how much money they make: the bank account algorithm.

"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I

I have a task to complete. I'm trying to learn Visual Studio. And I have another task of programmatically manipulating data inside an existing Excel worksheet.

It will involve manipulating an Excel Spreadsheet.

This will involve Opening, Closing, Saving, Modifying cells, etc.

I wish to also learn the Visual Studio IDE. To do both, I'd like to write a simple program example (or see one) that uses Visual Studio, without any aftermarket libraries, like pandas or others.

Basically what I'd like to do is install Visual Studio and all relevant tools that come with it, to support this task.

I have already installed Visual Studio 2022. I also have the Community version on another computer. I think I have options on what compiler to use. That is, I think my Visual Studio comes with the usual suspects C#, C++, Python, etc. Correct me if I'm wrong.

Questions:
1) How hard is it to use just Visual Studio and one of the included compilers to open, close, edit, save, etc., Excel files?

2) What included compiler should I use? or that is, which is best for what I want to do?

3) Is there a good Visual Studio project template that illustrates doing what I need to do?

I might eventually use some of the add-on libraries, but for now, I'd just like to keep it simple. I also realize that using something like pandas, or other add-ons, might actually make it easier. But the learning curve might be longer.

I just need to be pointed in the right direction, for my initial tasks.

Thanks
Mike

There are a few choices for manipulating Excel:
1. Using C# with the Microsoft.Office.Interop.Excel namespace (see Microsoft Docs). Google will find you some good examples.
2. C# and OleDb, which allows you to treat Excel files as SQL databases. See "Working with MS Excel(xls / xlsx) Using MDAC and Oledb".
3. Using Python and the openpyxl library. See "A Guide to Excel Spreadsheets in Python With openpyxl" (Real Python).

But whichever you choose you still have to learn the basic languages first. Visual Studio is merely a glorified editor and can only help when you have done the hard work of designing and coding your application.

Thanks. Yeah, I realize that I need to know how to use whatever language I use. I'm pretty good at reverse engineering, so that's why I wanted some good examples. Mainly, at this point, I'm just trying to use Visual Studio. I'm not familiar with this version. And I don't want to complicate the process by having to create and debug code to use it. Although, that might be useful. This way, if I can open a project that is known good, all my effort will be in getting Visual Studio to run it.

I tried one of the VB Example Excel projects, for instance, and right off the bat I see it's targeted to something wrong. It says it can't find the .NET Framework 3.5, and also that one way to solve it is to install the SDK targeting package. Whatever that means. :)

So now I'll begin to track that down. If I can successfully get this configured to run one of the examples, that is my goal accomplished for this part.

Thanks

Articles are generally for guidance, and will often be out of date with reference to the latest framework and version of Visual Studio. So unless you really understand what you are doing you are going to face some difficult problems.

Thanks,

I grabbed one from that list and opened it with Visual Studio, and it gave me a .NET Framework error. I installed the 3.5 version that it was missing, and the program continued, opened, and asked me for information to create the Excel spreadsheet. This is exactly what I was looking for.

Now, I'll look around on this and then when I understand the use of the IDE, I'll go find a more complex example.

Thanks

Excel has its own programming language; it's called VBA.

You start by proving to yourself why you have to use C# instead.

I would "program" my Excel sheets (in VBA) to generate SQL table definitions, for example.

"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I

I understand that. I've been using Excel from version 1 (I think). But I'm not trying to learn Excel. I'm trying to learn Visual Studio. It's rather obtuse, as far as I'm concerned. I first started using MSVC decades ago, with Version 6. Haven't used it since.

So, I have a few goals (not one is learning Excel). I work in a test lab. We have existing tools that spit out Excel files when tests are completed. We then post-process those. Most of the software where I work uses Visual Studio and C#.

So, I would like to learn more about C#. And get back to knowing how to use Visual Studio. That's why I was looking for good examples of using Excel, in Visual Studio. That way I have files that should be easy to open in Visual Studio.

My first task will be to open one of the examples, then correct all the build errors (.NET Framework mismatch, for instance), then execute one of the examples. At that point, I can begin looking at the code for working with Excel files.

Member 13159493 wrote:
looking for good examples of using Excel, in Visual Studio.

You need to understand that it is not Visual Studio that you use to open Excel files, but .NET and one of its languages, e.g. C# or VB.NET. And Visual Studio is used as a support tool for .NET, not the other way round.