|
In my opinion, the compiler should issue at least a warning, possibly even an error! This situation is usually due to a programming error; it is rare to have a good reason to specify the same value several times.
|
|
|
|
|
I get your point, Vlad; interesting "discovery", I'd never have imagined that would be the behavior of an enum in C#.
However, it would not happen in any of my code since (as you said) enums exist to represent numbers, so I always specify their values; otherwise there is no point and it would be better to just use the integer.
So, this:
public enum Status : byte
{
    Pending = 0,
    Doing = 1,
    Done = 2
}
Would not trigger any issues. Notice that I also declare the underlying type (be it int, short or byte) that fits best.
If I could add another issue to your list, it would be that all enums are assumed to cast to Int32. So if you declare an enum backed by another numeric type, use it in a generic setting (as a generic argument of a function), and cast it to Int32, it causes a cast issue.
Like:
public int GetValue<E>(E enumVal) { return (int)enumVal; }
That code would run fine if the enum were "inherited" (not the right term) from Int32 instead of Byte.
And I've not found any functions or tests on the Type class that would tell whether it can be cast to a specific type (for a generic argument).
|
|
|
|
|
I wholly agree that this is not the proper way to use enums, and I'm not advocating that they be used as such.
This issue came to light when we had data being brought in from a 3rd party system (game consoles) which mapped data from the inner workings of the game to the application and we needed to respond in kind.
|
|
|
|
|
Think in averages: the average value, the median between low and high. I would speculate there is more in the background here than meets the eye, and that it is not a random hodgepodge result but a mathematical determinant.
That said, however, I do not know of any valid reason for using an enumeration with duplicate integer assignments. The whole point of an enumeration is the uniqueness of values, providing a one-to-one relationship between value and string. If two options are to have the same effect, then the code using the enumeration should handle that use case (case 1, 3: break; case 2: break;), or one should rethink what one is doing with the enumeration.
A string can be a number too, but we do not use it that way; we have defined rules of use, and even code pages are rules of use for the numerical representation of string values.
While the intro is a statement of uses for enumerations, it does not define the point and purpose of using them.
|
|
|
|
|
An enum which contains duplicate values is used to have two different names which actually refer to the same thing. Some enums in Windows are like this. It's not very common, but it is done. I agree there is probably a mathematical reason, or maybe an inherited reason due to the way the code is translated to CLR or binary; maybe even because they followed a pattern/convention from C or C++ that yielded this.
If the OP had read Microsoft's spec on their enum implementation on MSDN, he would not have been surprised to see that it is perfectly valid to have two enum members with the same value.
It is also valid to cast any integer to the enum, even if there is no such value in that enum.
It is also true that the enum will have a member value of 0, even if it is not specified; the first member of the enum will by default have the value 0, which is also the default value for the enum. The first item should be used to represent the "none" value.
The 0 value is always a valid value of every enum.
I think any of these tidbits would have been better to write an article about, instead of someone who didn't keep track of their values, duplicated one, and then passed it around only to discover someone had screwed up.
I don't see how this is a "side effect of enums" as the title states. It was something the OP didn't realize because he didn't know how enums actually worked. That's not a side effect.
modified 10-Feb-19 3:01am.
|
|
|
|
|
This is neither hidden, nor a side effect. Simply don't use duplicate values in enums. What are you actually trying to achieve?
|
|
|
|
|
Agreed. If there are duplicate underlying values in an enum, then maybe you should not be using an enum.
To me, an enum implies an enumeration of different states (underlying values).
Enums help us make sense of that state; for example, bar1 is a lot easier to process and understand than just the number 1.
Thanks,
|
|
|
|
|
Kinda missed the point of the article: it's not a matter of "you should do this", it's more a matter of "this is what happens if you do, so that you don't spend hours debugging". Also consider that not all the code you work on has actually been vetted against your personal standards.
|
|
|
|
|
Your tests that return specific names from a typecast enum are, in my opinion, completely implementation-dependent:
I think that there's no reliable way to determine which name you'll get from an enum value: this depends on how the map from enum to name is implemented.
* It may be simply array-based, ordered by enum numeric value; but when multiple enum names map to the same numeric value, the array will contain multiple names arranged in arbitrary order within the range of entries that share that value. In that case the name you get is the closest one found by the binary search... if a binary search is used.
* It may be array-based, but with a linear search for the numeric value: you'll get the first name in the array that has this value, but here too multiple names may share the value, so the name is unpredictable.
* It may be based on a hash-table lookup: you'll land on the table entry mapping the numeric value to a name, and the name you'll get is the first one from the collision list containing all the other names mapped to the same numeric value.
In summary: it's not portable at all to define an enum type with multiple names mapped to the same integer value. And I bet that C/C++ compilers should emit a warning whenever you attempt to define such an enum type.
But then a typical enum declaration like:
enum {a, b, c, first=a, last=c}
will emit warnings for the duplicate mappings: the same integer value 0 for "a" and "first", and the same integer value 2 for "c" and "last".
There's no reliable way to determine what name will be displayed even by "cout << a", UNLESS you augment your enum type with a static method that maps each enum value to the name (string) to return (that method may use whatever implementation it wants: array-based with a binary or linear search, hash-based, or arbitrary code).
If the C++/C/C# compiler automatically builds a mapping function (from your enum type to a string), you have to wonder which of the three approaches this mapping function uses (possibly several approaches may be used simultaneously, depending on the number of elements defined in your enum, or on whether duplicate names map to the same numeric value).
Your enum type must then define its static method to cast an enum value to a string.
The effect observed by the author of the article is what you would expect when the compiler provides a default mapping method (from an enum value to a string) using a table-based approach with binary search.
But then compare:
enum{a=1, b=1} and enum{b=1, a=1}
The mapping using a table-based array (used with a binary search or linear search) could define an internal static table of strings like this:
{ {1,"a"}, {1,"b"} } or { {1,"b"}, {1,"a"} }
or even just (removing duplicates and keeping arbitrary names, e.g. the first one defined in the source code, or the last one defined):
{ {1,"a"} } or { {1,"b"} }
This method does not need to keep duplicate names (given that the mapping cannot return multiple names), so this alternative is not useful:
{ {1,{"a","b"}} } or { {1,{"b","a"}} }
At best the mapping function could also return all possible names (space-separated or comma-separated, it does not matter), but here also in arbitrary order:
{ {1, "a b"} } or { {1, "b a"} } or
{ {1, "a,b"} } or { {1, "b,a"} }
(Such a mapping from integer values to one or more names is only useful in the generated debugging info, where it lets debuggers recognize all the defined names; they may still display an arbitrary name by default in such a case. It won't be used by your program itself, which should not depend at all on this synthetic method, nor on data built for debuggers or for the introspection/reflection API, which will return you all the possible names for each value.)
This synthetic mapping method automatically built by the compiler for you should be avoided: provide your own mapping method using the approach you want for your goals.
In my opinion, a C/C++/C# compiler should not even try to create any synthetic mapping method converting an enum to a string if the enum has multiple names defined with the same numeric value; at compile time (or at least at link time), it should emit an error that such a conversion from enum to string is NOT defined (so the basic typecast from such an enum type to string is NOT defined by your program). The only safe synthetic method for this case is one that displays the numeric value itself, and not any name defined internally by the source code of your enum.
Note: this also applies to other languages which allow defining "enum types". For example, Java requires that you override toString() if you want a custom name from enum values; the names needed are not necessarily those defined in your source code (the defined names may be language-neutral, frequently technical and abbreviated, while the actual strings needed may be translated into a user's locale, or could be full sentences; Java cannot guess what you want). Java will expose all the names in its introspection/reflection API if you query the datatype info: the enum type is a normal "class" with static methods, and it's very usual in Java to explicitly override toString() for almost every class.
----
Note2: conceptually, an enum type is just a finite set of distinct names. The names have no inherent numeric values, are unordered, and have no defined arithmetic, so you cannot safely increment a given enum value and get a predictable distinct value (it's not even guaranteed that incrementing it will return a different value).
Assigning integer values to the declared enum values (i.e. names) creates a mapping, i.e. a surjective function, i.e. a projection, not necessarily bijective. This allows defining an arithmetic, but it does not mean the arithmetic is safe (not all numeric values have a successor, so not all enum values have a successor): you'll run into "overflow" situations.
Such a mapping also allows defining an "order" between all declared enum values. But this is a total order (i.e. a relation based on "<="), not a partial order (i.e. a relation based only on "<"), because now you also have duplicates (two distinctly defined enum values can compare as equal to each other, under the semantics created by this mapping/projection to integers).
Conceptually, without this mapping, it should be possible to define an enum type as strictly a finite set (with all elements distinguished), optionally orderable (with a partial order based on the "<" operation), but still without any arithmetic (if it is ordered with "<" you can still determine a "first" value in that set, and a "last": an enum value is the "first" if it is not the successor of any enum value in that set, and the "last" if it is not the predecessor of any; the order allows defining a **bijection** to the bounded set of integers from 1 to N inclusive, where N is the cardinality of the enum type).
But to define such a bijection, you need to declare the enum with a static comparator method: this allows a mapping function from enum values to their names to be efficiently implemented not just with a binary search, but with direct table indexing.
The enum types in C/C++/C# are not strictly sets, as you can alter the order as you want and skip numeric values, for example enum{a=1, b=1000}, creating "holes" where you cannot define a single "first" and a single "last" element such that all elements except the single "last" have a successor and all elements except the single "first" have a predecessor: this complicates things a lot, because "overflows" can occur anywhere, for all basic arithmetic operations, for ALL enum values.
And for this reason, you cannot assume any implementation of the mapping from enum values to strings (to return their name). The compiler has to make arbitrary choices.
So the enum types in C/C++/C# are extremely poor, and should have no other operations defined, other than comparing if they are equal or different.
Everything else is fuzzy, and your defined enum type MUST be specified precisely by defining all other operations (all conversions to other types, all arithmetic operations, all binary operations needed for total or partial ordering). Your compiler may automatically generate default synthetic static methods for all of these, but they can give results you did not expect. If that is not what you want, just treat enum types as true classes, and define these static methods yourself in those classes!
This is what is required in Java, where enum values can only be compared for equality and there is no public constructor: all instances are static, created by the enum type declaration itself, and there's no way to convert/typecast them to other types, not even to an integer. The only synthetic methods generated are a toString() returning the name of the defined constant, a static values() method returning an ordered array of all defined constants, a comparator that allows comparing them, and an ordinal().
To assign different integer values, you'll need to declare your own accessor alongside ordinal() and compareTo(); you'll probably also have to declare your own toString() method to return them as strings displaying that value, or something else, and you may optionally add static factory methods (which return one of the statically declared instances).
I think that Java (which has enum types since version 5) is much more correct here than what C/C++/C# are tolerating (in a very fuzzy way) and supporting with their ill-defined default synthetic methods for all operations.
modified 16-Nov-18 20:05pm.
|
|
|
|
|
I see no reason why C/C++ should emit a warning if multiple names map to the same value. Having aliases (either for backward compatibility or just "person-friendly" names alongside complete naming, e.g. color.gray10 ... color.gray90 and color.lightGray) seems common enough.
But I do agree that a compiler/runtime that automatically creates value-to-string mappings should generate an error when resolving a value with multiple names. In C, this isn't an issue since there is no automatic resolver. Given all the STL stuff added to C++ over the decades, the same assumption can't be made, depending on the standard library used. I would consider this a C# design flaw on MS's part.
|
|
|
|
|
One way to define "safe" enums with two names for the same value would be to define them as
enum{a, b, c; const first = a, last = c}
which defines only "a", "b", "c" as "canonical" values (that have predictable names), defining "first" and "last" only as aliases (whose names will not be returned when querying names from an enum value, which can only return "a", "b", or "c") that share the same value (i.e. the same ordinal); here the semicolon instead of the comma, or the const keyword, is enough to say that we are defining an alias.
This allows defining a safe arithmetic restricted to {a,b,c} (whose result is confined to that set or causes a predictable overflow exception; for example a-1 or c+1 would unconditionally overflow, while a+1 would still give b, and b+1 would still give c even though c is also equal to the alias named "last").
But assigning arbitrary numeric values to constants declared in an enum causes various problems: we cannot safely define a "first" and "last" element, and cannot easily define an ordinal if we allow the declared enum constants to create "holes" between elements of the ordered set (i.e. they are assigned in non-consecutive ranges), and so we cannot safely define any arithmetic on them, as all members of these sets can overflow.
You can only define a *single* numeric constraint on one of the defined constants (for example the first one can be set to 0 or 1 or 1000, it does not matter, all other members are assigned to create a unique sequence of consecutive integers). So this declaration is safe:
enum{a=1000, b, c}
But not this one, even if there's no pair of defined constants that are given the same integer value:
enum{a=1, b=10, c}
because (a+1) is not part of the set but a is not the highest value (i.e. not the last one), and because (b-1) is also not part of the set but is also not the smallest value (i.e. not the first one).
A compiler however may infer default names such as "a", "(a+1)" (or just "2" in that last example) for the undefined value a+1, and so on up to "(a+9)" (or just "9"), just before "c"; it won't cause any overflow exception, and the defined set above would actually contain 11 distinct constants each one with a distinct name as well.
And then we could define valid restricted integer types like:
enum{min=-100, max=100}
containing 201 constants from -100 to 100 inclusively (using a modulo 201 arithmetic not requiring any overflow checks). So we could define:
typedef enum{min=-128, max=127} int8_t;
(such definition defining a strict set with 256 distinct values would precisely perform an arithmetic modulo 256); or:
typedef enum{zero=0, max=9} decimaldigit_t;
(such a definition, defining a strict set with 10 distinct values, would precisely perform an arithmetic modulo 10, whose constants are named "zero", "(zero+1)", ..., "(zero+8)", "max", or just "zero", "1", ..., "8", "max": these names can safely be returned by a synthetic default static method generated by the compiler, which converts an enum value to a string showing the canonical names of defined constants, necessarily starting with a letter or underscore, or otherwise showing just the numeric values of constants that are part of the defined enum set but have no name).
Whether the compiler generates an unchecked "modulo N" arithmetic or a checked "bounded" arithmetic could also be an option of the defined enum type, so that:
typedef enum {
zero=0, max=9
} catch(i) {
throw(new Error("decimal digit overflow %d", i));
} strictdecimaldigit_t;
would throw overflow exceptions if the result of an arithmetic causes out-of-range values, but the default "modulo N" arithmetic could be also changed, for example to add a carry:
typedef enum {
min=0, max=9;
} catch(int i) {
const int N = max - min + 1;
return (i - min) / N + (i - min) % N + min; // this value is checked again by the catcher!
} carryingdecimaldigit_t;
Note also that instead of defining this catcher, you may want to define a constructor (from an integer type) for the enum type. The semantics are a bit different, and both may be used simultaneously in the definition of the enum type:
- If there's a constructor, the enum value returned by the constructor will be used, otherwise if there's a catcher defined, it will be used (see below), otherwise a default synthetic "modulo N" method will be used.
- When the constructor returns a value, its value is not returned immediately as is: if there's a catcher defined for the type, then the value is checked and if it falls out of range, then the catcher is invoked to fix it.
- When a catcher is invoked, its integer return value will be used to invoke the constructor if there's one (see above); otherwise it will be fixed by the default synthetic "modulo N" catcher. (In current implementations of enum types in C/C++/C#, this default synthetic "modulo N" catcher uses a value of N which is some power of 2, not clearly defined, but usually 2^8 if the enum type is represented as a byte; so the value range is not restricted to the strict range from the minimum to the maximum values defined in the enum, but to a wider, unspecified range.)
The value of N is just sufficient to hold all the declared numeric values distinctly, but not minimal (when you declare for example an enum{a,b,c} with 3 distinct values, the compiler may use N=256 instead of N=3); this however allows faster code, because the "modulo N" checker actually does not generate any code at runtime: the compiler just silently truncates unnecessary bits when storing values, without performing any actual check. The results are OK to preserve distinction, but not good enough to create a safe arithmetic (this makes it impossible to define a safe "enumerator" to iterate over all constants actually defined in the enum type, and a "switch(enumvalue)" in the code of an enumerator-based loop should always include a "default" after listing cases for the defined enum constants, to handle the other undefined/anonymous constants that are part of the declared enum type).
The compiler should check that the code handles these omitted cases properly, signaling missing "default" in "switch" (even if the arbitrarily chosen value N is minimal, for example in enum{a,b,c,d} and the compiler chooses N=4, storing only 2 bits per value, because other compilers may as well choose N=256, storing 8 bits per value).
Enum types should also modify the integer type promotion rules in expressions, for example:
- (-enum) or (+enum) returns a value of the same enum type (first, the enum is promoted to an int, then the expression is evaluated, then the value is passed through the declared enum constructor, and its declared "catcher")
- (enum + int) returns a value of the same enum type (same algorithm)
- (int + enum) silently promotes the enum to an int and evaluates the expression as an int without using any constructor or catcher.
- (enum >> int) returns a value of the same enum type.
Ideally the same promotion rules should be used between other distinct numeric types (char, short, int, long, long long, float, double, long double and signed variants) using inference on the left-most operand, so that:
- (int + long) is an int
- (long + int) is a long
- (int >> long) is an int
- (long >> int) is a long
- (int + float) is an int
- (float + int) is a float
- (char + unsigned char) is a char
- (unsigned char + char) is an unsigned char
- and so on...
This also means that binary arithmetic operators must NOT be commutative, when operands are not the same integer type, all would be strictly driven by the type of the left-most operand; but this would change the existing promotion rules in C/C++ for basic numeric types; this also changes the associativity and then requires a precise evaluation order, so that "a+b+c" must be evaluated only as ((a+b)+c) but not as (a+(b+c)): this associativity is possible only if operands are the same numeric type (i.e. with the same declared range and precision for its values).
And then we could as well define an enum type for non-integers (here based on declarations of numeric "double" or "float" values):
- typedef enum{min=0.0, max=1.0} drate_t;
- typedef enum{min=0.0f, max=1.0f} frate_t;
The following would be either invalid, or would promote the numeric values to the same numeric type:
- typedef enum{min=0, max=1.0} drate_t; // same as before: 0 is promoted to 0.0
- typedef enum{min=0, max=1.0f} frate_t; // same as before: 0 is promoted to 0.0f
Another interesting declaration:
typedef enum {
min = 0, max = 100.0f; const pi = 3.14f
} catch (int i) {
return (i <= min) ? min : (i >= max) ? max : (float)i;
} catch (float f) {
return (f <= min) ? min : (f >= max) ? max : (float)math.floor(f * 10.0f + 0.5f) / 10.0f;
} estimate_t;
This last declaration defines a strict enum type with exactly 1001 distinct numeric values {0.0f, 0.1f, 0.2f, ..., 99.9f, 100.0f} which are "capped" between min and max (no modulo N) and rounded.
The declaration of the "estimate_t::pi" constant (as an alias, not as an additional value of the set) actually gives it exactly the numeric value 3.1f (assigning numeric values to declared constants passes them through the declared constructor if there's one, or through the declared "catchers", both of which enforce the arithmetic rules).
----
Another interesting case:
typedef enum {'A', 'Z'} capital_t;
This would also be a valid declaration: you are not required to name the constants that are part of the declared numeric type distinctly. All that is required is that any variable declared with that enum type (which is based on some basic numeric type of the language) must be able to store distinctly all the constants between the lower and upper bounds of the constants declared in the enum. Here it would declare a type large enough and precise enough to hold one of the 26 constants between 'A' and 'Z' inclusive.
So as well the declarations below would be valid:
typedef enum {'A', 'Z', 'A'} capital_t;
typedef enum {
char::min, char::max,
(unsigned char)::min, (unsigned char)::max,
(signed char)::min, (signed char)::max
} anychar_t;
The declared constant values don't need to be unique; the compiler determines itself the lower and upper bounds of the type, and the minimum precision needed to store the relevant differences and allocates enough bits, determining itself the basic numeric type to use for the values; and no constant need to be named explicitly.
Ideally, however, the compiler should automatically declare two constant names for the bounds, such as __min and __max, and possibly the cardinality of the set, such as __prec for the minimum precision (given in one of the basic numeric types, including long long or long double) as the estimate of the base-2 logarithm of the number of distinct values between these bounds, and __size (or just sizeof) for the actual precision stored (these precisions are given in bits, so that __prec <= __size, and (2^__size) is the value of "N" for the default "modulo N" catcher synthetically generated).
So for
typedef enum {'A', 'Z', 'A'} capital_t;,
we would have:
capital_t::__min == 'A' (which is a constant part of the declared type),
capital_t::__max == 'Z' (which is a constant part of the declared type),
__prec<capital_t> == math.log2(__max - __min + 1) (which is a constant of a basic floating-point numeric type, roughly equal to 4.7004 here, i.e. log2(26)),
__size(capital_t) == 5, (which is a constant in a basic integer numeric type);
sizeof(capital_t) == 1 (assuming that a single "char" can hold all 5 bits needed to store distinct constants from 'A' to 'Z', and that sizeof(char) == 1, which generally means at least 8 bits);
As well we would have:
anychar_t::__min == (signed char)::min (which is a constant part of the declared type, generally -128),
anychar_t::__max == (unsigned char)::max (which is a constant part of the declared type, generally 255),
__prec<anychar_t> == math.log2(__max - __min + 1) (which is a constant in a basic floating point numeric type, generally roughly equal to 8.5849625007211561 here)
__size<anychar_t> == 9, (which is a constant in a basic integer numeric type, but this could be equal to 16 instead of 9);
sizeof(anychar_t) == 2 (assuming that sizeof(char)=1)
Note that __prec is given as a logarithm instead of giving the real __cardinality directly (the value of "N" described above), because the cardinality of the set may not be expressible, for all numeric types such as "long long" and "long double", as a constant of one of the basic numeric types without causing an overflow (notably for "long double", where __prec=80 and N could be 2^80, whose inverse exceeds the actual epsilon separating non-infinite and non-NaN values).
Other numeric type properties could also be inferred as additional constants (not values in the declared type itself), such as the number of distinct NaN values, the number of distinct infinite values, the number of distinct zero values, the number of distinct denormal values, and a type constant giving the inferred native numeric type:
__type<anychar_t> == short (if __size<anychar_t> == 9 or 16)
__type<enum{'a','z'}> == char (if __size<enum{'a','z'}> == 8)
Also, __prec does not directly determine the step that allows enumerating all distinct values in the defined type (e.g. for integers you can enumerate them by adding 1, but for floating-point types the additive step depends on the magnitude of each enumerated value, and there are special steps to enumerate negative and positive zeroes, denormal values, signaling and non-signaling NaNs, and positive and negative infinities).
For this reason too, the compiler should also automatically declare default forward and backward enumerators for the declared enum type, which you can instantiate from any enum value and then call to get the previous or next distinct value.
With all these, we no longer need any preprocessor defines to know the limits of any type (not even native numeric types). All numeric types, including native ones are declared explicitly as enum types in <stdtype>, so macros defined in <limits.h> are deprecated.
We can also view enums like a typesafe version of unions and also allow declaring an enum like this:
typedef enum{(value1), (value2), (value3), (value4)} generictype_t;
The idea here is not to define constants, but to create a type that can hold any of the sample values listed, values which are comparable (so that we can define a full order between them and know when they are equal), without having to list all possible values. For example:
typedef enum{100, 200, "x", "y"} generictype_t;
could theoretically create a type storing integers or strings, if we also have a full order between them: here this is a type that will hold either integers between 100 and 200, or strings between "x" and "y" (so also including "x0", "x1", "x11", "xy", "xyz"...). (Note: the set has no fixed cardinality; we know the number of possible integers, but not the number of strings; we can only bound the number of possible distinct pointers/references by the pointer size in bits.)
The compiler will automatically infer a distinctive tag value when necessary and store that tag value if there's more than one tag. In this example, tag=0 will be used for integers (100 to 200) and tag=1 for strings (between "x" and "y").
It will generate the set of tags automatically using synthetic constructors like this:
generictype_t(int) : tag(0) {};
generictype_t(string) : tag(1) {};
(these constructors only specify the distinctive tag, not the value which is assigned automatically).
You assign a declared variable of that type normally, without having to specify the tag:
generictype_t x = 102;
and you can then query the tag of any value in that typed variable:
int t = tag<x>; (sets t = 0)
For this the declared type automatically builds a synthetic static method for that enum type...
Then you can also specify tags explicitly in the declaration of the enum (if two distinct values declared in the set have the same tag, they will be stored as a union, with no way to distinguish them by the tag, only by their distinct values):
typedef enum{ 100: 0, 200: 0, 300: 1, 400: 2} t;
(this enum contains values between 100 to 200, or equal to 300, or to 400, in three subsets with tags 0, 1, or 2)
Each subset, i.e. each distinct tag value, has its own minimum and maximum bounds, its own size in bits, its own cardinality. The compiler has now 3 declared tags, and the set of tags is also an enum type declared implicitly.
This allows replacing unsafe type declarations like:
typedef enum {int_tag, double_tag, string_tag} tag_t;
typedef struct {
tag_t tag;
union {
int int_val;
double double_val;
string string_val;
}
} variant_t;
by:
typedef enum {<int>, 10, <double>, <string>} variant_t;
(The tag values are assigned automatically by the compiler: tag=0 for int values, tag=1 for double values, tag=2 for string values. Here, instead of specifying exemplar constant values of each type, we just cite their typename between angle brackets; even if we add an exemplar value like 10 in this example, it matches the <int> type also declared, so it does not add another tag value and the compiler can discard it. The order of declaration of members of the enum is significant if they are of different types.)
We could also declare the tag values ourself:
typedef enum {<int>: 'I', <double>: 'D' , <string>: 'S'} variant_t;
and the distinctive tag values will have a char datatype. The enum declaration does not create a new type for tags; if needed types for tag values can be declared separately:
typedef enum {'I', 'D', 'S'} tag_t;
typedef enum: tag_t {<int>, <double>, <string>} variant_t;
(here the compiler assigns tags with values taken by enumerating the given "tag_t" type, instead of enumerating "int" by default).
or by using declared constant names given in the tag type:
typedef enum {Int: 'I', Double: 'D', String: 'S'} tag_t;
typedef enum {<int>: (tag_t::Int), <double>: (tag_t::Double), <string>: (tag_t::String)} variant_t;
Here the tags are given a char datatype, but it could also be a string, giving its distinctive name or description:
typedef enum {<int>: "this is an integer", <double>: "this is a floating point number", <string>: "this is a name"} variant_t;
When the tag type given is a string, it can be used by the synthetic default toString() method when showing the actual value like this:
variant_t::toString() {
return new string( tag<*this>, ':', value<*this>.toString() );
}
We can also select one of the tag subtypes:
variant_t<int> (because <int> is a member type declared in this enum)
to create explicit type conversion (typecast) of the value with some defined method if needed (the effect of that explicit method will be to generate a new enum value with the new tag value, for example converting an enum value with value type <double> into another enum value of the same enum type but with value type <int>).
The compiler can perform a lot of type-safe inference and generate the optimal storage, reducing the number of bits needed for storing each tag (or not storing it at all if the declared enum has only one tag), provided we don't specify a particular type for the tag ourselves. In all cases, it will build the synthetic code for the static property tag<variant_t> itself...
No more need for unsafe unions, even with complex datatypes inside them, and no more need to name each member of the union: type inference determines the correct member and sets the tag value properly and implicitly when we set the actual value of an enum variable!
We can even imagine a language that predefines absolutely NO native datatype, all datatypes being declared by enum declarations (starting by defining them '''only''' with constants supported by the language parser, like: false, true, nil, 10, 3.14, 1.23e45, 'A', "AAAA"...).
We can also imagine new kinds of "tagged constants" recognized by the parser, like: 0t12.2'i' == 0t12.2('h'+1) to represent an imaginary number, i.e. a constant <double> value tagged by a <char> value, this constant having the data type "double<'i'>", itself a subtype of "double<char>"; or 0t0x0a'i' == 0t10'i', which is a constant of type "int<'i'>", itself a subtype of "int<char>"...
Another alternative but equivalent syntax for tagged constants would be <'i'>12.2 == <'h'+1>12.2. Untagged constants like 12.2 are equivalent to <0>12.2 (the default tag constant is 0, the default tag type is an int, enumerated by default by a forward iterator starting from 0 with increment 1; this default enumerator is used when enum members are declared without a tag and a new tag is needed because they are not the same base type):
- enum{ a, b, c } is then equivalent to enum:int{<0>a, <0>b, <0>c} (these <0> tags don't need to be stored, they are implicit, not significant)
- enum{ <int>, <double>, <string> } is then equivalent to enum:int{ <0><int>, <1><double>, <2><string> } (these 3 distinct <0>, <1>, <2> tags are needed because we use types, and not constants, as members of the declared enum, even if every <int> instance can be compared as equal to an existing <double>, something that cannot be asserted for all of them, for example when <int> requires a 64-bit value, and <double> also requires 64-bit but not for the same precision and value range, so each <int> member will be stored differently from each <double> member, and a distinct tag value is needed; here also the order of declaration of members in the enum type is significant when value types are different; here also you can have constructors for the enum type, as well as catchers...). Here also the tag value can be a constant expression.
Every native numeric type, and every object like strings or arrays, or structs, classes, pointers, references, functions/methods, can also be a type member of an enum, and be used as a tag type. All native types could be declared in the language itself (so no more need to reserve keywords like "bool", "char", "short", "int", "long", "float", "double": they can all be declared using a typedef as an enum (or enum + catches), with their minimum, maximum, precision, rounding modes, and other predefined named constants of these types, with these constant names scoped in their defining type)... We would then have a fully defined semantics for all arithmetic operations, orderings, and comparisons. The preprocessor is no longer required at all (except possibly for #include, which may be better replaced by "require(package)").
----
Being able to define type-safe arithmetic for enum types (at least the arithmetic giving the successor, i.e. constant+1) allows defining useful objects, notably iterators (which we could really call "enumerators") over the value range of enum types, which would in turn permit object-oriented constructs like:
for (i: enumerator<enum_t>) { ... }
which won't forget to handle any possible value of an enum type (so won't generate bugs at runtime, like those occurring when using switch statements with a missing "default:" selector: the compiler would know whether a "default:" is required according to the type of "i").
And the following declaration is also unsafe, if "a" is assigned by the compiler the integer value 0 (without taking into account the single constraint given to "b" which could instead be used by the compiler to assert that a=-1 and c=1):
enum{a, b=0, c}
These tricks, inherited in C# from C and C++, are really bad: they generate unchecked conditions and unexpected bugs with possible overflows, silently producing values that are not part of the defined set.
For now, the only interest of enum types in C/C++/C# is not to restrict the set of values for strict type safety, but just:
- to define constants with possibly scoped names (qualifiable with the typename) and not depending on a preprocessor (whose scoping rules are only global, and severely depend on #included source reading order).
- to use the appropriate integer type (with the minimum bit-size) to store a single enum value.
But I consider that bitfields (using notations like ":1" in C/C++ structure declarations) are much safer: at least we know their value range precisely, there are no aliases at all, and the arithmetic is precisely defined.
modified 17-Nov-18 17:33pm.
|
|
|
|
|
Very thought provoking. I just learned about enum and I appreciate the insight into possible issues. As a database person, I would want to put these lists into a database with multiple keys, one for identifying, and one for list order. This way changes could be made in the database without affecting the code. Probably less efficient though.
|
|
|
|
|
It is an interesting educational exercise but I don't really see any problem in practice. The syntax in declaring a new enum is so small that this problem would be glaringly obvious, assuming that you actually change the default numeric values in the first place.
|
|
|
|
|
|
If you dig into the Enum source code, you'll find that it does a binary search of the enum values in the GetEnumName method. GetEnumName is (eventually) called when calling ToString on an enum that does not have the Flags attribute.
|
|
|
|
|
Nice to know but does it really matter?
As shown here
https://dotnetfiddle.net/2ro2BG
If Foo.Bar1 = 2 and Foo.Bar2 = 2, then they are for all intents and purposes the same enum.
Though I'd be more tempted to define Bar2 as Bar2 = Bar1 instead of setting both to 2, to remove any code-comprehension confusion. Either way, any test for variable = Foo.Bar1 will also pass if the value was Foo.Bar2, as it's comparing the underlying value (perhaps you want to fix a typo without just removing the bad enum name and breaking other dependent code).
So I don't see it as an issue.
The real issue I have with enums is that you can't use strings or objects; that would help eliminate passing magic strings around (or case/switch blocks, or reflection to get attribute values) to convert enums to strings/objects, while providing a nicer IntelliSense experience.
|
|
|
|
|
Interesting article and well-written.
|
|
|
|
|
The simplest solution might be to re-evaluate why your enumeration has multiple identical values in the first place. Perhaps an enum is the wrong solution.
However, in the event that this somehow happens accidentally, it is pretty critical to understand that this is neither a compile-time nor a run-time error, and to understand what will actually happen. Encountering this as a bug would probably vex me for over an hour trying to figure out why I'm always getting the wrong enum value.
|
|
|
|
|
This certainly causes unexpected bugs when you use them with generic types (with templates) and you assume that enum values are each distinguished by a fixed set of possible names (notably when converting enums to strings and then back: this is not a bijection).
This can severely impact the validity of generic algorithms (template functions or methods) that take enum types as template parameters.
|
|
|
|
|
The answer is that the inversion of an enum requires a search of the key/value pairs for the value. My question is: what search algorithm would produce this? It looks like a form of binary search to me...
|
|
|
|
|
An enum shouldn't have duplicate values. Just because it's possible doesn't mean you should.
|
|
|
|
|
It shouldn't even compile this way...but it does
|
|
|
|
|
it should compile this way. and so it does.
|
|
|
|
|
I also agree; dupe values may be useful to help define other objects, but only as a helper to mark some of the enum values with specific properties.
This means that if you need dupes, then enum definitions are most probably missing the possibility of adding tagging qualifiers to some of their members: this is a syntactic problem incorrectly solved by breaking the basic rule that each enum value should be uniquely identifiable.
In this case (as in the article), we need a way to distinguish which name is "canonical" for the enum definition. But the way C# compiles these cases is very intriguing, because it chooses the names seemingly at random (not the first one defined, not the last one, not even always the one in the middle, and the choice also depends on other names defined for other values...).
This is clearly inconsistent, and a bug of existing C# implementations (created by insufficient specification of the language): the existing synthetic mapping from values to names is unpredictable and non-portable, and can only cause problems. It is already broken, so it won't cost any more compatibility problems if this is really fixed by specifying the behavior correctly.
One way would be to add some declarator tags to canonical names (which must be unique per assigned value), or to add a declarator tag to the enum type itself (to enforce uniqueness): this way the synthetic generation of the mapping from values to strings is necessarily predictable and valid, and all the other legacy uses would generate at least warnings (compiling in "legacy" mode), or errors directly (compiling in "strict" mode, if the enum type does not explicitly define its own "tostring()" static method, which would forbid the compiler from generating a synthetic method and its companion static table).
There should be a cleaner way to define additional non-canonical aliases for enums, but enum types should already have predefined properties (at least minvalue, maxvalue, and basetype) that we shouldn't need to declare but can use directly instead of defining specific aliases.
|
|
|
|
|
While I would probably never assign the same ordinal value twice in an enum, under certain conditions, such as working with hardware where an I/O pin might represent either an address bit or a data bit (particularly on small microprocessors that share 8 bits between high/low-byte addressing and data), I might do it; but even then it would probably be separate enums...
...anyways, good to know, particularly with regards to serialization / deserialization of the ordinal value.
BTW, I like these small vignettes, keep on posting!
Latest Article - A Concise Overview of Threads
Learning to code with python is like learning to swim with those little arm floaties. It gives you undeserved confidence and will eventually drown you. - DangerBunny
Artificial intelligence is the only remedy for natural stupidity. - CDP1802
|
|
|
|
|