Introduction
Static typing is a great help in keeping your code bug free and maintainable. Take for example:
public int Mymethod(Person person)
{
...
}
A few good things are happening here:
- Documentation: You know right away that this method takes a Person and returns an integer.
- Machine Checking: The compiler has been told as well. That means that this is not just documentation that can get out of date.
The compiler actually makes sure that what you're reading here is true.
- Tooling: Finally, Visual Studio has been told too - enabling you to quickly find out how Person is defined.
The problem is that out of the box, C# only gives you types based on the physical representation of your data in computer memory.
An int is a 32-bit number, a string is a sequence of characters, etc. So the compiler won't even give you a warning
when you wind up with this:
double d = GetDistance();
double t = GetTemperature();
... Many complicated lines further ...
double probablyWrong = d + t;
Ok, you could use better naming here. totalDistance instead of d. surfaceTemperature instead of t.
But the compiler still isn't going to warn you, because it still doesn't know that totalDistance is a distance, not just a double.
Another example:
public void SendEmail(string emailAddress, string message)
{
}
The problem is that we're telling the compiler that the method can take
any string as the email address, while actually it can only take a valid email address, which is very different.
The solution to these issues is to inform the compiler about the various kinds of values in our domain - distances, temperatures, email addresses, etc., even
if they could be represented in memory by some built-in type such as double or int.
That way, it can catch more bugs for us. This is where semantic typing comes in.
Semantic Types
Imagine that C# included a type EmailAddress that can only contain a valid email address:
var validEmailAddress = new EmailAddress("kjones@megacorp.com");
var validEmailAddress2 = new EmailAddress("not a valid email address"); // throws an exception
Now we can guarantee that we only pass valid email addresses to the SendEmail method:
public void SendEmail(EmailAddress emailAddress, string message)
{
}
...
SendEmail(validEmailAddress, "message");
To prevent needless exception handling, we need a static IsValid method
that checks whether an email address is valid:
bool isValidEmailAddress = EmailAddress.IsValid("kjones@megacorp.com"); // true
bool isValidEmailAddress2 = EmailAddress.IsValid("not a valid email address"); // false
Finally, we need a Value property to retrieve the underlying string value.
This is read-only, to ensure that after the EmailAddress has been created, it is
immutable (cannot be changed).
var validEmailAddress = new EmailAddress("kjones@megacorp.com");
string emailAddressString = validEmailAddress.Value;
Such an EmailAddress type is an example of a semantic type:
- Type based on meaning, not on physical storage:
An EmailAddress is physically still a string.
What makes it different is the way we think of that string - as an email address, not as a random collection of characters.
- Type safe:
Having a distinct EmailAddress type enables the
compiler to ensure you're not using some common string where a valid email address is expected -
just as the compiler stops you from using a string where an integer is expected.
- Guaranteed to be valid:
Because you can't create an EmailAddress based on an invalid email address,
and you can't change it after it has been created,
you know for sure that every EmailAddress represents a valid email address.
- Documentation:
When you see a parameter of type EmailAddress, you know right away it contains an email address,
even if the parameter name is unclear.
Besides an EmailAddress type, you could have a ZipCode type, a PhoneNumber type, a Distance type, a Temperature type, etc.
Semantic typing is obviously useful, but many people do not use this approach because they fear that introducing
semantic types involves lots of typing and boilerplate.
The rest of this article shows first how to implement a semantic type, and then how to factor out all the common
code to make creating a new semantic type nice and quick.
Creating a semantic type, first take
Before seeing how to create semantic types in general, let's create a specific semantic type: EmailAddress.
Seeing that an EmailAddress is physically a string, you might be tempted to inherit from string:
public class EmailAddress: string
{
}
However, this doesn't compile, because string is
sealed, so you cannot derive from it. The same goes for int, double, etc.: they are structs, and structs are implicitly sealed.
You can't even inherit from DateTime, which is a struct too.
So, we'll store the string value inside the EmailAddress class. Note that the setter is private. That way,
code outside the class cannot change the value:
public class EmailAddress
{
public string Value { get; private set; }
}
Add a static IsValid method that returns true if the given string is a valid email address:
using System.Text.RegularExpressions;
public class EmailAddress
{
public string Value { get; private set; }
public static bool IsValid(string emailAddress)
{
return Regex.IsMatch(emailAddress,
@"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
@"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
RegexOptions.IgnoreCase);
}
}
Add the constructor. This takes a string that hopefully contains a valid email address.
If it doesn't, the constructor throws an exception.
using System;
using System.Text.RegularExpressions;
public class EmailAddress
{
public string Value { get; private set; }
public static bool IsValid(string emailAddress)
{
return Regex.IsMatch(emailAddress,
@"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
@"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
RegexOptions.IgnoreCase);
}
public EmailAddress(string emailAddress)
{
if (!IsValid(emailAddress)) { throw new ArgumentException(string.Format("Invalid email address: {0}", emailAddress)); }
Value = emailAddress;
}
}
That gives us the basics. Note that with this implementation, an EmailAddress cannot be changed after it has been created - it is immutable.
If you want a new email address, you have to create a new EmailAddress object - and the constructor will ensure that your new email address is valid as well.
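To see how IsValid and the constructor work together, here is a minimal sketch of handling untrusted input without using exceptions for control flow. The TryParseEmail helper and the SignupDemo class are hypothetical names introduced for illustration, and the pattern is deliberately simplified to keep the example short; it is not the regular expression used above.

```csharp
using System;
using System.Text.RegularExpressions;

// Cut-down EmailAddress with a deliberately simplified pattern (an assumption
// made for brevity; the article's full regular expression is stricter).
public class EmailAddress
{
    public string Value { get; private set; }

    public static bool IsValid(string emailAddress)
    {
        return emailAddress != null &&
               Regex.IsMatch(emailAddress, @"^[^@\s]+@[^@\s]+\.[^@\s]+$");
    }

    public EmailAddress(string emailAddress)
    {
        if (!IsValid(emailAddress))
        {
            throw new ArgumentException(
                string.Format("Invalid email address: {0}", emailAddress));
        }
        Value = emailAddress;
    }
}

public static class SignupDemo
{
    // Hypothetical helper: checks first, constructs second, so invalid input
    // never reaches the throwing constructor.
    public static bool TryParseEmail(string input, out EmailAddress result)
    {
        result = EmailAddress.IsValid(input) ? new EmailAddress(input) : null;
        return result != null;
    }
}
```

Code that receives input from a form would call SignupDemo.TryParseEmail and show a validation message when it returns false, while code deeper in the system simply accepts an EmailAddress and never has to validate again.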
However, there is one more thing to implement: equality. When you use simple strings to store email addresses,
you expect to be able to compare them by value:
string emailAddress1 = "kjones@megacorp.com";
string emailAddress2 = "kjones@megacorp.com";
bool equal = (emailAddress1 == emailAddress2);
Because of this, we'll want the same behaviour with EmailAddresses:
var emailAddress1 = new EmailAddress("kjones@megacorp.com");
var emailAddress2 = new EmailAddress("kjones@megacorp.com");
bool equal = (emailAddress1 == emailAddress2);
Because EmailAddress is a reference type,
by default the equality operator only checks whether the two EmailAddresses are the same object in memory.
However, we want to compare the underlying email addresses.
To make this happen, we have to implement the System.IEquatable<T> interface, override the
Object.Equals and Object.GetHashCode methods, and overload the == and != operators.
The result is this:
using System;
using System.Text.RegularExpressions;
public class EmailAddress : IEquatable<EmailAddress>
{
public string Value { get; private set; }
public EmailAddress(string emailAddress)
{
if (!IsValid(emailAddress)) { throw new ArgumentException(string.Format("Invalid email address: {0}", emailAddress)); }
Value = emailAddress;
}
public static bool IsValid(string emailAddress)
{
return Regex.IsMatch(emailAddress,
@"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
@"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
RegexOptions.IgnoreCase);
}
#region equality
public override bool Equals(Object obj)
{
if ((obj == null) || (!(obj is EmailAddress)))
{
return false;
}
return (Value.Equals(((EmailAddress)obj).Value));
}
public override int GetHashCode()
{
return Value.GetHashCode();
}
public bool Equals(EmailAddress other)
{
if (other == null) { return false; }
return (Value.Equals(other.Value));
}
public static bool operator ==(EmailAddress a, EmailAddress b)
{
if (System.Object.ReferenceEquals(a, b))
{
return true;
}
// Exactly one side is null here (both null was handled above), so they are not equal.
if (((object)a == null) || ((object)b == null))
{
return false;
}
return a.Equals(b);
}
public static bool operator !=(EmailAddress a, EmailAddress b)
{
return !(a == b);
}
#endregion
}
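One way to see why GetHashCode has to be overridden together with Equals is to put EmailAddresses in a hash-based collection. The sketch below is a cut-down version of the class; its validation is reduced to a placeholder check (an assumption for brevity), and EqualityDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Cut-down EmailAddress: validation is a placeholder so that the equality
// members stay in focus.
public class EmailAddress : IEquatable<EmailAddress>
{
    public string Value { get; private set; }

    public EmailAddress(string emailAddress)
    {
        if (emailAddress == null || !emailAddress.Contains("@"))
        {
            throw new ArgumentException("Invalid email address: " + emailAddress);
        }
        Value = emailAddress;
    }

    public override bool Equals(object obj) { return Equals(obj as EmailAddress); }

    public bool Equals(EmailAddress other)
    {
        return !ReferenceEquals(other, null) && Value.Equals(other.Value);
    }

    public override int GetHashCode() { return Value.GetHashCode(); }

    public static bool operator ==(EmailAddress a, EmailAddress b)
    {
        if (ReferenceEquals(a, b)) { return true; }
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) { return false; }
        return a.Equals(b);
    }

    public static bool operator !=(EmailAddress a, EmailAddress b) { return !(a == b); }
}

public static class EqualityDemo
{
    // Counts distinct addresses. HashSet<T> first compares hash codes and then
    // calls Equals, so both members have to agree for duplicates to be found.
    public static int CountDistinct(IEnumerable<EmailAddress> addresses)
    {
        return new HashSet<EmailAddress>(addresses).Count;
    }
}
```

Note that Value.Equals is a case-sensitive, ordinal string comparison, so with this approach "KJones@megacorp.com" and "kjones@megacorp.com" count as different addresses.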
Factoring out the boilerplate
Obviously, the EmailAddress class as it stands has lots of boilerplate that is not specific to email addresses.
We'll factor this out into a base class SemanticType. This can then be used to quickly define lots of semantic types.
Here is what EmailAddress will look like once we're done:
public class EmailAddress : SemanticType<string>
{
public static bool IsValid(string value)
{
return (Regex.IsMatch(value,
@"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
@"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
RegexOptions.IgnoreCase));
}
public EmailAddress(string emailAddress) : base(IsValid, emailAddress) { }
}
Here we only specify what is specific to EmailAddress, leaving the boilerplate to a base class SemanticType
(which we'll get to in the next section):
- The SemanticType base class will be storing the underlying value, so it needs to be generic, and have a type parameter with the type of the underlying value - in this case string.
- The IsValid method is specific to EmailAddress, so it cannot be factored out.
- It is the SemanticType constructor that stores the value, so it needs to know how to validate it.
To make that happen, simply pass the IsValid method as a parameter.
If no validation is needed, pass in null.
Another example is a BirthDate semantic type. This is a DateTime, except that birth dates must be in the past
(unless you take advance bookings for a kindergarten)
and they can't be more than, say, 130 years in the past (unless you store dead people's details).
public class BirthDate : SemanticType<DateTime>
{
const int maxAgeForHumans = 130;
const int daysPerYear = 365;
public static bool IsValid(DateTime birthDate)
{
TimeSpan age = DateTime.Now - birthDate;
return (age.TotalDays >= 0) && (age.TotalDays < daysPerYear * maxAgeForHumans);
}
public BirthDate(DateTime birthDate) : base(IsValid, birthDate) { }
}
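The check above counts 365 days per year, which ignores leap days, so its cutoff falls roughly a month short of 130 calendar years. If that matters, a calendar-based variant could look like the sketch below. BirthDateRules is a hypothetical name, and passing the current time in as a parameter is an assumption made here to keep the rule easy to test.

```csharp
using System;

public static class BirthDateRules
{
    public const int MaxAgeForHumans = 130;

    // Calendar-based variant of the check: the date must not lie in the future
    // and must be less than MaxAgeForHumans calendar years before 'now'.
    public static bool IsValid(DateTime birthDate, DateTime now)
    {
        return birthDate <= now && birthDate > now.AddYears(-MaxAgeForHumans);
    }
}
```

A BirthDate.IsValid method could then delegate to BirthDateRules.IsValid(birthDate, DateTime.Now).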
Creating the SemanticType base class
Let's start with the bare-bones declaration:
public class SemanticType<T>
{
}
Value property
Add the Value property that will be used to store the underlying value. Note that it is of type T, the type of the underlying value:
public class SemanticType<T>
{
public T Value { get; private set; }
}
Constructor
Now for the constructor. This acts as a gatekeeper by throwing an exception when the passed in value is invalid,
thereby ensuring that if you have a semantic type, it is always valid. Note that:
- It doesn't allow null as a value. If you did allow null, there would be confusion between a null EmailAddress and an EmailAddress that has a null value.
- It uses the IsValid static method that was passed in via the isValidLambda parameter to do the validation.
- It uses the type of the derived class, retrieved with this.GetType(), to create a more meaningful exception message.
public class SemanticType<T>
{
public T Value { get; private set; }
protected SemanticType(Func<T, bool> isValidLambda, T value)
{
if ((Object)value == null)
{
throw new ArgumentException(string.Format("Trying to use null as the value of a {0}", this.GetType()));
}
if ((isValidLambda != null) && !isValidLambda(value))
{
throw new ArgumentException(string.Format("Trying to set a {0} to {1} which is invalid", this.GetType(), value));
}
Value = value;
}
}
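To watch the gatekeeper at work, the sketch below derives two small types from the class above and captures the exception messages. Percentage, Nickname and ConstructorDemo are hypothetical names used only for this demonstration.

```csharp
using System;

public class SemanticType<T>
{
    public T Value { get; private set; }

    protected SemanticType(Func<T, bool> isValidLambda, T value)
    {
        if ((object)value == null)
        {
            throw new ArgumentException(
                string.Format("Trying to use null as the value of a {0}", this.GetType()));
        }
        if ((isValidLambda != null) && !isValidLambda(value))
        {
            throw new ArgumentException(
                string.Format("Trying to set a {0} to {1} which is invalid", this.GetType(), value));
        }
        Value = value;
    }
}

// Hypothetical semantic type with a validation rule.
public class Percentage : SemanticType<int>
{
    public static bool IsValid(int value) { return value >= 0 && value <= 100; }
    public Percentage(int value) : base(IsValid, value) { }
}

// Hypothetical type without validation: null is still rejected by the base class.
public class Nickname : SemanticType<string>
{
    public Nickname(string value) : base(null, value) { }
}

public static class ConstructorDemo
{
    // Returns the message of the ArgumentException thrown by 'create',
    // or null when construction succeeds.
    public static string TryCreate(Func<object> create)
    {
        try { create(); return null; }
        catch (ArgumentException ex) { return ex.Message; }
    }
}
```

Calling ConstructorDemo.TryCreate(() => new Percentage(150)) returns a message that names the Percentage type and the offending value 150, which is what makes the this.GetType() call worthwhile.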
Equality related code
Now we can implement the equality related code. First override the Equals and GetHashCode methods inherited from Object.
public class SemanticType<T>
{
public T Value { get; private set; }
protected SemanticType(Func<T, bool> isValidLambda, T value)
{
if ((Object)value == null)
{
throw new ArgumentException(string.Format("Trying to use null as the value of a {0}", this.GetType()));
}
if ((isValidLambda != null) && !isValidLambda(value))
{
throw new ArgumentException(string.Format("Trying to set a {0} to {1} which is invalid", this.GetType(), value));
}
Value = value;
}
public override bool Equals(Object obj)
{
if (obj == null || obj.GetType() != this.GetType())
{
return false;
}
return (Value.Equals(((SemanticType<T>)obj).Value));
}
public override int GetHashCode()
{
return Value.GetHashCode();
}
}
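The GetType() comparison in Equals matters as soon as two semantic types share the same underlying type. A minimal sketch, assuming two hypothetical types CustomerId and OrderId and leaving validation out for brevity:

```csharp
public class SemanticType<T>
{
    public T Value { get; private set; }

    protected SemanticType(T value) { Value = value; } // validation omitted for brevity

    public override bool Equals(object obj)
    {
        // A different semantic type is never equal, even with the same underlying value.
        if (obj == null || obj.GetType() != this.GetType()) { return false; }
        return Value.Equals(((SemanticType<T>)obj).Value);
    }

    public override int GetHashCode() { return Value.GetHashCode(); }
}

// Hypothetical types that share the same underlying type.
public class CustomerId : SemanticType<int> { public CustomerId(int value) : base(value) { } }
public class OrderId : SemanticType<int> { public OrderId(int value) : base(value) { } }
```

With these definitions, new CustomerId(42).Equals(new CustomerId(42)) is true, while new CustomerId(42).Equals(new OrderId(42)) is false, because the run-time types differ.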
Implement IEquatable
Now we can implement the IEquatable<T> interface, by implementing its Equals method.
The difference between IEquatable<T>.Equals and Object.Equals is that IEquatable<T>.Equals is strongly typed.
This has the following advantages:
- You get better type checking by the compiler.
- It makes testing for equality a bit more efficient, because the argument no longer needs a cast from Object. When the type implementing IEquatable<T> is itself a value type, such as int, it also prevents boxing of the argument.
public class SemanticType<T> : IEquatable<SemanticType<T>>
{
public T Value { get; private set; }
protected SemanticType(Func<T, bool> isValidLambda, T value)
{
if ((Object)value == null)
{
throw new ArgumentException(string.Format("Trying to use null as the value of a {0}", this.GetType()));
}
if ((isValidLambda != null) && !isValidLambda(value))
{
throw new ArgumentException(string.Format("Trying to set a {0} to {1} which is invalid", this.GetType(), value));
}
Value = value;
}
public override bool Equals(Object obj)
{
if (obj == null || obj.GetType() != this.GetType())
{
return false;
}
return (Value.Equals(((SemanticType<T>)obj).Value));
}
public override int GetHashCode()
{
return Value.GetHashCode();
}
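public bool Equals(SemanticType<T> other)
{
// Same rule as Equals(Object): a different semantic type is never equal.
if (other == null || other.GetType() != this.GetType()) { return false; }
return (Value.Equals(other.Value));
}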
}
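One detail worth knowing: because T is unconstrained, Value.Equals(other.Value) binds to Object.Equals(object), so a value-type argument such as an int is boxed on every comparison. A minimal sketch of one way around that, assuming you are happy to rely on EqualityComparer<T>.Default, which uses T's own IEquatable<T> implementation when there is one. Quantity is a hypothetical type, and validation is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

public class SemanticType<T> : IEquatable<SemanticType<T>>
{
    public T Value { get; private set; }

    protected SemanticType(T value) { Value = value; } // validation omitted for brevity

    public bool Equals(SemanticType<T> other)
    {
        if (ReferenceEquals(other, null) || other.GetType() != this.GetType()) { return false; }
        // EqualityComparer<T>.Default prefers T's own IEquatable<T>.Equals(T) when
        // T implements it, so value types such as int are compared without boxing.
        return EqualityComparer<T>.Default.Equals(Value, other.Value);
    }

    public override bool Equals(object obj) { return Equals(obj as SemanticType<T>); }

    public override int GetHashCode() { return EqualityComparer<T>.Default.GetHashCode(Value); }
}

// Hypothetical semantic type with a value-type underlying value.
public class Quantity : SemanticType<int> { public Quantity(int value) : base(value) { } }
```

For most code the difference is negligible, so this is an optimisation to keep in mind rather than a requirement.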
== and != operators
Finally overload the == and != operators:
public class SemanticType<T> : IEquatable<SemanticType<T>>
{
public T Value { get; private set; }
protected SemanticType(Func<T, bool> isValidLambda, T value)
{
if ((Object)value == null)
{
throw new ArgumentException(string.Format("Trying to use null as the value of a {0}", this.GetType()));
}
if ((isValidLambda != null) && !isValidLambda(value))
{
throw new ArgumentException(string.Format("Trying to set a {0} to {1} which is invalid", this.GetType(), value));
}
Value = value;
}
public override bool Equals(Object obj)
{
if (obj == null || obj.GetType() != this.GetType())
{
return false;
}
return (Value.Equals(((SemanticType<T>)obj).Value));
}
public override int GetHashCode()
{
return Value.GetHashCode();
}
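public bool Equals(SemanticType<T> other)
{
// Same rule as Equals(Object): a different semantic type is never equal.
if (other == null || other.GetType() != this.GetType()) { return false; }
return (Value.Equals(other.Value));
}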
public static bool operator ==(SemanticType<T> a, SemanticType<T> b)
{
if (System.Object.ReferenceEquals(a, b))
{
return true;
}
if (((object)a == null) || ((object)b == null))
{
return false;
}
return a.Equals(b);
}
public static bool operator !=(SemanticType<T> a, SemanticType<T> b)
{
return !(a == b);
}
}
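Operators in C# are static, so which == runs is decided at compile time from the declared types of the operands. The sketch below shows the resulting behaviour, including the case where both operands are typed as object and the overloaded operator is bypassed. ZipCode and OperatorDemo are hypothetical names, validation is left out for brevity, and the run-time type check in the strongly typed Equals is an assumption of this sketch.

```csharp
using System;

public class SemanticType<T> : IEquatable<SemanticType<T>>
{
    public T Value { get; private set; }
    protected SemanticType(T value) { Value = value; } // validation omitted for brevity

    public override bool Equals(object obj) { return Equals(obj as SemanticType<T>); }
    public override int GetHashCode() { return Value.GetHashCode(); }

    public bool Equals(SemanticType<T> other)
    {
        // Assumption of this sketch: different semantic types never compare equal.
        if (ReferenceEquals(other, null) || other.GetType() != this.GetType()) { return false; }
        return Value.Equals(other.Value);
    }

    public static bool operator ==(SemanticType<T> a, SemanticType<T> b)
    {
        if (ReferenceEquals(a, b)) { return true; }
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) { return false; }
        return a.Equals(b);
    }

    public static bool operator !=(SemanticType<T> a, SemanticType<T> b) { return !(a == b); }
}

public class ZipCode : SemanticType<string> { public ZipCode(string value) : base(value) { } }

public static class OperatorDemo
{
    public static bool[] Run()
    {
        var a = new ZipCode("90210");
        var b = new ZipCode("90210");
        ZipCode none = null;
        object objA = a, objB = b;
        return new[]
        {
            a == b,             // true: the overloaded operator compares values
            a == none,          // false: only one side is null
            none == null,       // true: both sides are null
            objA == objB,       // false: object == object is reference equality
            objA.Equals(objB)   // true: the virtual Equals still compares values
        };
    }
}
```

The practical upshot is to use Equals rather than == whenever a semantic type is held in a variable of type object.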
ToString
ToString is implemented by every Object, that is, every single type in .NET, including value types such as int and double.
By default, this simply returns the fully qualified name of the type. However, you'll want the string representation of the underlying value.
This isn't so useful for, say, EmailAddress, where the underlying value is already a string, but when it is for example a DateTime, this comes in handy.
Implementing ToString is pretty trivial:
public class SemanticType<T> : IEquatable<SemanticType<T>>
{
...
public override string ToString()
{
return this.Value.ToString();
}
}
IComparable
Say you just converted your code to use EmailAddress for email addresses, rather than strings.
The issue is that strings can be ordered with, say, List<T>.Sort (a@abc.com comes before b@abc.com, etc.). However, out of the box, you can't do this with plain objects.
The solution is that all .NET classes concerned with ordering objects check whether an object implements the IComparable<T> interface.
To implement that interface, you have to add a CompareTo method that compares the object with another object of the same class.
IComparable<T> has a non-generic counterpart, IComparable.
This is a holdover from the long-gone days when there were no generics.
I decided not to support this, because it goes against the idea of using strong typing to catch bugs at compile time.
Implementing IComparable<T> in SemanticType<T> is simple - just compare the underlying values:
public class SemanticType<T> : IEquatable<SemanticType<T>>, IComparable<SemanticType<T>>
{
...
public int CompareTo(SemanticType<T> other)
{
if (other == null) { return 1; }
return this.Value.CompareTo(other.Value);
}
}
There is one problem here: this code doesn't compile. The compiler hasn't been told that type T (the type of the underlying value)
actually implements CompareTo. There are a few options to fix this:
- Check at run time whether T implements IComparable<T> using Type.IsAssignableFrom. If it does, cast to IComparable<T>. If it doesn't, throw an exception.
- Add a constraint on T to ensure it implements IComparable<T>.
Option 1 defers checking whether T implements IComparable<T> to run time, while in option 2 this is done at compile time.
Option 2 is also a bit simpler.
This makes option 2 far preferable to me:
public class SemanticType<T> : IEquatable<SemanticType<T>>, IComparable<SemanticType<T>>
where T: IComparable<T>
{
...
public int CompareTo(SemanticType<T> other)
{
if (other == null) { return 1; }
return this.Value.CompareTo(other.Value);
}
}
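Here is a minimal sketch of the constrained class being used to order a list. Age and SortDemo are hypothetical names, and validation and equality are left out for brevity. The comparison is passed to Sort explicitly so that it is plain which CompareTo gets called.

```csharp
using System;
using System.Collections.Generic;

public class SemanticType<T> : IComparable<SemanticType<T>>
    where T : IComparable<T>
{
    public T Value { get; private set; }
    protected SemanticType(T value) { Value = value; } // validation omitted for brevity

    public int CompareTo(SemanticType<T> other)
    {
        // By convention, any instance sorts after null.
        if (ReferenceEquals(other, null)) { return 1; }
        return Value.CompareTo(other.Value);
    }
}

// Hypothetical semantic type with an int as its underlying value.
public class Age : SemanticType<int> { public Age(int value) : base(value) { } }

public static class SortDemo
{
    // Sorts a few ages and returns their underlying values in ascending order.
    public static List<int> SortedValues()
    {
        var ages = new List<Age> { new Age(42), new Age(7), new Age(19) };
        ages.Sort((x, y) => x.CompareTo(y));
        var values = new List<int>();
        foreach (Age age in ages) { values.Add(age.Value); }
        return values;
    }
}
```

Trying to declare a semantic type over an underlying type that lacks IComparable<T> now fails at compile time, which is exactly the point of choosing the constraint.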
What about the rare cases where the underlying value does not implement IComparable<T>?
Maybe you want to wrap some legacy type into a semantic type.
To cater for this, in the Semantic Types NuGet package I introduced a class UncomparableSemanticType<T> - a version of SemanticType<T> that does not implement IComparable<T>.
If you have a look at that code, you'll find that the common bits of these classes have been factored out to a common base class.
Because this is pretty trivial, I haven't discussed that here.
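For readers who want a picture of that split, the sketch below shows one possible shape. It is not the code from the package: SemanticTypeBase, LegacyAccount and Account are hypothetical names, and the equality members are left out to keep it short.

```csharp
using System;

// Hypothetical common base: holds the value and does the validation.
public abstract class SemanticTypeBase<T>
{
    public T Value { get; private set; }

    protected SemanticTypeBase(Func<T, bool> isValidLambda, T value)
    {
        if ((object)value == null)
        {
            throw new ArgumentException(
                string.Format("Trying to use null as the value of a {0}", this.GetType()));
        }
        if ((isValidLambda != null) && !isValidLambda(value))
        {
            throw new ArgumentException(
                string.Format("Trying to set a {0} to {1} which is invalid", this.GetType(), value));
        }
        Value = value;
    }

    public override string ToString() { return Value.ToString(); }
}

// No constraint on T, so any underlying type can be wrapped.
public class UncomparableSemanticType<T> : SemanticTypeBase<T>
{
    protected UncomparableSemanticType(Func<T, bool> isValidLambda, T value)
        : base(isValidLambda, value) { }
}

// Adds ordering, and therefore requires T to be comparable.
public class SemanticType<T> : SemanticTypeBase<T>, IComparable<SemanticType<T>>
    where T : IComparable<T>
{
    protected SemanticType(Func<T, bool> isValidLambda, T value)
        : base(isValidLambda, value) { }

    public int CompareTo(SemanticType<T> other)
    {
        if (ReferenceEquals(other, null)) { return 1; }
        return Value.CompareTo(other.Value);
    }
}

// Hypothetical legacy type that does not implement IComparable<T>.
public class LegacyAccount { public string Code = "ACC-1"; }

public class Account : UncomparableSemanticType<LegacyAccount>
{
    public Account(LegacyAccount value) : base(null, value) { }
}
```

Account could not derive from SemanticType<LegacyAccount> here, because LegacyAccount does not satisfy the IComparable<T> constraint.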
Taming the physical world
Having simple semantic types that essentially just wrap a value works well for email addresses, phone numbers and other simple bits of data.
Things get more interesting however when applying this to lengths, areas, weights and other physical units.
We humans are inconsistent with our units
Let's go back to the bit of code we saw in the beginning:
double d = GetDistance();
double t = GetTemperature();
... Many complicated lines further ...
double probablyWrong = d + t;
We can easily introduce semantic types Distance and Temperature here, so the compiler will catch our mistake:
Distance d = GetDistance();
Temperature t = GetTemperature();
... Many complicated lines further ...
double probablyWrong = d + t; // compile error: there is no + operator for a Distance and a Temperature
But this code raises a new question: is that distance in meters? Kilometers? Feet? Inches? And the temperature: degrees Celsius? Fahrenheit? Kelvin?
An improvement would be to add the unit to the variable names:
Distance distanceMeters = GetDistanceInMeters();
Temperature temperatureCelsius = GetTemperatureInCelsius();
But that gets very clunky, and can easily get outdated. And what if your site has users in the US, Europe and the UK? You're now dealing with
feet and meters, pounds and kilograms, and possibly much more.
Even if your site is just feet right now, your marketing department is probably already eying some market where people use meters.
Going through all numeric variables and methods to make your site handle both feet and meters won't be much fun.
The problem is that you would have to keep track of the unit of each
length, weight etc. in some separate variable that can easily get out of sync. Plus you'll be writing lots of conversion methods - MetersToInches, InchesToFeet, etc.
A clear invitation for complexity, weird bugs, pain and frustration.
The solution is to stop putting meters, feet, inches, kilograms, pounds, etc. in your variables.
Instead, think simply in terms of Lengths, Weights, etc.
Remember, the length of a real world object is the same, regardless of whether you speak inches or meters.
A Length object would look like this:
public class Length
{
public double Value { get; private set; }
public Length(double value)
{
Value = value;
}
public double Feet
{
get { return Value/0.3048; }
}
public double Meters
{
get { return Value; }
}
public static Length FromFeet(double feet)
{
return new Length(feet*0.3048);
}
public static Length FromMeters(double meters)
{
return new Length(meters);
}
}
While a weight object would go like so:
public class Weight
{
public double Value { get; private set; }
public Weight(double value)
{
Value = value;
}
public double Pounds
{
get { return Value/0.45359237; }
}
public double Kilograms
{
get { return Value; }
}
public static Weight FromKilograms(double kilograms)
{
return new Weight(kilograms);
}
public static Weight FromPounds(double pounds)
{
return new Weight(pounds*0.45359237);
}
}
Now you can write:
Length userHeight = Length.FromMeters(height_entered_by_european);
Weight userWeight = Weight.FromKilograms(weight_entered_by_european);
....
Length userHeight = Length.FromFeet(height_entered_by_american);
Weight userWeight = Weight.FromPounds(weight_entered_by_american);
....
double bmi = userWeight.Kilograms / (userHeight.Meters * userHeight.Meters);
Now it is always clear whether some number of type double is supposed to be a length in meters, a weight in pounds, etc.
And there is no more worrying whether userWeight is in kilograms or pounds. If you need the weight in kilograms,
you simply retrieve it in kilograms.
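A variation worth considering is to make the constructor private, so that a Length or Weight can only be created through a factory method that names its unit. The sketch below takes that approach, which is an assumption of this example rather than what the classes above do, and wraps the BMI formula in a hypothetical Health.BodyMassIndex method.

```csharp
using System;

public class Length
{
    private const double MetersPerFoot = 0.3048;
    private readonly double meters;   // always stored in meters

    private Length(double meters) { this.meters = meters; }

    public double Meters { get { return meters; } }
    public double Feet { get { return meters / MetersPerFoot; } }

    public static Length FromMeters(double meters) { return new Length(meters); }
    public static Length FromFeet(double feet) { return new Length(feet * MetersPerFoot); }
}

public class Weight
{
    private const double KilogramsPerPound = 0.45359237;
    private readonly double kilograms;   // always stored in kilograms

    private Weight(double kilograms) { this.kilograms = kilograms; }

    public double Kilograms { get { return kilograms; } }
    public double Pounds { get { return kilograms / KilogramsPerPound; } }

    public static Weight FromKilograms(double kilograms) { return new Weight(kilograms); }
    public static Weight FromPounds(double pounds) { return new Weight(pounds * KilogramsPerPound); }
}

public static class Health
{
    // BMI is defined in metric units; callers never need to know how the
    // height and weight were originally entered.
    public static double BodyMassIndex(Weight weight, Length height)
    {
        return weight.Kilograms / (height.Meters * height.Meters);
    }
}
```

For someone who enters 180 pounds and 6 feet, Health.BodyMassIndex(Weight.FromPounds(180), Length.FromFeet(6)) comes out at roughly 24.4, the same as for the equivalent metric input.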
If this sounds good to you, but you don't want to code lots of classes with conversions, etc., have a look at the NuGet package Units.NET.
It has dozens of units, all with mathematical and comparison operators, ToString, etc. A very complete package.
A length times a length is no longer a length
If you use a package such as Units.NET, you probably have the usual arithmetic operators defined on your unit classes:
public static Length operator +(Length left, Length right)
{
return new Length(left.Value + right.Value);
}
public static Length operator -(Length left, Length right)
{
return new Length(left.Value - right.Value);
}
Things get more complicated when it comes to multiplication and division.
A length of 2 meters plus another length of 3 meters is a length of 5 meters.
But a length of 2 meters times a length of 3 meters is an area of 6 square meters:
public static Area operator *(Length left, Length right)
{
return new Area(left.Value * right.Value);
}
An Area times a Length is a Volume. A Length divided by a time period is a Speed, etc.
It's up to you where you decide to draw the line here.
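To make this concrete, here is a minimal sketch of how a handful of these operators could fit together. The Area, Volume and Speed classes, and the choice to store everything in metric base units, are assumptions of this example and not taken from Units.NET.

```csharp
using System;

public class Length
{
    public double Meters { get; private set; }
    public Length(double meters) { Meters = meters; }

    public static Length operator +(Length left, Length right)
    {
        return new Length(left.Meters + right.Meters);
    }

    // Length x Length changes the dimension: the result is an Area.
    public static Area operator *(Length left, Length right)
    {
        return new Area(left.Meters * right.Meters);
    }

    // Length / time period is a Speed.
    public static Speed operator /(Length distance, TimeSpan time)
    {
        return new Speed(distance.Meters / time.TotalSeconds);
    }
}

public class Area
{
    public double SquareMeters { get; private set; }
    public Area(double squareMeters) { SquareMeters = squareMeters; }

    // Area x Length is a Volume.
    public static Volume operator *(Area area, Length height)
    {
        return new Volume(area.SquareMeters * height.Meters);
    }

    // Area / Length gets you back to a Length.
    public static Length operator /(Area area, Length side)
    {
        return new Length(area.SquareMeters / side.Meters);
    }
}

public class Volume
{
    public double CubicMeters { get; private set; }
    public Volume(double cubicMeters) { CubicMeters = cubicMeters; }
}

public class Speed
{
    public double MetersPerSecond { get; private set; }
    public Speed(double metersPerSecond) { MetersPerSecond = metersPerSecond; }
}
```

With these in place, new Length(2) * new Length(3) is an Area of 6 square meters, and assigning that result to a Length variable is a compile error, which is the kind of bug we want the compiler to catch.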
Conclusion
You saw how Semantic Types help you prevent bugs by getting the compiler to find them for you at compile time. They also
make it easier to understand your code by letting you specify that something is an email address, distance, temperature, etc.
rather than just some string or double.
You also saw how to create a SemanticType base class that makes it easy to create new semantic types
without getting bogged down in lots of boilerplate.