One equality operator to rule them all
Two kinds of equality
In a lot of programming languages there are two different concepts for equality: Reference equality (also called Identity) and Value equality.
Reference equality is the same as equality of memory addresses. Two objects are the same if they are the same instance (iif they occupy the same address in memory).
Value equality is semantic equality. Two objects are equal if the values in them mean the same thing. For example: two instances of IPAddress that represent the address “127.0.0.1” are equal even if they have been instantiated in different places.
My plan with this article is to convince you that there should only be one equality operator. Let me make my point.
First: There should be a null-safe equality operator.*
- This means that a language should have some means of comparing objects which won’t throw when a null reference gets compared. For most languages this is the == operator. The Equals instance method is problematic because you have to null check your object first (and you will end up forgetting to do so eventually).
*: It would be much better if we removed null references from all modern languages for good, but let’s leave it at that for now.
Second: Boxing and two kinds of equality are a dangerous mix.
- This is very clear when we realize that
1 == 1
but(object)1 != (object)1
. This is just a problem waiting to happen, really.
*: Of course, you should really rethink your code if you’re operating on boxed objects. At least in C#, creating generic (as in polymorphic) code is almost certainly the better approach.
Third: In many cases, reference equality is simply misleading.
- In C#, for instance,
IPAddress.Parse("127.0.0.1") != IPAddress.Parse("127.0.0.1")
. Any type that conveys meaning will be thought of according to that assigned meaning. Unless you’re tracking instances, reference equality is far more confusing than it is useful.
Fourth: More generally and more importantly, having the same operator mean two different things is dangerous.
- Why? Well, what does
==
really mean?
==
means Maybe Reference Equality, Maybe Value Equality. ==
can mean two distinct things depending on what gets compared, but with one unique form/appearance.
And this can’t be good. +
is addition for numbers and concatenating for strings, but the signatures are so different that it is hard to get it wrong. What would you think of +
if it were addition for integral number types such as Int
and Long
but meant addition with rounding for fractionals such as Float
and Double
(we would make a proper Add method available, obviously)? I’m sure you would think it is an error waiting to happen.
All previous cases would be more or less solved if we created a clear separation of equality operators. Kotlin, for example, has ==
for value equality and ===
for reference equality. I would like to convince you, however, that we only need one kind of equality for general use, and that reference equality should only be used to implement this equality operator that I propose. For that, let us explore a more interesting case:
An interesting case: the Socket class
When working with multiple Sockets, it is often necessary to compare them to one another. Now, two sockets connected to the same address in the same port aren’t equal because they are independent communication channels, so wouldn’t value equality spoil that?
No, it wouldn’t. In fact these two sockets use different local ports. Even if the sockets aren’t connected nor have been modified yet, the operating system labels all these sockets to manage them internally. Of course, the operating system could use the memory address of the created socket internally, which would then be the socket’s label.
But even in that case, we only need one equality operator. It just happens to be the case that it may be implemented by comparing memory addresses in this case. Sockets just happen to be a case where it is desirable to make equality be Reference Equality. It is a case where semantics matches identity. So in our implementation of ==
for Socket we could then use object.ReferenceEquals
. So do note that Reference equality would only be used to properly implement our beloved equality operator.
Ok.. These examples are good, but aren’t there exceptions?
I believe not. Although of course I can’t provide a formal proof, there is one thing I do know: there is only one kind of equality in Haskell, and Haskell has been used a lot already. I wouldn’t be surprised if other languages do the same.
How do we fix this in C# ?
Sadly, I don’t believe it is possible without changing the language substantially and breaking existing code. What I propose below is the least invasive set of changes I can think of for the language, and it is certainly not pretty. There are better ways around this problem.*
Proposed breaking change: We could make the ==
operator non overridable and make its implementation always use the Equals instance method of its left-most non null argument (after proper null checks).
One could argue that choosing the left-most argument would make this implementation asymmetric. However, any correct implementation for equality is necessarily symmetric (if a == b
then b == a
), so it wouldn’t matter if the implementation picked the left-most argument to call Equals
on it. Also, we could go ahead and change ==
to a generic ==<T>(T a, T b)
to make this a non-issue. This would break even more code and I’m not sure how all of this would sound to language designers, though.
Also, there is some concern over efficiency. Calling a virtual method like Equals would involve dynamic dispatch, bringing its associated costs with it. I believe this is unavoidable.*
*: Unless we are willing to go even further with our breaking changes. There is a proposal to implement type classes in C# https://github.com/dotnet/csharplang/issues/110. Type classes enable a high level of ad-hoc polymorphism while maintaining static dispatch. They are a great feature and one I vouch for strongly. If you’re curious, read my article on it.
With type classes, we could make ==
an operator/function of some Equatable<T>
type class. We would then make its default implementation like the one described above, while still being overridable so that we can override it for some types to avoid calling Equals
.
So.. what are your thoughts on this? Do you have counter-examples or do you disagree on some issue? Can you think of more unintended consequences by changing the language as I proposed (I’m sure there are many!)?
EDIT (2018-01-06 14:40): I’m not sure how I missed this, but the new feature of nullable reference types makes my proposed language changes mostly unnecessary. It is also a breaking change, of course, but not only does it improve the language significantly, it also improves the state of equality. After opting in for this new feature, just make sure you never use ==
again. Stick to Equals
and you should mostly be fine. I say “mostly” because not even Equals
is always overridden to mean value equality. In the case of StringBuilder
, for instance, its Equals(object)
is inherited from Object
and tests for reference equality while its IEquatable<StringBuilder>.Equals(StringBuilder)
method tests for value equality. I believe this only strengthens the claim that there should be a single general purpose equality operator, keeping reference equality only as a means to implement equality in some cases.