Linq ain’t all that distinct…

I came across an interesting programming challenge yesterday that I thought I should share.  I’m working on a relatively simple ASP.Net MVC 3 web application which has an integrated “search” feature.  Since I’m using Entity Framework 4 as my ORM, I’ve been writing a lot of Linq queries.  The query I am using that caused me some head-scratching was:

 var query = from f in _ctx.FollowUps
             from p in _ctx.Patients
             orderby p.PatientName 
             where (f.CarePlan.Contains(currTerm) ||
                    f.ImagingResult.Contains(currTerm))
             select new FollowupResult()
             {
                   FollowupID = f.ID,
                   FollowupDate = f.FollowupDate,
                   CarePlan = f.CarePlan,
                   PatientName = p.PatientName,
                   DOP = p.DOP,
                   ImagingResult = f.ImagingResult,
                   PatientID = p.ID
             };

Seems pretty straightforward, right?  The query, which searches for user-specified search terms in two “free text” fields in the database and returns any matches, works just fine.  Too well, in fact.  Because the same search term could well appear in either the CarePlan or the ImagingResult fields, I “or-ed” the query, which caused duplicates to appear in my result set for any records that had the search term in both fields.

Since my test patients all had the search term “good” in both fields, and I had a total of 31 patients, I knew I had a duplication problem.  Clearly, Linq was rewriting the SQL statements in such a way that it returned each hit twice.
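One thing worth checking, separate from the “or”: the query has two from clauses and nothing linking a follow-up to its patient, and in Linq that shape pairs every follow-up with every patient. The sketch below reproduces the effect with in-memory lists. The Person and Visit classes and the PersonID foreign key are hypothetical stand-ins, since the real schema isn’t shown here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Patients and FollowUps entities.
public class Person { public int ID { get; set; } public string Name { get; set; } }
public class Visit { public int ID { get; set; } public int PersonID { get; set; } public string Notes { get; set; } }

public static class CrossJoinDemo
{
    static readonly List<Person> People = new List<Person>
    {
        new Person { ID = 1, Name = "Adams" },
        new Person { ID = 2, Name = "Baker" }
    };

    static readonly List<Visit> Visits = new List<Visit>
    {
        new Visit { ID = 10, PersonID = 1, Notes = "good" },
        new Visit { ID = 11, PersonID = 2, Notes = "good" }
    };

    // Two from clauses with no linking condition: every visit is paired with every person.
    public static int CountWithoutJoin(string term)
    {
        var q = from v in Visits
                from p in People
                where v.Notes.Contains(term)
                select new { v.ID, p.Name };
        return q.Count();
    }

    // An explicit join pairs each visit only with the person it belongs to.
    public static int CountWithJoin(string term)
    {
        var q = from v in Visits
                join p in People on v.PersonID equals p.ID
                where v.Notes.Contains(term)
                select new { v.ID, p.Name };
        return q.Count();
    }
}
```

With this sample data, CountWithoutJoin("good") returns 4 (2 visits × 2 people) while CountWithJoin("good") returns 2. If the real model has a navigation property or key from follow-up to patient, joining on it may remove the duplicates at the source.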

Well, I could fight with Linq, but what I really needed was the equivalent of a SQL SELECT DISTINCT, which (of course) Linq has.  So I just assigned the output of my query to a strongly typed list, and added the Linq Distinct() operator:

IEnumerable<FollowupResult> fups = followups.Distinct().ToList();

To my dismay, this didn’t change anything! Everything was still duplicated.  Wow, I thought, I guess Distinct isn’t so distinct!  After a little googling, though, the obvious answer appeared:   Remember that in Linq you’re dealing with objects, and classes are reference types (otherwise, they’d be structs, right?).  With no comparer supplied, Distinct() uses the default equality comparer, and for a class that doesn’t override Equals() and GetHashCode() that means comparing the reference of one object to the reference of another.  Two FollowupResult objects holding identical data are still two separate objects, so from an OOP perspective they are in fact different.  So Distinct() fails.
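The behavior is easy to reproduce in isolation. Here’s a minimal sketch using a hypothetical Result class (a stand-in for FollowupResult, not the real thing) that doesn’t override Equals() or GetHashCode():

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FollowupResult; it does not override Equals/GetHashCode.
public class Result
{
    public int PatientID { get; set; }
    public string PatientName { get; set; }
}

public static class DefaultDistinctDemo
{
    // Returns how many items survive a plain Distinct() call.
    public static int CountAfterDistinct()
    {
        var results = new List<Result>
        {
            new Result { PatientID = 1, PatientName = "Smith" },
            new Result { PatientID = 1, PatientName = "Smith" } // same data, different object
        };

        // With no comparer supplied, Distinct() uses EqualityComparer<Result>.Default,
        // which for a plain class falls back to reference equality.
        return results.Distinct().Count();
    }
}
```

CountAfterDistinct() returns 2: both objects survive, even though every property matches.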

Sigh.  What to do?

Well, it turns out that the Distinct() method has an overload that accepts a custom equality comparer.  Overloads, I have learned, are the API designer’s way of saying, “Yeah, I know it doesn’t really do what you want, but you can change the behavior to whatever you wish.  Just use this overload and write it yourself!”
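To see the comparer-taking overload at work before writing a comparer by hand, here’s a small sketch that passes one of the framework’s ready-made comparers, StringComparer.OrdinalIgnoreCase. The sample terms are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctOverloadDemo
{
    public static List<string> DistinctIgnoringCase()
    {
        var terms = new[] { "good", "Good", "GOOD", "fair" };

        // StringComparer implements IEqualityComparer<string>, so it can be handed
        // straight to the Distinct(IEqualityComparer<T>) overload.
        return terms.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

The three spellings of “good” collapse into a single entry, leaving two items in the list. A custom comparer plugs into the same overload in exactly the same way.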

So, now I needed to write my own comparer, implementing the IEqualityComparer<T> interface.  This comparer needs to compare a property (in this case, the PatientID property) of the objects being compared, not the object references themselves.  As I thought about this (and made an abortive run at it), it occurred to me that other duplications could conceivably occur for other search types (I won’t bother you with the details, but it seemed reasonable).  What I really needed to write was an IEqualityComparer<T> implementation that could compare any object property I specified!  This would be a much more “generic” and useful implementation.  So I rolled up my sleeves, took a slug from my ever-present water bottle, and fired up the duct-tape programmer’s favorite development tool…Google.

As usual, a little googling saved me some time; believe it or not, I wasn’t the first developer to run into this little snag.  There is a well-written and clearly explained “Generic” IEqualityComparer project here on CodeProject.  So, with the magic of cut and paste (and one or two tweaks), I created my comparer, instantiated it and ran the query again:

And voila!  Good results.

Here’s the PropertyComparer class:

using System;
using System.Collections.Generic;
using System.Reflection;

public class PropertyComparer<T> : IEqualityComparer<T>
{
    private PropertyInfo _PropertyInfo;

    /// <summary>
    /// Creates a new instance of PropertyComparer.
    /// </summary>
    /// <param name="propertyName">The name of the property on type T
    /// to perform the comparison on.</param>
    public PropertyComparer(string propertyName)
    {
        // Store a reference to the property info object for use
        // during the comparison.
        _PropertyInfo = typeof(T).GetProperty(propertyName,
            BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
        if (_PropertyInfo == null)
        {
            throw new ArgumentException(string.Format(
                "{0} is not a property of type {1}.", propertyName, typeof(T)));
        }
    }

    public bool Equals(T x, T y)
    {
        // Get the current value of the comparison property of x and of y.
        object xValue = _PropertyInfo.GetValue(x, null);
        object yValue = _PropertyInfo.GetValue(y, null);

        // If xValue is null then we consider them equal if and only
        // if yValue is null.
        if (xValue == null)
            return yValue == null;

        // Use the default comparer for whatever type the
        // comparison property is.
        return xValue.Equals(yValue);
    }

    public int GetHashCode(T obj)
    {
        // Get the value of the comparison property out of obj.
        object propertyValue = _PropertyInfo.GetValue(obj, null);

        if (propertyValue == null)
            return 0;

        return propertyValue.GetHashCode();
    }
}

Kudos and credit go to Seth Dingwell, whose code I absconded with and put to use as follows:

PropertyComparer<FollowupResult> PatientIDComparer = 
                       new PropertyComparer<FollowupResult>("PatientID");
 IEnumerable<FollowupResult> fups = 
                          followups.Distinct(PatientIDComparer).ToList();

This works extremely well, is very easy to implement and will no doubt come in useful down the road.  Happy hacking, folks!
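For comparison, here’s a sketch of a variation that picks the property with a lambda rather than a string name, so a typo in the property name is caught at compile time and no reflection is needed. The KeyComparer and Row names are hypothetical and not from the CodeProject article; like the class above, it assumes the items being compared are not null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two items are equal when their selected keys are equal.
public class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer = EqualityComparer<TKey>.Default;

    public KeyComparer(Func<T, TKey> keySelector)
    {
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        _keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        // Compare the selected keys, not the object references.
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        // Hash the key so that items with equal keys land in the same bucket.
        TKey key = _keySelector(obj);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

// Hypothetical stand-in for FollowupResult.
public class Row
{
    public int PatientID { get; set; }
    public string CarePlan { get; set; }
}

public static class KeyComparerDemo
{
    public static List<int> DistinctPatientIds()
    {
        var rows = new List<Row>
        {
            new Row { PatientID = 1, CarePlan = "good" },
            new Row { PatientID = 1, CarePlan = "good, recheck" },
            new Row { PatientID = 2, CarePlan = "good" }
        };

        return rows.Distinct(new KeyComparer<Row, int>(r => r.PatientID))
                   .Select(r => r.PatientID)
                   .ToList();
    }
}
```

DistinctPatientIds() yields one entry each for patients 1 and 2. A similar result is available without any comparer via followups.GroupBy(f => f.PatientID).Select(g => g.First()). Either way, only one row per patient survives, so the remaining follow-ups for that patient are dropped from the list.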

6 thoughts on “Linq ain’t all that distinct…”

  1. Thinking about the comparer … What do you say to this implementation:

    public class PropertyComparer<T> : IEqualityComparer<T>
    {
        private Func<T, T, bool> _compareFunction;

        public PropertyComparer(Func<T, T, bool> compareFunction)
        {
            _compareFunction = compareFunction;
        }

        public bool Equals(T x, T y)
        {
            return _compareFunction(x, y);
        }

        public int GetHashCode(T obj)
        {
            return _compareFunction.GetHashCode();
        }
    }

    Usage:

    class Data
    {
        public int Number { set; get; }
        public string Name { set; get; }
    }

    PropertyComparer<Data> comparer = new PropertyComparer<Data>((Data x, Data y) => x.Number == y.Number);

  2. Wow, the idea is for the PropertyComparer class to be generic and to take, in its constructor, a function for comparing instances of ‘T’ (the generic type); in my case the class is Data. (Sorry for mistyping the ‘code’ tags.)
