Notes to Self: Mongo Aggregate Pipeline Sorting Redux
A better solution to the lack of NULL LAST sorting in Mongo.
Photo by Martin Adams on Unsplash
In my previous post on this
subject
I described a way to use multiple Mongo queries, followed by Python’s
itertools.chain() to do NULL LAST sorting in Mongo (which isn’t a thing
that exists as a feature of the DB).
However, I knew that this wasn’t an optimum solution. Partly because it’s doing multiple queries, but mostly because it means that doing things like pagination are really complicated thereafter.
… but there is a better way.
Listen to people who’ve been there before
As it turns out, I’m currently working for a client whose principal Product
Owner was a SQL DB admin in a past life. Discussing this with him, I
complained about the lack of NULL LAST sorting in Mongo, and he, having
worked in a world where NULL LAST wasn’t a thing, came up with this rather
elegant solution:
Instead of fetching all the nulls, and then all the not nulls, then sorting
the not-nulls, then returning the data with nulls last, you can do the
equivalent of COALESCE-ing the nulls to some sane-but-guaranteed-to-be-last
value (which obviously depends on the datatype of the field being sorted-upon)
and then sort by the sort field.
The solution in practice
Previously, we defined the following data model (in MongoEngine syntax):
class BlogTag(Document):
title = StringField(required=True)
slug = StringField(required=True)
hidden = BooleanField()
class BlogPost(Document):
title = StringField(required=True)
body = StringField(required=True)
tags = ListField(ReferenceField(BlogTag))
draft = BooleanField()
We wanted to sort by the number of tags on a BlogPost, with posts with
a null tag count (because they’re drafts and therefore don’t count) always
last.
We can’t use $ifNull here (normally, we would), because it doesn’t really
apply. The tag count being null is a function of the draft flag. Instead
we use $switch in the $addFields pipeline. We can drop the $match block
since we no longer want to return only the non-draft posts.
pipeline = [
{
"$addFields": {
"tag_count": {
"$switch": {
"branches": [
{
"case": {"$ne": ["draft", false]},
"then": null_value,
}
],
"default": {
"$size": {
"$filter": {
"input": "$tags",
"as": "tag",
"cond": {"$not": ["$$tag.hidden"]},
}
}
},
},
}
},
},
{"$sort": {"tag_count": 1 if order == "asc" else -1}},
];
null_value here is any value which can reasonably replace a null so that
the value will be last in a sorted set. This takes a different value depending
on whether we’re sorting ASC or DESC. If we’re sorting by ascending tag
count, this would be a large number - 99999999 or such. On the flip side,
when sorting by descending tag count we could set null_value to -1.
Summary
There’s nearly always a better way to skin the cat that you’re trying to skin.
Whilst this method adds some complexity to the pipeline (not much, but some),
it reduces complication from the calling code. And as the Zen of Python tells
us: Complex is better than complicated.