Notes to Self: Mongo Aggregate Pipeline Sorting Redux
A better solution to the lack of NULL LAST sorting in Mongo.
Photo by Martin Adams on Unsplash
In my previous post on this
subject
I described a way to use multiple Mongo queries, followed by Python’s
itertools.chain()
to do NULL LAST
sorting in Mongo (which isn’t a thing
that exists as a feature of the DB).
However, I knew that this wasn’t an optimum solution. Partly because it’s doing multiple queries, but mostly because it means that doing things like pagination are really complicated thereafter.
… but there is a better way.
Listen to people who’ve been there before
As it turns out, I’m currently working for a client whose principal Product
Owner was a SQL DB admin in a past life. Discussing this with him, I
complained about the lack of NULL LAST
sorting in Mongo, and he, having
worked in a world where NULL LAST
wasn’t a thing, came up with this rather
elegant solution:
Instead of fetching all the nulls, and then all the not nulls, then sorting
the not-nulls, then returning the data with nulls last, you can do the
equivalent of COALESCE
-ing the nulls to some sane-but-guaranteed-to-be-last
value (which obviously depends on the datatype of the field being sorted-upon)
and then sort by the sort field.
The solution in practice
Previously, we defined the following data model (in MongoEngine syntax):
class BlogTag(Document):
title = StringField(required=True)
slug = StringField(required=True)
hidden = BooleanField()
class BlogPost(Document):
title = StringField(required=True)
body = StringField(required=True)
tags = ListField(ReferenceField(BlogTag))
draft = BooleanField()
We wanted to sort by the number of tags on a BlogPost
, with posts with
a null
tag count (because they’re drafts and therefore don’t count) always
last.
We can’t use $ifNull
here (normally, we would), because it doesn’t really
apply. The tag count being null
is a function of the draft
flag. Instead
we use $switch
in the $addFields
pipeline. We can drop the $match
block
since we no longer want to return only the non-draft posts.
pipeline = [
{
"$addFields": {
"tag_count": {
"$switch": {
"branches": [
{
"case": {"$ne": ["draft", false]},
"then": null_value,
}
],
"default": {
"$size": {
"$filter": {
"input": "$tags",
"as": "tag",
"cond": {"$not": ["$$tag.hidden"]},
}
}
},
},
}
},
},
{"$sort": {"tag_count": 1 if order == "asc" else -1}},
];
null_value
here is any value which can reasonably replace a null
so that
the value will be last in a sorted set. This takes a different value depending
on whether we’re sorting ASC
or DESC
. If we’re sorting by ascending tag
count, this would be a large number - 99999999
or such. On the flip side,
when sorting by descending tag count we could set null_value
to -1
.
Summary
There’s nearly always a better way to skin the cat that you’re trying to skin.
Whilst this method adds some complexity to the pipeline (not much, but some),
it reduces complication from the calling code. And as the Zen of Python tells
us: Complex is better than complicated
.