Notes to Self: Mongo Aggregate Pipeline Sorting Redux

A better solution to the lack of NULL LAST sorting in Mongo.

gmb published on
3 min, 533 words

Another pipe, because I can’t think of a better image… Photo by Martin Adams on Unsplash

In my previous post on this subject I described a way to use multiple Mongo queries, followed by Python’s itertools.chain() to do NULL LAST sorting in Mongo (which isn’t a thing that exists as a feature of the DB).

However, I knew that this wasn’t an optimum solution. Partly because it’s doing multiple queries, but mostly because it means that doing things like pagination are really complicated thereafter.

… but there is a better way.

Listen to people who’ve been there before

As it turns out, I’m currently working for a client whose principal Product Owner was a SQL DB admin in a past life. Discussing this with him, I complained about the lack of NULL LAST sorting in Mongo, and he, having worked in a world where NULL LAST wasn’t a thing, came up with this rather elegant solution:

Instead of fetching all the nulls, and then all the not nulls, then sorting the not-nulls, then returning the data with nulls last, you can do the equivalent of COALESCE-ing the nulls to some sane-but-guaranteed-to-be-last value (which obviously depends on the datatype of the field being sorted-upon) and then sort by the sort field.

The solution in practice

Previously, we defined the following data model (in MongoEngine syntax):


class BlogTag(Document):

    title = StringField(required=True)
    slug = StringField(required=True)
    hidden = BooleanField()


class BlogPost(Document):

    title = StringField(required=True)
    body = StringField(required=True)
    tags = ListField(ReferenceField(BlogTag))
    draft = BooleanField()

We wanted to sort by the number of tags on a BlogPost, with posts with a null tag count (because they’re drafts and therefore don’t count) always last.

We can’t use $ifNull here (normally, we would), because it doesn’t really apply. The tag count being null is a function of the draft flag. Instead we use $switch in the $addFields pipeline. We can drop the $match block since we no longer want to return only the non-draft posts.

pipeline = [
    {
        "$addFields": {
            "tag_count": {
                "$switch": {
                    "branches": [
                        {
                            "case": {"$ne": ["draft", false]},
                            "then": null_value,
                        }
                    ],
                    "default": {
                        "$size": {
                            "$filter": {
                                "input": "$tags",
                                "as": "tag",
                                "cond": {"$not": ["$$tag.hidden"]},
                            }
                        }
                    },
                },
            }
        },
    },
    {"$sort": {"tag_count": 1 if order == "asc" else -1}},
];

null_value here is any value which can reasonably replace a null so that the value will be last in a sorted set. This takes a different value depending on whether we’re sorting ASC or DESC. If we’re sorting by ascending tag count, this would be a large number - 99999999 or such. On the flip side, when sorting by descending tag count we could set null_value to -1.

Summary

There’s nearly always a better way to skin the cat that you’re trying to skin. Whilst this method adds some complexity to the pipeline (not much, but some), it reduces complication from the calling code. And as the Zen of Python tells us: Complex is better than complicated.