Scaling a multi-tenant database: sharding, or active-active multi-master?

When a multi-tenant system outgrows what a single database node can comfortably carry, there are two open-source, high-availability approaches to consider, and they actually address different high-level requirements. One focuses on delivering horizontal sharding, which partitions tenants across nodes (PostgreSQL with Citus). The other delivers active-active multi-master, in which every node holds the whole dataset and a write can land on any of them (MariaDB with Galera Cluster). Both are production-proven, so the choice depends on the actual workload rather than simply preference.

Some years back, on a Postgres engagement whose data was partitioned by marketing-campaign client, we needed to future-proof the system, and sharding by tenant proved hugely beneficial. The two candidates at that juncture were Citus and Postgres-XL, and we elected to go with Postgres-XL, judging it the more technically sound of the two at the time. In the years since, the extension approach won out: Citus, built as an extension to an unmodified Postgres, kept pace with the core database's release cadence, while Postgres-XL — a hard fork with its own global transaction manager — gradually fell behind and stalled. One lesson learned in this instance, by the way, was the fork-versus-extension distinction itself. The implementation approach that ultimately favored Citus, staying current with core Postgres with nothing forked to maintain, turned out to be a better long-term decision point.

We took a second, more fundamental lesson from that engagement. Upon shifting to a distributed Postgres, a great many of the existing foreign keys broke. Referential constraints do not hold across shards as they do on a single node. The broken constraints were not the core issue, however. What we took from that engagement is that software engineering discipline is what keeps a data model correct. Foreign keys scattered across every relation for the sake of developer safety, rather than to enforce a real invariant, are a poor substitute. In fact they can actually prove to be a significant impediment to sharding at some future point. Also as we learned in a separate engagement, they impose a performance cost of their own, which compounds as foreign keys proliferate.

That accumulated experience shaped the database decision in an open-source 'pay it forward' project we are building presently. Its workload is read-heavy delivery with modest editorial writes and JSONB-centric content, served under schema-per-tenant isolation. That isolation is itself a requirement learned across a number of SaaS platform engagements, where customers routinely ask that their data not be co-mingled with anyone else's. With regulated customers the bar is higher: on one engagement we had to produce documentation that clearly demonstrated a particular customer's data sat completely isolated, and meeting that bar drove changes deep in an inherited architecture that predated the requirement. Designing that isolation in from the start — a true in-database namespace in Postgres, routed by search_path, rather than a separate database per tenant — is the direct lesson of having once had to retrofit it.

For that profile, active-active multi-master solves a problem we do not have: it adds write availability without adding write capacity — every node still holds everything — so it does nothing for the case of too many tenants for a single node, which is exactly what sharding addresses. Galera's one decisive advantage, the ability to write to any node, is irrelevant to a read-heavy CMS, where a single primary with replicas and failover is the appropriate high-availability design. So the decision is single-node Postgres first, with Citus schema-based sharding as the named scale-out path, adopted only once a single node is genuinely exhausted; and because Citus is an additive extension rather than a fork, adopting it later involves no lock-in penalty.

There is more to tell about the day a carefully tuned, cross-region Galera cluster came undone under a flood of per-field upserts from a web form, but that is a topic for another conversation.