One data platform, or a NoSQL store alongside it?

Most corporate data stores are comprised of various types of data. Some of it is clearly relational (products, orders, users, addresses, returns), some of it is more self-contained (emails, geodata, time series readings, api payloads). As the need for an acquisition strategy arises, one of the questions invariably is whether to leverage a NoSQL store to sit beside an existing relational database. NoSQL certainly has its place: where records stand largely on their own, where there is some flux in the individual fields, and where reads far outnumber writes, a document store can be an excellent fit.

When evaluating data storage/access strategies for JSON documents in particular, there is real value in adding a constraint to that decision-making process. Namely, what sort of data store(s) is/are we using today across the enterprise? When total cost of ownership is considered - another data platform to operate, secure, back up, and hold consistent with the rest - this can drive the discussion towards data access 'uniformity'.

The vast majority of relational DBMS's support the storage and retrieval of JSON data to varying degrees, including Oracle, SQL Server, Postgres, MariaDB. Some offer higher-performance, specialized datatypes while others leverage existing datatypes with validation wrappers.

If your organization is heavily invested in an RDBMS, leveraging its built-in JSON document storage and access capabilities makes good business sense. A recent case we encountered may make it concrete. We were engaged with a client that had built a platform where businesses could find and 'claim' their building on a map and, once claimed, surface details on it. A visitor to the site could, for example, search for the greenest apartment complexes nearby. The core data of that system was firmly relational — people belong to companies, companies carry profiles and addresses and carbon data, and all of it correlates on shared keys, where referential integrity and real transactions matter. The building record itself, though, kept changing shape — address, location, square footage, zoning, and a list that only grew — stood largely on its own. Although our preference was to place this data in a NoSQL store (Couchbase in this case), since the client was already using Postgres for their core data store, our due diligence needed to include evaluating the tradeoffs of placing that data into Postgres rather than introduce a new platform to stand up and maintain.

In the case of Postgres, this actually afforded us an additional opportunity. It has two native datatypes for JSON data. A JSON data type, which performs validation but no transformation - excellent for frequent writes/infrequent reads, and a JSONB data type, which is transformed to binary and keyed - excellent for infrequent writes/frequent reads. Because this data was updated by a batch-style job during off-peak utilization hours, and heavily read during peak utilization hours, JSONB was the logical choice.

Since Postgres has a very elegant means of allowing us to capture/store/index/access this document-style metadata, and since it was already the data store of choice for our client, forcing the introduction of another data platform (Couchbase) was not in the best interest of the client. The 'greenest near me' lookup required a geospatial index and a rankable metric, so those could be exposed in our JSONB via a GIN (Generalized Inverted Index) or a B-tree Expression Index, allowing the rest to exist as free-form fields/values. Of course, I can tell you of a time that we did elect to go with a multi-region high availability Couchbase NoSQL store, but that's a topic for another discussion...