For many years, the MongoDB World conference has been the heartbeat synchronizing the engineering and marketing activities of the MongoDB company. Usually held in New York City, MongoDB World has been the primary venue for announcing new products and releases.
This year, a physical conference in New York City was inconceivable. With NYC and the world still in various levels of COVID-19 lockdown, MongoDB World was held in the cloud as a virtual event—MongoDB.Live—in the first week of June.
Holding the annual conference in the cloud was apt in many ways. MongoDB’s future as a business is cloud-centric: the MongoDB Atlas cloud database is the leading edge of the company’s revenue growth and arguably the leading independent cloud database service.
MongoDB 4.0 is almost two years old now, so there was some speculation that the company might be about to release a major new version of its flagship database. However, what was announced was MongoDB 4.4, an incremental release with evolutionary rather than revolutionary improvements.
The most critical features in MongoDB 4.4 are improvements to the aggregation (analytic) functionality and clustering capabilities.
The aggregation framework is MongoDB’s answer to analytic SQL queries. It groups, joins and analyzes data in ways that would require complex SQL statements in a relational database system. Indeed, the MongoDB BI connector translates SQL statements into aggregation framework requests so that MongoDB can integrate with BI tools such as Excel or Tableau.
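To make the SQL comparison concrete, here is a minimal sketch of a pipeline expressed as Python data, as a driver such as PyMongo would accept it. The "orders" collection and its status, customer_id and amount fields are hypothetical, invented for illustration.

```python
# SQL being mirrored (hypothetical "orders" table):
#   SELECT customer_id, SUM(amount) AS total
#   FROM orders
#   WHERE status = 'shipped'
#   GROUP BY customer_id
#   ORDER BY total DESC;
pipeline = [
    # WHERE status = 'shipped'
    {"$match": {"status": "shipped"}},
    # GROUP BY customer_id, SUM(amount) AS total
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    # ORDER BY total DESC
    {"$sort": {"total": -1}},
]
```

Each stage consumes the documents produced by the stage before it, which is why the filter comes first and the sort last.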
In MongoDB 4.4, the aggregation framework has become extensible. Users can write their own functions in JavaScript, through the new $function and $accumulator operators, and use them within aggregation pipelines. This functionality is substantially equivalent to the user-defined functions (UDFs) that allow users of relational databases such as Oracle to extend SQL statements.
A new $unionWith aggregation pipeline stage now allows aggregation pipelines to combine data from multiple collections. This mirrors the SQL UNION ALL operation: documents from the second collection are appended, and duplicates are retained. While in the past you could join collections in an aggregation, you could not append documents from another collection.
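As a rough sketch of what such a custom function looks like, the pipeline below uses the $function operator that 4.4 provides for this purpose. The function body is JavaScript passed as a string; the price, tax_rate and price_with_tax field names are assumptions made up for the example.

```python
def build_price_with_tax_pipeline():
    """Return a pipeline that adds a field computed by a custom JS function."""
    # The server runs this JavaScript once per document.
    js_body = "function(price, rate) { return price * (1 + rate); }"
    return [
        {
            "$addFields": {
                "price_with_tax": {
                    "$function": {
                        "body": js_body,
                        # Values taken from hypothetical document fields.
                        "args": ["$price", "$tax_rate"],
                        "lang": "js",
                    }
                }
            }
        }
    ]


pipeline = build_price_with_tax_pipeline()
```

A calculation this simple could be written with built-in operators such as $multiply, which would normally be faster; custom functions are better kept for logic the built-in operators cannot express.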
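The stage that performs this union is named $unionWith. The sketch below combines two hypothetical yearly collections, sales_2019 and sales_2020, and then totals quantities across both; the collection and field names are illustrative only.

```python
def union_with_stage(coll, pipeline=None):
    """Build a $unionWith stage, optionally pre-processing the other collection."""
    spec = {"coll": coll}
    if pipeline:
        spec["pipeline"] = pipeline
    return {"$unionWith": spec}


# Shape both collections identically before combining them.
keep_item_and_qty = {"$project": {"_id": 0, "item": 1, "qty": 1}}

# Intended to run against the hypothetical "sales_2019" collection.
pipeline = [
    keep_item_and_qty,
    union_with_stage("sales_2020", [keep_item_and_qty]),
    # Stages after the union see documents from both collections.
    {"$group": {"_id": "$item", "total_qty": {"$sum": "$qty"}}},
]
```

Because duplicates are kept, the result behaves like SQL's UNION ALL; a later $group stage is one way to collapse duplicates if a distinct result is needed.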
Queries issued against sharded clusters can now experience improved and more predictable performance through “hedged” reads. When a read uses a non-primary read preference, the mongos router can send it to two members of each replica set it targets, and the read completes as soon as the faster member replies.
With “mirrored” reads, the primary forwards a sample of the reads it receives to electable secondaries in the replica set to warm up their caches. If one of these secondaries becomes the primary following a failover, frequently accessed data is already in memory. This avoids the “cold cache” problem that previously occurred after failover, in which the new primary had to retrieve frequently accessed data from disk.
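The idea behind hedging can be illustrated without a database at all. The following is a conceptual simulation only, not MongoDB's implementation: the node names and latencies are invented, and a sleep stands in for the network round trip.

```python
import concurrent.futures
import time


def read_from_node(name, latency_s):
    """Simulate a read against one replica set member."""
    time.sleep(latency_s)  # stand-in for network and disk latency
    return name, {"_id": 1, "status": "ok"}


def hedged_read(nodes):
    """Send the same read to every node and return the first reply."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(nodes))
    futures = [
        pool.submit(read_from_node, name, latency)
        for name, latency in nodes.items()
    ]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    pool.shutdown(wait=False)  # do not block on the slower node
    return next(iter(done)).result()


# Hypothetical members: one is having a slow moment.
winner, document = hedged_read({"secondary-a": 0.20, "secondary-b": 0.01})
```

The caller pays the latency of the fastest member rather than the slowest, at the cost of extra work on the cluster, which is why hedging trades throughput for more predictable response times.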
For sharded clusters in which data is partitioned across multiple replica sets, MongoDB 4.4 adds two significant enhancements. First, you can now refine the shard key used to partition a collection, without downtime, by adding suffix fields to the existing key. Before 4.4, the shard key was fixed once a collection had been sharded, and changing it was operationally close to impossible.
In addition, a compound shard key that spans multiple attributes may now include a hashed field. This allows for a more even distribution of data across shards.
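In 4.4 this change is made with the refineCollectionShardKey admin command, which extends the existing key with additional suffix fields rather than replacing it. The helper below is a hypothetical sketch that builds the command document; the shop.orders namespace and the field names are assumptions.

```python
def build_refine_command(namespace, current_key, suffix_fields):
    """Build a refineCollectionShardKey command document.

    The new key must keep the current key as its prefix, so suffix
    fields are only ever appended.
    """
    new_key = dict(current_key)
    for field, direction in suffix_fields.items():
        if field in new_key:
            raise ValueError(f"{field!r} is already part of the shard key")
        new_key[field] = direction
    # The command name must be the first key in the document.
    return {"refineCollectionShardKey": namespace, "key": new_key}


# Hypothetical collection currently sharded on customer_id alone.
command = build_refine_command(
    "shop.orders", {"customer_id": 1}, {"order_id": 1}
)
# With PyMongo this would be sent as client.admin.command(command);
# an index supporting the new key has to exist beforehand.
```

Adding a suffix field gives the balancer finer-grained split points, which helps when many documents share the same value for the original key.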
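In practice this takes the form of a compound shard key in which one field is hashed. The sketch below builds a shardCollection command for a hypothetical shop.events collection, hashing user_id while keeping country as a plain ranged field; the helper and all names are illustrative.

```python
def build_shard_command(namespace, key):
    """Build a shardCollection command for a compound shard key."""
    hashed_fields = [f for f, spec in key.items() if spec == "hashed"]
    # MongoDB 4.4 allows a single hashed field within a compound key.
    if len(hashed_fields) > 1:
        raise ValueError("a compound shard key may contain one hashed field")
    return {"shardCollection": namespace, "key": dict(key)}


# Range on country, hash on user_id to spread each country's users.
command = build_shard_command(
    "shop.events", {"country": 1, "user_id": "hashed"}
)
```

Placing the ranged field first keeps documents for one country together, while the hashed field spreads that country's data across chunks; reversing the order would favor even distribution over locality.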
MongoDB 4.4 continues the MongoDB pattern of introducing incremental but important enhancements between major “dot-zero” releases. None of these changes are likely to result in a shift in MongoDB’s market position, but they should be warmly welcomed by the MongoDB developer community.