When I was a young man, a long, long time ago now, I worked as an Oracle DBA (Oracle version 6, if you must know). I remember my astonishment at finding out that information in the database was stored in plain text within the database files. That meant if I could gain read access just to those files, I could read all the information in the database. It didn’t matter what security controls I, as the DBA, implemented at the database level: an attacker who could gain read access to the files on disk could read everything.
I've always been surprised that such obvious vulnerabilities did not lead to more reports of database breaches (though we’ll never know how many unreported hacks of database systems occurred). But it was reassuring to see these sorts of vulnerabilities eventually fixed.
Almost all major databases—MongoDB included—now support “encryption at rest” and “encryption in transit.” Encryption at rest solves the issue that I first encountered so many years ago. Data in the files on disk is encrypted and can only be read by a program that is in possession of appropriate decryption keys. These keys are typically held only by the database server and generated using separate master keys. In practice, this prevents an attacker from reading the contents of files on disk.
Encryption in transit is provided by the same SSL/TLS protocols that protect secure communications across the web using HTTPS. In this scenario, when a client connects to a database server, that server provides a signed certificate that proves the server’s identity. The client and server then establish a shared session key, which is used to encrypt data passing between them. In this way, a “man in the middle” cannot eavesdrop on the conversation and cannot impersonate the database server.
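As a rough illustration of the client side of that handshake, the sketch below builds a TLS context with Python's standard `ssl` module. It is not MongoDB-specific: the `make_tls_context` helper and the host name in the comment are invented for this example, and a real driver would normally build the equivalent context for you from its own connection options.

```python
import ssl


def make_tls_context(ca_file=None):
    """Build a client-side TLS context that insists on verifying the server.

    ca_file is an optional path to a CA bundle; when it is None the
    operating system's default trust store is used.
    """
    # create_default_context() turns on certificate verification and
    # host name checking, which is what defeats impersonation.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_tls_context()

# A client would then wrap its socket before talking to the server, e.g.
#   ctx.wrap_socket(sock, server_hostname="db.example.com")  # hypothetical host
# The handshake fails if the certificate is not signed by a trusted CA
# or does not match the host name.
```

The important design choice is that verification is on by default. Disabling it keeps the traffic encrypted but removes the protection against impersonation.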
These two encryption techniques prevent a wide range of security breaches and are implemented in one form or another by most database servers. Both have been available in MongoDB since version 3.2, when encryption at rest was added to MongoDB Enterprise.
However, these encryption schemes do not prevent all possible breaches of information. The database server remains capable of decrypting all data and therefore it is possible for a privileged user of the database—a DBA for instance—to read everything. Furthermore, everything in the database is encrypted using the same encryption keys, which means if you can access one encrypted data item you can potentially read any other encrypted data item. Unencrypted copies of data can often be found in replication streams and logs or in server memory structures. For very sensitive data, these vulnerabilities are unacceptable.
Client-Side Field Level Encryption (CSFLE) is MongoDB’s answer to these problems. With CSFLE, the MongoDB driver selectively encrypts data before it is sent to the database server, using keys that are not known to the server. Since the database server never sees unencrypted versions of the data, there is no chance that a database hacker can get access to the decrypted data, at least not from the database end.
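The idea can be sketched in a few lines of Python. This is a conceptual toy, not the MongoDB driver API: the cipher is a simple SHAKE-256 keystream XOR standing in for the authenticated encryption a real driver uses, and `SENSITIVE_FIELDS`, `to_server`, and `from_server` are names made up for the illustration. Do not use this cipher to protect real data.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher: XOR the data with a SHAKE-256 keystream.
    # Illustration only; it provides no integrity protection.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_field(key: bytes, value: str) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh random nonce for every value
    return nonce + _keystream_xor(key, nonce, value.encode("utf-8"))


def decrypt_field(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:16], blob[16:]
    return _keystream_xor(key, nonce, body).decode("utf-8")


SENSITIVE_FIELDS = {"ssn"}  # hypothetical: fields the client chooses to encrypt


def to_server(doc: dict, key: bytes) -> dict:
    """What the client sends: sensitive fields are already ciphertext."""
    return {k: encrypt_field(key, v) if k in SENSITIVE_FIELDS else v
            for k, v in doc.items()}


def from_server(doc: dict, key: bytes) -> dict:
    """What the client reconstructs after reading the document back."""
    return {k: decrypt_field(key, v) if k in SENSITIVE_FIELDS else v
            for k, v in doc.items()}


client_key = secrets.token_bytes(32)  # never leaves the client
patient = {"name": "Alice", "ssn": "123-45-6789"}
stored = to_server(patient, client_key)  # all the server ever sees
```

Here `stored["name"]` is still readable on the server, while `stored["ssn"]` is an opaque byte string that only a holder of `client_key` can turn back into the original value.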
CSFLE also allows distinct encryption keys to be used for different application users. This gives multiple users within the application isolation from each other's data and allows the database to “forget” the data by destroying the keys involved.
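A minimal sketch of per-user keys follows, again using a toy SHAKE-256 keystream cipher rather than real authenticated encryption. The `KeyVault` class is a hypothetical in-memory stand-in for wherever an application keeps its keys; it is not a MongoDB API.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy cipher for illustration only (no integrity protection).
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyVault:
    """Hypothetical in-memory key store holding one key per application user."""

    def __init__(self):
        self._keys = {}

    def create_key(self, user_id: str) -> None:
        self._keys[user_id] = secrets.token_bytes(32)

    def get_key(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError once the key is destroyed

    def destroy_key(self, user_id: str) -> None:
        del self._keys[user_id]


def encrypt_for(vault: KeyVault, user_id: str, text: str) -> bytes:
    nonce = secrets.token_bytes(16)
    return nonce + keystream_xor(vault.get_key(user_id), nonce, text.encode("utf-8"))


def decrypt_for(vault: KeyVault, user_id: str, blob: bytes) -> str:
    return keystream_xor(vault.get_key(user_id), blob[:16], blob[16:]).decode("utf-8")


vault = KeyVault()
vault.create_key("alice")
vault.create_key("bob")
alice_blob = encrypt_for(vault, "alice", "alice's diagnosis")
bob_blob = encrypt_for(vault, "bob", "bob's diagnosis")

# "Forgetting" Alice: her ciphertext may still sit in the database and in
# backups, but without the key it can no longer be decrypted.
vault.destroy_key("alice")
```

After `destroy_key("alice")`, Bob's data still decrypts normally, while any attempt to decrypt Alice's data fails because her key no longer exists. Bob's key is of no use against Alice's ciphertext either.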
Encryption does come with a cost. There is some CPU overhead involved in decrypting and encrypting the data. Some database operations can slow down because the encrypted data is opaque to the server. For instance, index range scans cannot be performed because the encrypted data does not preserve the ordering of the unencrypted data. There’s also the significant risk of complete data loss should the encryption keys be lost.
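The ordering problem is easy to see in a small experiment. The sketch below assumes a deterministic toy scheme (the nonce is derived from the plaintext, so equal values always produce equal ciphertexts); the function name and the fixed demo key are invented for the example, and the cipher is not suitable for real use.

```python
import hashlib
import hmac


def encrypt_deterministic(key: bytes, value: int) -> bytes:
    # Big-endian encoding: the plaintext bytes sort in numeric order.
    plain = value.to_bytes(4, "big")
    # Deriving the nonce from the plaintext makes the scheme deterministic.
    nonce = hmac.new(key, plain, hashlib.sha256).digest()[:16]
    stream = hashlib.shake_256(key + nonce).digest(len(plain))
    return nonce + bytes(a ^ b for a, b in zip(plain, stream))


key = bytes(range(32))  # fixed demo key so the run is repeatable
ages = list(range(18, 68))
encrypted = {age: encrypt_deterministic(key, age) for age in ages}

# Sort the ages the way an index over the ciphertext would order them.
by_ciphertext = sorted(ages, key=lambda age: encrypted[age])

print("plaintext order :", ages[:8])
print("ciphertext order:", by_ciphertext[:8])
```

Because the scheme is deterministic, a lookup for one exact value can still compare ciphertexts for equality. A query such as "age between 30 and 40", however, has no contiguous range of ciphertexts to scan, since the ciphertext order bears no relation to the numeric order.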
Nevertheless, CSFLE is a big step forward in MongoDB security and should further help dispel perceptions of MongoDB as a relatively insecure database option.