The relational database originated as a new concept for how data could be represented and accessed. Edgar Codd, the pioneer of the relational model, formulated 12 rules to define an RDBMS, all of which concerned the nature of data representation, retrieval and manipulation. However, for the relational database to succeed in the enterprise, the RDBMS had to take on the requirements and responsibilities of production database systems: availability, performance and, of course, security.
Over the decades since the relational model emerged, database security and authentication capabilities have steadily improved. Databases such as Oracle and SQL Server offer strong, flexible security features. Access to the database is restricted to authenticated users, whether they are authenticated by the database itself or through an external mechanism such as Kerberos. Once a user is authenticated, access to tables, rows and columns can be restricted by user, role or group. Data can be encrypted or masked, and label-based security supports classification-style (for example, "top secret") access controls. Data access can be audited and, in some cases, suspicious requests that might signal SQL injection attempts can be refused dynamically.
Security for NoSQL Evolves to Attract Wider Enterprise Adoption
The first new-generation non-relational "NoSQL" databases, like the early relational databases, had very simple security mechanisms. These mechanisms continue to evolve rapidly as NoSQL vendors pursue wider enterprise adoption.
Early versions of Hadoop, and the default configuration of current versions, have little or no authentication. By default, the Hadoop client supplies the user's identity to the Hadoop cluster, which accepts that identity without verification. However, modern versions of Hadoop can run in "secure mode," in which Hadoop verifies identity using the Kerberos protocol.
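To make the difference concrete, here is a toy model in Python; it is a sketch of the idea, not anything resembling Hadoop's real code. In "simple" mode the cluster trusts whatever name the client asserts, while in the secure variant the client must prove knowledge of a per-user secret by answering a one-time challenge. The HMAC challenge-response is an assumed stand-in for Kerberos, which in reality involves a key distribution center and tickets, and the names `ToyCluster`, `register`, `challenge` and `authenticate` are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional


# Toy model only: an HMAC challenge-response stands in for Kerberos,
# which really uses a key distribution center (KDC) and tickets.
class ToyCluster:
    def __init__(self, secure_mode: bool) -> None:
        self.secure_mode = secure_mode
        self._keys: dict[str, bytes] = {}  # hypothetical user -> secret registry
        self._pending: set[bytes] = set()  # challenges not yet answered

    def register(self, user: str) -> bytes:
        """Create a secret for `user`; only the user and the cluster know it."""
        key = secrets.token_bytes(32)
        self._keys[user] = key
        return key

    def challenge(self) -> bytes:
        """Issue a fresh one-time nonce for a client to sign."""
        nonce = secrets.token_bytes(16)
        self._pending.add(nonce)
        return nonce

    def authenticate(self, claimed_user: str, nonce: Optional[bytes] = None,
                     proof: Optional[bytes] = None) -> Optional[str]:
        """Return the accepted identity, or None if authentication fails."""
        if not self.secure_mode:
            # "Simple" mode: whatever name the client asserts is accepted.
            return claimed_user
        if nonce is None or nonce not in self._pending:
            return None
        self._pending.discard(nonce)  # each challenge can be used only once
        key = self._keys.get(claimed_user)
        if key is None or proof is None:
            return None
        expected = hmac.new(key, nonce, hashlib.sha256).digest()
        return claimed_user if hmac.compare_digest(expected, proof) else None


# Simple mode: a client may claim a privileged account name and is believed.
insecure = ToyCluster(secure_mode=False)
print(insecure.authenticate("hdfs"))  # hdfs

# Secure mode: alice proves her identity by signing the cluster's challenge.
secure = ToyCluster(secure_mode=True)
alice_key = secure.register("alice")
nonce = secure.challenge()
proof = hmac.new(alice_key, nonce, hashlib.sha256).digest()
print(secure.authenticate("alice", nonce, proof))  # alice
```

In secure mode, a bare claim with no valid proof is rejected, and a proof made with alice's key cannot be used to pose as a different user.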
The Hadoop HDFS filesystem provides POSIX-style permissions that will be familiar to any Linux or UNIX user. Each file associates read/write/execute permissions with its owner, a group and all other users. So it's possible to limit modifications to a specific user while granting read access to a wider group of users and denying access to everyone else.
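The owner/group/other check can be sketched in a few lines of Python. It mirrors the POSIX order, also described in the HDFS permissions guide: owner bits if the user owns the file, else group bits if the user belongs to the file's group, else the "other" bits. It ignores the HDFS superuser and ACLs, and `FileStatus` and `permitted` are hypothetical names, not Hadoop APIs.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # bit values within one rwx triplet


@dataclass
class FileStatus:
    owner: str
    group: str
    mode: int  # e.g. 0o640 = owner rw-, group r--, other ---


def permitted(f: FileStatus, user: str, user_groups: set[str], want: int) -> bool:
    """Check whether `user` holds all of the `want` bits on file `f`."""
    # Exactly one class applies: owner first, then group, then other.
    if user == f.owner:
        bits = (f.mode >> 6) & 0o7
    elif f.group in user_groups:
        bits = (f.mode >> 3) & 0o7
    else:
        bits = f.mode & 0o7
    return (bits & want) == want


# Hypothetical file of patient records: writable by an ETL account,
# readable by the clinicians group, closed to everyone else.
records = FileStatus(owner="etl", group="clinicians", mode=0o640)
print(permitted(records, "etl", set(), WRITE))                # True
print(permitted(records, "dr_jones", {"clinicians"}, READ))   # True
print(permitted(records, "dr_jones", {"clinicians"}, WRITE))  # False
print(permitted(records, "guest", {"users"}, READ))           # False
```

Note that `permitted` never inspects the file's contents: the decision is made once for the whole file, so a clinician who may read a file of patient records may read every record in it.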
The problem with this scheme is that in a big data application, a single HDFS file may contain information spanning many users. For instance, an HDFS file in a medical application may contain the medical records of all patients, and file permissions offer no way to limit a specific doctor or medical center to only the relevant records. If you can read any data in the file, you can read all the data in the file.
Addressing NoSQL Security Limitations with New Approaches
Apache Accumulo, a key-value store originally developed at the NSA, aims to address these limitations. Like HBase, it runs on top of HDFS and is based on Google's Bigtable model. Accumulo allows each value to carry a security label that can be used to enforce either role-based or label-based security. Accumulo also supports encryption of data at rest (in files) and in motion (over the wire), as well as access auditing.
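Per-value labels can be illustrated with a simplified evaluator for Accumulo-style visibility expressions such as `doctor&cardiology` or `admin|(doctor&cardiology)`: a value is returned only if the reader's set of authorizations satisfies its label. This is a hypothetical Python sketch, not Accumulo's API (Accumulo implements this in Java, with classes such as `ColumnVisibility` and `Authorizations`). It supports `&`, `|` and parentheses, rejects mixing `&` and `|` without parentheses, treats an empty label as visible to everyone, and omits quoted terms.

```python
import re

# Simplified term syntax: letters, digits and _ - : . /  (quoted terms omitted)
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def _tokenize(expr: str) -> list[str]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def visible(expr: str, auths: set[str]) -> bool:
    """Return True if a reader holding `auths` may see a value labelled `expr`."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # empty label: visible to everyone
    pos = 0

    def parse_group() -> bool:
        # A run of atoms joined by a single operator, either all & or all |.
        nonlocal pos
        values = [parse_atom()]
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            values.append(parse_atom())
        return all(values) if op == "&" else any(values)

    def parse_atom() -> bool:
        # Either a parenthesized group or a single authorization term.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    result = parse_group()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(visible("doctor&cardiology", {"doctor", "cardiology"}))  # True
print(visible("doctor&cardiology", {"doctor"}))                # False
print(visible("admin|(doctor&cardiology)", {"admin"}))         # True
```

Because the check runs per value rather than per file, different readers of the same table can see different subsets of its values, which is exactly what file-level permissions cannot express.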
Non-relational databases oriented toward web applications and OLTP, such as MongoDB and Cassandra, have less need for fine-grained security, because all access is normally mediated by the application layer, which authenticates to the database with a single identity. However, this leaves the database vulnerable to users inside the firewall, such as a DBA or anyone with ad hoc query access. Therefore, NoSQL databases aspiring to enterprise use are working hard to improve their security. Both MongoDB and Cassandra now support Kerberos authentication in their enterprise (non-free) editions, and Cassandra also supports transparent data encryption.
Robust security is a must-have for any database hoping for widespread enterprise adoption. Consequently, you can expect to see significant progress in NoSQL security over the next few years.