Even after 50 years, Structured Query Language, or SQL, remains the native tongue for those who speak data. It’s had impressive staying power since it was introduced as the Structured English Query Language (SEQUEL) in the mid-1970s. It’s survived and thrived through the dot-com era and the proliferation of cloud technology. In essence, SQL is a technology that evolves.
The future of SQL lies in this ability to evolve as it helps data managers address emerging data paradigms and technologies.
This is particularly true in the nascent world of graph and vector databases, which are reshaping data interaction and computation through their use in generative AI (GenAI) and large language models (LLMs).
SQL’s Current Role: The Cornerstone of Data Management
Software developers who specialize in all types of programming languages and builder tools—such as Python, Tableau, and .NET—use SQL as they interact with back-end databases. Those familiar with SQL know it is a perfect fit with relational databases, the backbone of enterprise IT data management. SQL databases store data in rows and columns while creating defined relationships between the tables to give data retrievers all the context they need.
The Internet Movie Database (IMDb) is a great example. IMDb is a single database that tells you everything there is to know not only about movies but also the directors, writers, actors, and crew.
The IMDb archive is available for use by anyone who wants to use a big, real-world database to learn SQL and/or database management techniques.
In a relational database like IMDb, each table has a primary key: a column, or occasionally a combination of columns, whose value uniquely identifies each row.
In this example, that primary key could be a specific movie name or, more likely, a movie ID.
If you search for a given movie by name, the web interface queries the table holding MOVIE data. But there’s a lot more to a movie than its core details of when it was financed, when it was released to theaters, and its length. This other data exists in other tables that are explicitly linked, or “related,” to one another through their keys.
For example, a database concerned with movies will need separate tables to hold data about the actors in the movie.
Another table holds data about producers, another for the rest of the crew, and so forth. This makes it easy to find all the data about the movie by relating the data in the MOVIE table to the data from the other related tables.
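To make the idea of related tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are invented for illustration and do not reflect how the real IMDb data is organized. The sketch shows a primary key on each table, a linking table that references those keys, and a JOIN that pulls the related rows back together.

```python
import sqlite3

# Hypothetical, simplified schema (not the actual IMDb layout).
movie_db = sqlite3.connect(":memory:")
movie_db.executescript("""
    CREATE TABLE movie (
        movie_id        INTEGER PRIMARY KEY,   -- unique identifier per movie
        title           TEXT NOT NULL,
        release_year    INTEGER,
        runtime_minutes INTEGER
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,         -- unique identifier per person
        name      TEXT NOT NULL
    );
    -- Linking table: relates movies to people through their keys.
    CREATE TABLE movie_cast (
        movie_id  INTEGER NOT NULL REFERENCES movie(movie_id),
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        role      TEXT NOT NULL,
        PRIMARY KEY (movie_id, person_id, role)
    );
""")

# Made-up placeholder rows, purely for demonstration.
movie_db.execute("INSERT INTO movie VALUES (1, 'Example Movie', 2001, 120)")
movie_db.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(10, "Actor A"), (11, "Director B")],
)
movie_db.executemany(
    "INSERT INTO movie_cast VALUES (?, ?, ?)",
    [(1, 10, "actor"), (1, 11, "director")],
)


def people_for_movie(title):
    """Return (name, role) pairs for a movie by following the key relationships."""
    return movie_db.execute(
        """
        SELECT p.name, mc.role
        FROM movie AS m
        JOIN movie_cast AS mc ON mc.movie_id = m.movie_id
        JOIN person AS p ON p.person_id = mc.person_id
        WHERE m.title = ?
        ORDER BY p.name
        """,
        (title,),
    ).fetchall()


credits = people_for_movie("Example Movie")
print(credits)
```

Searching by title touches only the movie table at first; the keys stored in the linking table are what let the query reach the people connected to that movie.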
One of the biggest factors in favor of SQL is its ease of use. In a nutshell, SQL is a very English-like language that is easy to learn and use for both developers and non-developers. The SQL SELECT command is simple enough to write and read that many people can learn the basics in just a few hours. SELECT enables a query to search for and retrieve specific datasets without the need for sophisticated code. SQL is also integrated with many of the aforementioned developer tools, which further increases its adoption and usability.
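As a small illustration of how closely SELECT reads to plain English, the sketch below runs a single query through Python's built-in sqlite3 module. The table and its rows are hypothetical placeholders, not real movie data.

```python
import sqlite3

# Hypothetical table with made-up rows, for illustration only.
demo_db = sqlite3.connect(":memory:")
demo_db.execute(
    "CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT, release_year INTEGER)"
)
demo_db.executemany(
    "INSERT INTO movie VALUES (?, ?, ?)",
    [
        (1, "Example Movie One", 1995),
        (2, "Example Movie Two", 2005),
        (3, "Example Movie Three", 2015),
    ],
)

# Reads almost like a sentence: select these columns, from this table,
# where this condition holds, ordered this way.
query = """
    SELECT title, release_year
    FROM movie
    WHERE release_year >= 2000
    ORDER BY release_year
"""
recent_movies = demo_db.execute(query).fetchall()
print(recent_movies)
```

The query states what data is wanted rather than how to fetch it, which is a large part of why newcomers can pick up the basics so quickly.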
Even with the long-standing benefits of SQL, there is a shift happening with databases due to AI and machine learning—the rise of graph and vector databases. The growing popularity of these databases, in tandem with the ubiquity of AI, is at the heart of the latest SQL evolution.
The Next Frontier: Vector and Graph Databases
SQL may be the lingua franca of relational databases, but graph and vector databases are something different. In fact, commercial graph and vector databases carry the descriptor “NoSQL.” Ironically, NoSQL does not mean that they do not support SQL commands. Instead, it means “Not Only SQL.”
Why? SQL is so firmly embedded as the language of data in corporate IT enterprises that any new database company that does not support SQL forces its customers into a hard choice between sticking with a query language their staff already know well and learning an all-new one. Consequently, most commercial graph and vector databases include support for the most common SQL commands, encapsulated in the ISO/ANSI SQL-92 standard.
Where relational databases represent data as tables and relationships, graph databases represent data as graph structures, using nodes (similar to a table) and edges (similar to a relationship) to show complex relationships between vast amounts of data. Vector databases hold unstructured data, such as images, audio, and PDF text, and turn it into mathematical representations.
Vector databases also allow for the comparison between two points of data to show potential similarities, even if the datapoints appear disparate.
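The comparison between two data points is commonly done with a similarity measure over their vectors. The sketch below uses cosine similarity, one widely used choice, in plain Python. The three-dimensional vectors and the item names are made up for illustration; real embeddings typically have hundreds or thousands of dimensions, and a real vector database would use an index rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; the names and values are placeholders.
embeddings = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [0.9, 0.1, 0.0],
    "item_c": [0.0, 0.0, 1.0],
}


def most_similar(query_vector, items):
    """Brute-force search: return the name of the closest stored vector."""
    return max(items, key=lambda name: cosine_similarity(query_vector, items[name]))


best_match = most_similar([0.8, 0.2, 0.0], embeddings)
print(best_match)
```

Because the measure depends only on direction, two items can score as similar even when their raw values look quite different on the surface.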
The ability of vector databases to turn images, audio, and text into mathematical constructs makes them extremely attractive for training LLMs. Unlike traditional relational databases such as Oracle, Microsoft SQL Server, or the popular open source database PostgreSQL, vector databases are hungry for processing power and can demand more of it than traditional CPUs provide.
It is for this reason that, as LLMs have grown in popularity, so has the investment in graphics processing units (GPUs).
Originally, GPUs were designed to speed up rendering in video games, whose high-end graphics are supported using “vector rendering.” As it turns out, vector rendering directly maps to the needs of vector databases and provides the processing power to efficiently support GenAI platforms, LLMs, and gaming.
SQL’s Role in the Graph and Vector Era
Now that we are firmly in the age of graph and vector databases, where does SQL fit in? As with the arrival of other new technology paradigms across the decades, we once again find SQL has staying power.
For example, the most popular relational database platforms are incorporating vector data types, specialized functions to process vector data, and indexing for vector data through inverted file (IVF) indexes and hierarchical navigable small world (HNSW) indexes. On the graph side, vendors are rolling out features to support graph data types. Case in point, the International Organization for Standardization (ISO) committee for SQL is already hard at work expanding the SQL standard to incorporate additional query language elements for vector and graph database systems, including SQL/PGQ (Property Graph Queries), as well as introducing the all-new Graph Query Language (GQL), a unified and standardized language for all graph database platforms.
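Even before the newer graph extensions arrive in a given product, ordinary SQL can already express simple graph questions. The sketch below is an illustration only: it uses a standard recursive common table expression in SQLite, not the newer graph query syntax mentioned above, and the edge table and node names are hypothetical. It finds every node reachable from a starting node.

```python
import sqlite3

# Hypothetical graph stored relationally: one row per directed edge.
graph_db = sqlite3.connect(":memory:")
graph_db.execute("CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL)")
graph_db.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B"), ("X", "Y")],
)


def reachable_from(start):
    """Return all nodes reachable from `start`, excluding `start` itself."""
    rows = graph_db.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?                      -- seed the traversal with the start node
            UNION                         -- UNION drops duplicates, so cycles terminate
            SELECT e.dst
            FROM edge AS e
            JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach WHERE node <> ? ORDER BY node
        """,
        (start, start),
    ).fetchall()
    return [row[0] for row in rows]


print(reachable_from("A"))
```

Dedicated graph syntax aims to make this kind of traversal more direct to write and read, but the example shows that the relational model and the graph model are closer than they may first appear.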
These enhancements make SQL more versatile and capable of handling the demands of modern data and analytics applications. This is proof that the widespread use of SQL is already influencing the database management practices of the future.
And, as mentioned earlier, there are also many popular NoSQL database platforms that support SQL-like features due to the language’s usability and ubiquity. They include platforms such as MongoDB, Cassandra, Amazon DynamoDB, Microsoft Azure Cosmos DB, and Google Bigtable. The upcoming release of this latest SQL evolution will lead to more accurate and consistent AI systems that improve outcomes across industries. Use cases include more accurate genomics in healthcare to help with preventative healthcare measures, more relevant recommendations for ecommerce customers, much faster calculations for geography-centric data systems (such as energy exploration and weather modeling), and more efficient operations for logistics companies.
The Future of SQL
I remember when I first used SQL in the mid-1980s working on the DEC Rdb data platform. SQL wasn’t yet standardized.
But it didn’t take long for the American National Standards Institute (ANSI), and soon thereafter the ISO, to recognize the importance of the language and standardize it in 1986 and 1987, respectively.
From there we’ve seen SQL grow and become the primary language of major database management systems (DBMS) around the world. It’s not only the cornerstone of enterprise IT data; it also runs on billions of Android devices through the lightweight relational database SQLite. As SQL continues to evolve, the onus will be on organizations to prepare for these industry shifts and make proper use of SQL in an AI world.
What does this mean for you? IT leaders must upskill talent so that staff are not only fluent in SQL but also grounded in the foundational concepts of relational databases, GenAI, and application development. In terms of hardware, they must embrace the GPU-based systems that underpin AI technology and exploit the value of graph and vector databases.
Lastly, they must leverage new and traditional data platforms and relationships with industry partners to help them transition to the newest and most powerful data management systems. If they don’t, they will quickly fall behind as they watch their competitors who have made the transition pass them by.
Embracing these changes, and the next stage of SQL, will help organizations drive innovation, improve analytics and forecasting, and support data-driven decision making for decades to come.