Imagine Google returning search results on “Lady Gaga” in 0.03 seconds, but taking 30 seconds to return results for “Lunar Eclipse.” That might seem unacceptable; however, that is the reality for most of today’s enterprise data analytics. Some queries can come back in seconds with others taking minutes or even hours. Perhaps with a lot of tuning ahead of time, ad hoc analysis query performance can be improved, but in most cases, it remains a huge challenge. When you have the right indexes, summary tables, and statistics, you can get answers quickly. The IBM Informix Warehouse Accelerator solves this problem using novel algorithms on modern hardware. This query accelerator dramatically improves performance while eliminating traditional tuning tasks. Dramatic improvement in query performance will help provide “train of thought” analysis for business analysts. Faster answers will enable more questions on the data, better data-driven decisions and faster insights on business.
IBM customers are already using this technology in mission-critical production environments. They report having achieved orders of magnitude improvement in their data warehouse performance.
For example, before using Informix Warehouse Accelerator, complex inventory and sales analysis queries on the enterprise warehouse with more than a billion rows took anywhere from a few minutes to 45 minutes to run, says Ashutosh Khunte, manager of Data Management Services at Skechers, USA. But, when the company ran those same queries using Informix Warehouse Accelerator, they finished in 2 to 4 seconds, meaning that they ran from 60 to 1400 times as quickly, with an average acceleration factor of more than 450 — all without any index or cube building, query tuning or application changes.
Performance
As Skechers’ testing illustrated, once data is loaded, there is nothing else to do. The Informix Warehouse Accelerator is ready for peak performance. This query acceleration approach eliminates many tasks without requiring changes to the data warehouse, reporting, or application environment. It’s the epitome of “plug and play.”
No tuning for each query or workload. During the installation of Informix Warehouse Accelerator, basic memory and storage configurations are provided. Thanks to consistent plans, elimination of disk I/O, and fast scans and joins, the need for runtime tuning is eliminated.
No index to create, no index advisor needed, no index reorganization required. Informix Warehouse Accelerator logically scans billions of rows in milliseconds or seconds. Indexes are not needed because of deep columnar data representation, query processing on compressed data, and innovative algorithms that exploit modern processors.
No statistics to collect. Traditional optimizers rely on regular statistics collection to create better query plans. Informix Warehouse Accelerator automatically determines join, and uses star join plans consistently.
No partitioning (fragmentation) schemes to create. The data is automatically compressed and partitioned both vertically and horizontally. The queries also benefit from vertical and horizontal partition pruning (aka fragment elimination) due to cell-based, deep columnar storage. This architecture eliminates planning for table partitioning schemes.
No storage management. All the data is stored in-memory with just a copy of the in-memory image on the disk. There is no need to plan and create storage spaces for tables and indexes.
No expensive hardware. Informix Warehouse Accelerator is designed to run on commodity hardware and to evolve with your infrastructure. The accelerator runs on Linux/Intel computers. It works with Informix database server on Linux/Intel, AIX/Power, HP-UX/Itanium, Solaris/Sparc. Just like Informix, Informix Warehouse Accelerator can also run in virtualized and cloud environments. This design gives you enormous flexibility to deploy your warehouse.
No database changes. Informix Warehouse Accelerator exploits the existing logical schema in a data warehouse.
No application changes. Informix Warehouse Accelerator plugs into the Informix database server as a resource. The Informix database server knows the data marts that are stored in the accelerator server and automatically routes relevant queries. No changes are required to the application or tools.
No summary tables or materialized views to create. With Informix Warehouse Accelerator, table scans and joins are an order of magnitude faster than with traditional databases. This performance eliminates the need to create and maintain summary tables and to use related advisors.
Start to query in 5 steps.
You have many deployment options for Informix and IWA. The accelerator server connects to one or more Informix database servers through TCP/IP. The accelerator server always runs on a Linux operating system on a high-performance Intel Xeon processor (for example, Nehalem, Westmere). You can run the Informix database server on the same Linux/Xeon system. Also, you can run the Informix database server on a Linux/Intel, AIX/Power, HPUX/HP Itanium, or Solaris/Sparc system, and run the accelerator server on a separate Linux/Intel system. The accelerator can scale up on a single SMP system or scale out on a cluster system. Both Informix database server and the accelerator can run in virtual and cloud environments (e.g., Amazon Cloud).
Figure 2 shows the steps that you can take to start accelerating existing queries. Step 1 is the familiar Informix installation. In Step 2 you install Informix Warehouse Accelerator and an administration tool, IBM Smart Analytics Optimizer Studio. In Step 3, you configure the accelerator server, which requires only a few configuration parameters. In Step 4, you design your data mart with a star schema or snowflake schema, and deploy the data mart by sending the data mart information to the accelerator server. In the final step, simply send the data to the accelerator server for query acceleration.
Made for Speed
The extreme performance that can be achieved with the Informix Warehouse Accelerator is based on technologies developed by IBM research and development. Papers listed at the end of this article describe the underlying theory and techniques. IBM has filed multiple patent applications on these techniques.
Extreme compression and efficient query processing on compressed data eliminate disk I/O during query processing and enable a large in-memory warehouse. Deep columnar technology goes beyond traditional columnar storage. Exploitation of multi-core architectures and SIMD technology gets you incredible speed without indexes or summary tables. Let’s look into some of these techniques now.
When data is loaded into the data mart, each table is analyzed for frequently occurring values in columns and related column groups. The most optimal columns are combined to form a tuplet, which is a fraction of a row. In the example in Figure 3, the Product column and the Origin column are correlated and hence combined to form a tuplet. The most frequently occurring values are encoded with the least number of bits. This technique, known as Huffman encoding, increases the compression efficiency and can be used to evaluate both equality and range predicates. The top 64 product values in the Product column are combined with the most frequently occurring values (U.S., China) in the Origin column to form the smaller cell1. Because the query processing is done on compressed data, fewer bits translate to higher speed.
The Informix Warehouse Accelerator stores data in column groups, or vertical partitions of the table, called “banks.” A tuplet fits in each bank. The assignment of columns to banks is cell-specific because the column lengths vary from cell to cell. The assignment uses a bin-packing algorithm that is based on whether the column fits in a bank, whose width is some fraction of a word, rather than usage in a workload. Also, scans need to access only the banks that contain columns referenced in any given query. This technique minimizes the amount of memory to scan and saves considerable CPU cycles.
SIMD instructions on Intel Xeon processors operate on 128-bit registers. The compression technique in the accelerator typically requires few bits for each column and hence can load many fields in each 128-bit register. The Informix Warehouse Accelerator can load multiple values and apply predicates on all columns simultaneously. During query processing, this overloaded operation is happening on all the allocated cores resulting in extreme parallelism for the query. A single CPU instruction can operate simultaneously on all the 12 values, resulting in significant performance gain.
Conclusion
Informix Warehouse Accelerator offers an innovative approach to complex query processing with large data. It can improve the productivity of your business by providing answers faster without increasing your work or breaking your budget. Because the product is tightly integrated with the Informix database server, you can exploit the environment to divide the load between the database server and the accelerator server, as necessary.
Faster response time means quicker answers, quicker insights, and a business that can adjust faster. You can plan to accelerate the high value part of your warehouse and dynamically evolve the infrastructure to suit business needs.
About the author:
Keshava Murthy is the Senior Technical Staff Member at IBM Informix development working on query processing and data warehousing. He has worked on multiple relational and object relational databases. He has developed features in SQL, RTREE, distributed queries, heterogeneous transaction management, and extensibility components of Informix. He has received two Outstanding Technical Achievement Awards from IBM. Keshav holds Bachelor degree in computer science and engineering from the University of Mysore, India.