The data explosion driving data warehouse equipment purchases in the last few years has just begun. Equipment proliferation is already straining data center energy budgets. Fortunately, a column-based analytics server can help companies with both kinds of green - the environment and money - by offering enormous energy and cost reductions while significantly boosting performance.
The Data Explosion Causes Energy Escalation
No one involved with analytics doubts that data is exploding. Web sites and operational systems already produce huge volumes of it. Now there are even more reasons to keep data around. New business models based on analytics are emerging. New business regulations plus several civil court actions resulting in huge damage settlements have focused efforts on unstructured data, starting with e-mail. Massive volumes of messages need to be archived and analyzed for everything from signs of potential legal issues to information for upcoming sales or service efforts. Using present methods, this analysis will require huge amounts of extra IT power.
Simultaneously, IT is facing increasing pressures to control the growth of its energy and carbon footprints. Data center electrical consumption has doubled in the last five years according to a study conducted for AMD by a Stanford University professor. Some studies estimate that information technology accounts for two percent of U.S. power usage. This power escalation causes five problems.
* Cost: Electricity can account for 15-20 percent of the data center cost. Kenneth Brill of the Uptime Institute reports that the average wattage required to cool hardware has grown from eight watts per $1,000 of hardware in 2000 to 109 watts today. Price per kilowatt-hour varies by region and by season, but in any case this is more than a 13-fold increase.
* Cooling: Equipment densities cause hot spots that can surpass 30 kilowatts per rack.
* Limited energy envelope: Every data center has a power ceiling that cannot be exceeded without expensive building improvements. Kilowatts have become the critical data center currency.
* Pollution: The manufacturing, use and disposal of most electronic equipment today worldwide depends on fossil fuels and the use of hazardous substances, releasing large amounts of pollutants, including carbon dioxide, a major contributor to global warming.
* System availability: SunGard reports that 26 percent of the IT business continuity disasters it responded to in 2006 were power related.
Harnessing new technology to tackle the problem
Fortunately, new technologies promise to help ease the energy crisis. Efficient hardware and a well-architected IT infrastructure play a strong role in reducing energy costs. Virtualization is widely discussed - and increasingly being applied - to increase hardware utilization, getting more from the boxes already on the data center floor. The good news is that virtualization could increase disk utilization from roughly 15-20 percent to as much as 80 percent.
When it comes to data warehouses, however, an optimal "green" strategy shouldn’t stop with efficient hardware and virtualization. In fact, power savings should start with data management software considerations. Virtualization tries to sop up a problem that doesn’t have to exist. Using general-purpose, row-based DBMSs that were never designed for analytics causes significant hardware bloat. The inefficiencies of these row-based systems add to IT’s power management problems in two ways. First, general-purpose databases double, and in some cases triple, the size of the raw data with indices and management overhead, which requires extra storage. Second, since they aren’t built for analytics, they are slow. This often leads to “throwing more hardware” at the problem.
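To see why a row-oriented layout works against analytic queries, consider the toy sketch below. It is an illustration only, not how any particular DBMS is implemented: the table, the column names, and the helper functions are all hypothetical, and counting "values read" merely stands in for disk I/O. The same sum is computed once over whole records and once over a single column stored on its own.

```python
# Toy illustration only: real engines read disk pages, not Python dicts.
# The table contents and all names below are hypothetical.
ROWS = [
    {"order_id": 1, "customer": "acme", "region": "west", "amount": 120.0},
    {"order_id": 2, "customer": "zenith", "region": "east", "amount": 80.0},
    {"order_id": 3, "customer": "acme", "region": "east", "amount": 50.0},
    {"order_id": 4, "customer": "orbit", "region": "west", "amount": 250.0},
]


def total_from_rows(rows, column):
    """Row layout: whole records are stored together, so every field is read."""
    total, values_read = 0.0, 0
    for record in rows:
        values_read += len(record)  # the entire record comes along
        total += record[column]
    return total, values_read


def to_columns(rows):
    """Pivot the same data so each column is stored contiguously."""
    columns = {}
    for record in rows:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_columns(columns, column):
    """Column layout: only the one column the query needs is read."""
    values = columns[column]
    return sum(values), len(values)


row_total, row_reads = total_from_rows(ROWS, "amount")
col_total, col_reads = total_from_columns(to_columns(ROWS), "amount")
print(f"row layout:    total={row_total}, values read={row_reads}")
print(f"column layout: total={col_total}, values read={col_reads}")
```

Both layouts return the same total, but the row layout touches every field of every record to get it. In a warehouse table with dozens of columns and a query that needs only a few of them, that gap grows accordingly.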
Column-based analytics servers save energy and boost performance
As an alternative to the inefficient row-based DBMS, a column-based analytics server provides cost and energy advantages similar to virtualization on the server side while dramatically accelerating performance. The Sybase IQ analytics server, the first and most notable system of this type, yields performance improvements of up to 100 times with up to 85 percent compression of the raw data in large analytic applications, resulting in far lower hardware requirements and a decrease in CO2 emissions of as much as 90 percent. These results were recently demonstrated in an independently audited study on an unprecedented one petabyte of raw data - the world’s largest data warehouse.
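As a rough back-of-the-envelope check, the storage figures quoted in this article can be put side by side. The sketch below assumes a hypothetical 100 TB of raw data, takes the two-to-three-times expansion cited for general-purpose databases, and takes the best-case 85 percent compression cited for the column-based server. Actual results depend on the data and the workload, and the function names are illustrative only.

```python
def row_store_footprint_tb(raw_tb, expansion_factor):
    """Stored size when indices and overhead multiply the raw data."""
    return raw_tb * expansion_factor


def column_store_footprint_tb(raw_tb, compression):
    """Stored size when the raw data is compressed by the given fraction."""
    return raw_tb * (1.0 - compression)


RAW_TB = 100.0  # hypothetical warehouse size, chosen only for illustration

row_low = row_store_footprint_tb(RAW_TB, 2.0)    # raw data doubled
row_high = row_store_footprint_tb(RAW_TB, 3.0)   # raw data tripled
col_best = column_store_footprint_tb(RAW_TB, 0.85)  # best-case compression

print(f"row-based:    {row_low:.0f} to {row_high:.0f} TB stored")
print(f"column-based: {col_best:.0f} TB stored (best case)")
print(f"ratio:        {row_low / col_best:.1f}x to {row_high / col_best:.1f}x")
```

Under these assumptions the row-based system stores 200 to 300 TB against roughly 15 TB for the column-based one, a difference of about 13 to 20 times in disk that has to be bought, powered and cooled.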
Once forward-looking companies realize the advantages of a purpose-built analytics server, they start to convert many kinds of intelligence applications to this type of system. The most typical use is in applications such as analytics-as-a-service and advanced analytics (such as risk analysis, click stream analysis, or fraud detection) requiring instant answers on large amounts of data. Often these applications also must serve many complex, ad hoc queries from hundreds of concurrent users. On-demand analytics where applications such as stock trading software or manufacturing software call the analytics system for instant answers are also an excellent match.
A column-based analytics server may be architecturally different from a row-based DBMS, but from a management perspective, it's business as usual. Such servers typically use mainstream front-end access tools, standard ETL, and standard servers and storage. A column-based analytics server is also much easier to maintain and requires less tuning than a traditional DBMS.
The bottom line, therefore, is that a column-based analytics server has the potential to help companies with large data warehouses and complex analysis needs control the growth of their IT energy consumption while providing unbeatable performance. That group is growing daily due to mission-critical applications and the legal pressure to archive and analyze e-mail and other unstructured data.