RainStor, a provider of big data management software, has unveiled the RainStor Big Data Analytics on Hadoop, which the company describes as the first enterprise database running natively on Hadoop. It is intended to enable faster analytics on multi-structured data without the need to move data out of the Hadoop Distributed File System (HDFS) environment.
The product combines up to 40x compression with fast analytics, providing both SQL access and MapReduce. Compressing the multi-structured data set stored on HDFS reduces cluster size by 50-80%, which in turn lowers operating costs.
According to RainStor, the new product brings added value to new and existing Hadoop deployments through proven database technology and enterprise-grade features. Because RainStor provides high compression, data is reduced by up to 40x (a 97.5% reduction) compared to raw data and requires no re-inflation when accessed. Combining this compression with dynamic filtering at the file, column and row level speeds up analytics and makes more efficient use of the Hadoop cluster. Compression, together with enterprise database management features, also shrinks storage and cluster size, lowering operating costs.
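The "40x (97.5%)" figure follows from simple arithmetic: a compression ratio r (raw size divided by compressed size) removes a fraction 1 - 1/r of the raw bytes. The short sketch below illustrates that conversion; the function names and the 100 TB raw data set are hypothetical illustrations, not RainStor figures, and the calculation covers storage only, not the separate 50-80% cluster-size claim.

```python
def reduction_from_ratio(ratio: float) -> float:
    """Fraction of raw storage eliminated at a given compression ratio.

    ratio = raw size / compressed size, so the space saved is 1 - 1/ratio.
    """
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


def compressed_size(raw_tb: float, ratio: float) -> float:
    """Compressed size for a hypothetical raw data volume (in TB)."""
    return raw_tb / ratio


# Print the space saved at a few example compression ratios.
for r in (10, 20, 40):
    print(f"{r}x compression -> {reduction_from_ratio(r):.1%} less storage")

# Hypothetical example: 100 TB of raw data at 40x compression.
print(f"100 TB at 40x -> {compressed_size(100, 40):.1f} TB")
```

At 40x, one part in forty remains, which is where the 97.5% figure comes from; each doubling of the ratio yields progressively smaller additional savings.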
"A lot of times, organizations will start out a Hadoop deployment, ingest network data, analyze it, and then perhaps send the results or some of the data sets from that Hadoop environment into their data warehouse and then conduct further analytics, mashing it up with some existing structured data," says Deirdre Mahon, vice president of marketing at RainStor. "We are saying that you don't have to pipe the data out. You can actually conduct all of your analysis of both structured and unstructured data within your Hadoop environment, having RainStor run natively therein."
There is architectural compatibility between the way RainStor manages data and the way the Hadoop Distributed File System manages CSV files, says Mahon. "We manage the data in large simple objects or large blocks and that is exactly what Hadoop Distributed File System files are - they are large blocks of data. We call them partitions within the RainStor database - but that is one of the main reasons why we can reside natively on the actual Hadoop cluster."
Additionally, when an organization implements a Hadoop deployment, it has to train IT staff to use MapReduce because that is the method for batch analytics across the multi-structured data, says Mahon. "If you have both SQL and MapReduce, you may not have to train everyone to know how to use MapReduce. Some of your people can get up to speed on that, but meanwhile you can still access the data and run SQL queries and existing SQL statements that you have, and as a result, we feel there is operational cost impacted because of resource training."
In addition to the new RainStor Big Data Analytics on Hadoop product, RainStor also announced partnerships with Cloudera, Hortonworks and MapR, distributors of open-source Apache Hadoop. "We are agnostic," says Mahon, explaining that customers can take their pick of which Hadoop distribution they want to use. "And hopefully, they will deploy us at the same time and get the significant cost savings impact."
The offering is being branded as a new product rather than a new version of the RainStor database, which was last updated in March 2011, Mahon says. Because the new product supports MapReduce via Pig as well as standard SQL access, the company decided it would be clearer to release Big Data Analytics on Hadoop as a separate product, she explains.
For more information about RainStor Big Data Analytics on Hadoop, visit www.rainstor.com.