Building on its $300 million investment in Apache Spark a year ago, IBM is announcing new initiatives including an environment where data scientists can learn, create and collaborate with peers using whatever data science language they prefer, donation of new open source technologies that help bring Spark to members of the R community, and an open ecosystem of partners to help expand innovation.
The Data Science Experience, IBM’s new integrated development environment for real-time, high-performance analytics, is available on IBM Cloud and is aimed at helping to blend emerging data technologies and machine learning into existing architectures.
According to IBM, the Data Science Experience is a first-of-a-kind cloud development environment that helps data scientists break down language barriers by consolidating multiple open source tools and enabling them to collaborate with others to get applications into production faster. It brings together people, content, data, models, and open source tools from IBM and others, including H2O libraries, RStudio, Python and Notebooks, in a single environment.
Data Science Experience includes capabilities of IBM’s Data Scientist Workbench, which has more than 7,000 registered users to date. These include built-in connectivity to multiple data sources, integrated big data sourcing and refining, and access to big data engines.
IBM is also extending the speed and agility of Spark to more than two million members of the R community through new contributions to SparkR and SystemML, enabling data scientists who work in R to apply new insights faster and help drive analytics across the business.
“With Apache Spark, we see an opportunity to drive significant innovation into the community to benefit data engineers, data scientists, and application developers,” said Bob Picciano, SVP, IBM Analytics. “Our IBM Analytics platform is designed for blending those new technologies and solutions into existing architectures.”
IBM says it has built Spark into the core of more than 50 of its analytics and commerce platforms including IBM BigInsights for Apache Hadoop, IBM Analytics on Apache Spark, Spark with Power Systems, Watson Analytics, SPSS Modeler and IBM Stream Computing. IBM also open-sourced its IBM SystemML machine learning technology to advance Spark’s machine learning capabilities.