The issue of data silos has long hamstrung the efforts of data managers and their business counterparts to get an accurate, real-time picture of their enterprises. Now, a recent study suggests that open source technologies and data cloud platforms are helping to clear up these silos. Is this finally the long-sought, silo-busting holy grail?
A survey of 800 global organizations, conducted by IDC in partnership with Google, finds that many data centers are moving to expansive cloud platforms, which is helping to reduce the obstacles posed by data silos. Within the next three years, 82% of organizations are looking to ensure that all capabilities supporting the full data and AI workflow are tightly integrated in their cloud data platform.
AI is driving increased activity within data environments. By 2025, at least 90% of new enterprise application releases will include embedded AI functionality, the study’s authors report. In addition, 80% of respondents are focusing on cloud resources to meet the growing demand for embedded support for AI and machine learning models. Most, 73%, report that their investments in data, analytics, and AI/ML during the past 18 months have paid off, improving delivery of actionable information to all users in workflows.
Skills are also a concern. Because most companies don’t have the data science staff they need to meet their AI/ML goals, more organizations are empowering “citizen data scientists” to develop ML models using pre-trained models or low-code training methods. At least 81% state that “having more citizen data scientists would substantially improve their ability to apply advanced analytics to more projects.”
External data is also a huge part of this equation, the study suggests. By 2026, 7 petabytes of data will be generated per second globally—but this overwhelming volume may not be as daunting as it sounds. “Only 10% of the data generated each year is original, while the remaining 90% is replicated,” the report states.
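To put those two figures side by side, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the report's annual 10% original share applies uniformly to the per-second generation rate; the report itself does not say this, and the function name `split_rate` is hypothetical.

```python
# Figures quoted in the report, taken at face value for illustration.
PETABYTES_PER_SECOND = 7   # projected global generation rate by 2026
ORIGINAL_SHARE = 0.10      # share of data that is original, per the report


def split_rate(total_rate: float, original_share: float) -> tuple[float, float]:
    """Split a generation rate into (original, replicated) portions.

    Assumption: the annual original/replicated split holds at every instant.
    """
    original = total_rate * original_share
    return original, total_rate - original


original_pb, replicated_pb = split_rate(PETABYTES_PER_SECOND, ORIGINAL_SHARE)
print(f"original: {original_pb:.1f} PB/s, replicated: {replicated_pb:.1f} PB/s")
```

Under that assumption, roughly 0.7 petabytes per second would be original data, with the remaining 6.3 petabytes per second being copies.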
The survey also finds an embrace of open systems and open source to enable greater data movement and multi-cloud analytics. Seventy percent more data leaders than data laggards in the survey say it’s important for data clouds to be based on open protocols.
More than 70% of new apps will be developed on open source databases, the survey also shows. In addition, 80% of enterprises will be multi-cloud. The “cloudification” of open source databases through fully managed services is also growing considerably in market size.
Open data is also increasingly part of the equation for enterprise data shops, the survey’s authors report. Close to eight in 10 executives, 78%, believe that using external data is a critical competency for their enterprise. Tellingly, 75% are using location data across a broad range of business functions and processes, such as supply chain, public transportation, and personalized customer experiences. “Organizations are making use of the publicly available datasets such as weather, trend, and location data to extract valuable insights and develop revenue-generating applications,” the report’s authors indicate.
Not only are public datasets available on demand, “but they’re also free of administration and maintenance costs and vetted by the community for accuracy. Teams can also further expedite data pipeline development when they can access public data sets via open standards-based APIs and follow consistent standards for consumption and ingestion.”