Data Summit 2024 is a unique conference that brings together IT practitioners and business stakeholders from all types of organizations. Through workshops, panel discussions, and provocative talks, attendees get a comprehensive educational experience designed to guide them through all of today’s key issues in data management and analysis. Whether your interests lie in the technical possibilities and challenges of new and emerging technologies or using Big Data for business intelligence, analytics, and other business strategies, we have something for you!
Access to all tracks including AI & Machine Learning Summit and Data Mesh and Data Fabric Boot Camp is included when you register for an All-Access Pass or Full Two-Day Conference Pass. Attendees may switch between tracks as they choose. Only interested in the two-day AI & Machine Learning Summit or our one-day Boot Camp? Stand-alone registration for this content is also available.
Tuesday, May 7: 9:00 a.m. - 12:00 p.m.
Located in Martha’s Vineyard A, Lobby Level
To develop and implement a successful data and analytics strategy, it is essential to understand the interdependencies required to enable data and analytics capabilities and deliver ongoing business impact. These interdependencies also include the skills and roles of everyone involved in working with data, such as business executives, business analysts, and data scientists. A practical, lean road map focused on significant business impact is essential to measure and drive success. Attend this workshop to learn how to identify business drivers and convert them into analytic capabilities and data priorities. You will also learn how to create and execute a road map and deliver a compelling executive briefing.
John O'Brien, Principal Advisor & Industry Analyst, Radiant Advisors
Tuesday, May 7: 9:00 a.m. - 12:00 p.m.
Located in Martha’s Vineyard B, Lobby Level
Semantic layers stand out as a key approach to solving business problems for organizations grappling with the complexities of managing and understanding the meaning of their data. A semantic layer, also called a context layer, is a business representation of data that allows organizations to quickly map various data definitions from multiple data sources to familiar business terms, offering a consistent and consolidated view of data. Join our workshop to gain insights into the foundations of semantic/context layers, their implementation, and the business value they provide by enhancing the utility of your data. The workshop promises an interactive experience, offering participants the opportunity to both understand the nuances of semantic/context layers and actively engage in constructing one.
Joseph Hilger, COO, Enterprise Knowledge, LLC
Sara Nash, Principal Consultant, Enterprise Knowledge LLC
Tuesday, May 7: 1:00 p.m. - 4:00 p.m.
Located in Martha’s Vineyard A, Lobby Level
In an information economy, data is the currency of business. Digital upstarts are disrupting every industry, making it imperative that organizations have a strong data strategy that turns data into insights and profitable activity. Organizations need data to streamline operations and reduce costs, improve decisions and plans, and grow revenues and profits. A data strategy is an enterprise-wide plan to harness data and analytics to achieve business goals. At a high level, it is a blueprint for creating a data-driven organization; at a low level, it is a set of blueprints for designing a data architecture to acquire, transform, and deliver data to business users and applications. Eckerson explains the keys to developing an actionable data strategy, how to build an executable road map, and how to create a data strategy that aligns with business needs based on an organization’s unique circumstances, culture, and data maturity.
Wayne Eckerson, President, Eckerson Group
Tuesday, May 7: 1:00 p.m. - 4:00 p.m.
Located in Martha’s Vineyard B, Lobby Level
Learn from an experienced developer how to use an open source vector database to power your GenAI chatbots, ecommerce recommenders, or similarity search-based apps. Learn the fundamentals of vector embeddings, vector indices, vector databases, and vector search, and gain hands-on experience using them in an application. The workshop starts with a brief introduction to neural networks, setting the stage for understanding vector embeddings. Next, we delve into the workings of vector indices and how to make informed choices among them. For the rest of the workshop, Bergman walks you through a practical project, embedding GitHub project documentation, storing vectors in Milvus, and conducting structured queries to answer questions based on the documentation. No prior experience required! Just bring your own laptop, with Python 3.10 or higher installed, and your favorite IDE such as VS Code.
Christy Bergman, Developer Advocate, Zilliz
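For readers new to the idea, the vector search this workshop covers can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three-dimensional "embeddings", the document names, and the `search` helper are hypothetical, and the brute-force scan below is not how Milvus or any vector index works internally. A vector database replaces this scan with an index so that search stays fast at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(index, query, top_k=2):
    """Brute-force similarity search: score every stored vector, keep the best."""
    scored = [(cosine_similarity(vec, query), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Hypothetical toy embeddings; real embeddings come from a model and
# typically have hundreds of dimensions.
index = {
    "install-guide": [0.9, 0.1, 0.0],
    "api-reference": [0.1, 0.9, 0.2],
    "release-notes": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a user question such as "How do I install this?"
query = [0.8, 0.2, 0.1]
results = search(index, query)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The document whose vector points in nearly the same direction as the query ranks first, which is the core idea behind similarity search-based apps.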
Wednesday, May 8: 8:45 a.m. - 9:30 a.m.
IT and business executives frequently talk about information as one of their most important assets. But few behave as if it is. Even today, executives report on their financials, their customers, and their partnerships, but rarely the health of their data assets. And corporations typically exhibit greater discipline in managing and accounting for their office furniture than their data. The arrival of generative AI (GenAI) is sparking a discussion of how to adopt AI in measuring, monetizing, and managing data assets. Laney shares insights from his best-selling book, Infonomics, about how organizations can treat information as an actual enterprise asset. He discusses why data both is and isn’t an asset and property and what this means for organizations, particularly as they prepare to put AI to work broadly. He also covers well-honed approaches to and examples of organizations managing, monetizing, and measuring their data assets.
Doug Laney, Innovation Fellow, Data & Analytics Strategy, West Monroe and Author of "Infonomics" & "Data Juice", visiting professor at University of Illinois Gies College of Business
Wednesday, May 8: 9:30 a.m. - 9:45 a.m.
Jain and Das discuss how organizations should secure their AI application and the critical data they are feeding into these systems to ensure compliance and prevent damaging data leaks.
Dhruv Jain, Co-founder & Chief Product Officer, Acante
Abhishek Das, Co-founder & VP, Engineering, Acante
Wednesday, May 8: 9:45 a.m. - 10:00 a.m.
Learn how National Student Clearinghouse (NSC) created an operational MDM platform, giving access to a large volume of streamlined, high-quality data. With billions of records, a legacy IT system, and an enterprise focus on moving to the cloud, NSC focused on modernization for the cloud data ecosystem, adhering to compliance regulations and enhancing matching across the enterprise. Discover how NSC is now empowered with a single platform to support and facilitate customer requests with one source of truth while benefiting from a collaborative hub for data management and governance.
Felicia Perez, Managing Director of Information as a Product, National Student Clearinghouse
Wednesday, May 8: 10:45 a.m. - 11:45 a.m.
Important to an overall data strategy is management and governance of data assets.
Data products promise to deliver high-quality datasets to business users on demand, fostering greater trust in data and higher levels of empowerment and self-service. But many companies struggle to understand not only what data products are, but how to create, govern, and manage them. Thought leader Eckerson dives into the practical implications of running an organization using data products, describing how a data product is different from a data asset and how to create data products from data assets using a variety of tools and techniques. He addresses organizational, architectural, and process considerations for delivering data products at scale.
Wayne Eckerson, President, Eckerson Group
Wednesday, May 8: 12:00 p.m. - 12:45 p.m.
Looking at data strategy, a number of elements combine to inform strategic decisions, including repositories and data products.
A recent survey showed that 67% of companies had their software budgets cut during 2023. SaaS databases are easy to use and powerful, but they put a strain on budgets. Still, no one can afford to skimp on smart data analytics. How do you get more analytics out of your SaaS data warehouse/lakehouse, without spending more money? Treat incoming data streams as a graph. Relationships and categories of data can immediately be seen and acted upon. Duplicate entities can be resolved. Key pattern signals in noisy data streams can be pinpointed and the noise that you don’t need tossed out. By putting only relevant and clean data into analytical repositories, tons of useless data never have to be stored in pay-per-use systems, vastly reducing costs. You get smarter answers on clean, pre-filtered data in real time.
Paige Roberts, Director of Product Innovation, GridGain
In today's dynamic and data-centric business environment, organizations increasingly recognize the critical role of data products in extracting maximum value from their expansive data landscapes. This session explores what data products actually are beyond the buzzwords, why data products are becoming indispensable in data-driven business strategies, and what the best practices are for adopting data products. Join Denodo to better understand how data products can be a transformative approach in helping to democratize data access and revolutionize your decision-making processes.
Kevin Bohan, Director of Product Marketing, Denodo
Wednesday, May 8: 2:00 p.m. - 2:45 p.m.
As organizations concentrate on being data-driven, let’s not forget the importance of becoming insights-driven as well.
As the digital landscape evolves at an unprecedented pace, the ability to leverage data for strategic decision making has become essential for staying competitive and innovative. Thai provides a road map for harnessing the power of data and analytics to drive business success and examines the key components of becoming an insight-driven organization. These include building robust data infrastructure, fostering a culture that values data literacy and insights, and implementing tools and technologies for data analytics and interpretation. Finally, he looks ahead at emerging trends and future possibilities in the realm of business insights and analytics.
Hugh Thai, Head, Innovation & Data Science, Arbella Insurance Group
Bill Morrissey, Sr. IT Manager, Arbella Insurance Group
Wednesday, May 8: 3:15 p.m. - 4:00 p.m.
Many elements of datasets need to be considered when creating data products that are effective.
Joseph Hilger, COO, Enterprise Knowledge, LLC
In today's digital landscape, data is key to decision making and planning. Sandhill Consultants, a certified Quest partner, leads in integrating data management practices. Our approach unifies data modeling, governance, catalogs, and operations and follows industry standards. Giles introduces the audience to the transformative potential of Quest IM Solutions for data management and showcases how the partnership not only accelerates the delivery of trusted data assets, but also optimizes business strategies through enhanced data governance, management, and utilization.
Jeffrey Giles, Principal Architect, Sandhill Consultants
Wednesday, May 8: 4:15 p.m. - 5:00 p.m.
Designing a pragmatic approach to competing on analytics relies on using strong analytic methods to get the most out of your data.
Drawing on her 25 years of experience with data science, Chase lays out a simple, effective way to get the right analysis to drive your effective data-driven plans. She guides us through the TAP method using simple, understandable, and engaging examples. She brings to life the method for measuring all the various types of data organizations look at in experience management. Gain practical knowledge to accurately determine metrics, along with a new way of looking at your data.
Chantel Wilson Chase, Director, Quality Analytics and Reporting, Alexion, AstraZeneca Rare Disease Business Unit
MySQL HeatWave is a service that not only speeds up analytics on data stored in a MySQL database, but also allows users to run analytics on data stored in an object store. Sundara covers key features of MySQL HeatWave, including some of the machine learning-based automation features offered.
Seema Sundara, Technical Architect, MySQL HeatWave, Oracle
Wednesday, May 8: 5:00 p.m. - 6:00 p.m.
Wednesday, May 8: 10:45 a.m. - 11:45 a.m.
One important aspect, when moving to a modern data architecture, is an equally modern approach to data governance.
One of the challenges facing enterprise architecture is maintaining consistency across the enterprise. This is complicated by the fact that data comes from numerous disparate sources and systems that represent different purposes, focuses, and objectives. On top of that, there is considerable confusion as to terms such as data lake, data warehouse, and operational data store. To overcome these challenges at an enterprise level, a framework is necessary to apply data uniformly, consistently, and in a meaningful manner. The framework will transform data from multiple, disparate sources of data into an operational data store, a consistent, homogeneous environment, as the single dependable source of truth for all reporting and analytics.
Aaron Cutshall, Enterprise Data Architect, Healthfirst, Inc.
In today’s modern, data-rich environment, companies face the challenge of centralizing large volumes of diverse data to drive insights and operational efficiency. Without a comprehensive data centralization strategy, organizations risk missing out on significant opportunities for revenue growth and competitive advantages with data trapped in silos. By freeing data from silos and establishing a single source of truth, companies can accelerate their innovation to stay competitive.
Niraj Vora, Lead Sales Engineer, Fivetran
Wednesday, May 8: 12:00 p.m. - 12:45 p.m.
Real-time analytics contributes to building scalable and fault-tolerant data processing pipelines.
The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.
Timothy Spann, Principal Developer Advocate, Streaming, Cloudera and Future of Data meetup, startup grind, AI Camp
Wednesday, May 8: 2:00 p.m. - 2:45 p.m.
Finding needed data requires more than a user-friendly interface; it needs good metadata and innovative uses of LLMs.
In the contemporary data-driven landscape, businesses are inundated with vast amounts of data, necessitating sophisticated data management strategies. However, complexities arise in data management, particularly in large-scale environments. Key challenges include tracing data lineage, determining data freshness, identifying personally identifiable information (PII), and locating responsible data custodians, especially in scenarios where ownership is ambiguous due to staff turnover or lack of clear accountability. This presentation delves into the methodologies employed to integrate metadata into Acryl and explores the innovative use of large language models (LLMs) in responding to natural language queries about data. Knowledge graphs, in conjunction with LLMs, facilitate complex inquiries related to data discovery, thereby advancing our data discovery capabilities.
Ajinkya Tarkunde, Lead Software Engineer, Chime
Wednesday, May 8: 3:15 p.m. - 4:00 p.m.
Infosec teams and data teams are naturally at odds because they have competing agendas, but there are ways to meet the needs of both without compromising the requirements of either.
In today's digital world, the integration of data governance and data security is critical. Security threats continue to evolve, while the sources and end points of an organization’s data continue to grow exponentially. For organizations to gain rapid access to usable data, they must first prioritize fostering a healthy relationship between their data governance and infosec teams. The chief data officer and chief information security officer approach data with the same end goal in mind, but often with different tooling and systems. The rise of SaaS-based automation and simplified data tools is paving the way to unified security and governance efforts that provide a common language and framework for CISOs and CDOs to join together in a united force.
Matthew Wagnon, VP, Product, ALTR Solutions, Inc.
Wednesday, May 8: 4:15 p.m. - 5:00 p.m.
Ideas for saving time, enhancing data analytics, and adding business value begin with actual success stories.
Explore the potential of Apache Iceberg in the world of structured data. Uncover its unique features, including schema evolution and ACID transactions, making it an ideal solution for large-scale datasets. See how Apache Iceberg seamlessly fits into your data architecture, providing flexibility, scalability, and top-notch performance for analytics and data warehousing. Steinkamp shares real-world success stories where organizations have saved time and supercharged their business value with Apache Iceberg. Delve into how it enhances data relationships and analytics, making structured datasets more insightful. Get ready for an insightful exploration, where practical insights, success stories, and strategies for leveraging Apache Iceberg in structured data management and analytics are shared.
Zoe Steinkamp, Developer Advocate, InfluxData
Wednesday, May 8: 5:00 p.m. - 6:00 p.m.
Wednesday, May 8: 10:45 a.m. - 11:45 a.m.
Using data mesh to improve decision-making involves accelerating the adoption of many data elements.
The Department of Defense (DoD) initiated efforts toward advancing the agency's goal to improve decision making across all DoD entities. This goal rests under the foundational principle of accelerating DoD's adoption of data, analytics, and AI by prepositioning a common frame of reference for all DoD entities to converge and share data and AI models. Under the auspices of the DoD Chief Digital and Artificial Intelligence Office (CDAO), this effort will create an enterprise-level infrastructure of services intended to drive an integrated data, analytics, and AI strategy, while maturing a responsible DoD-wide AI ecosystem. This presentation highlights the case study on DoD’s efforts to establish a data mesh construct based on the following four elements: domain-oriented/decentralized data ownership and architecture, data as a product, self-service data infrastructure as a platform, and federated computational governance.
Efrain Rodriquez, Director, Business Intelligence and Metrics, U.S. Department of Defense (DoD)
Advancements in data automation, low-code/no-code platforms, and APIs make it quicker and easier for organizations to start their data fabric projects, often in just a few months. Learn how these advancements enable smoother integration and management of data across the enterprise, leading to faster decision making, efficiency, AI readiness, and increased profits. Discover how leveraging data automation can accelerate your data fabric strategy so your data is more effectively fueling significant growth and sparking innovation.
Mary Vue, VP, Marketing and Partnerships, Syncari
Wednesday, May 8: 12:00 p.m. - 12:45 p.m.
With the rise of generative AI (GenAI) and large language models (LLMs), the data fabric can add a range of new facilities to accelerate data democratization.
The data fabric architecture has been steadily gaining traction in the enterprise to unify data across disparate sources into coherent data services. By leveraging the power of GenAI models in conjunction with smart data fabrics, organizations can automate the integration of data, provide natural language access to data and analytics, improve data quality while decreasing the need for labor-intensive data cleansing, and secure and govern data in real time. Fried explores the benefits of using data fabrics and GenAI to improve data management practices and provides examples of how these technologies can be used in real-world scenarios. He also notes the risks and lays out a practical path for applying this technology safely.
Jeff Fried, Director, Platform Strategy & Innovation, InterSystems
Wednesday, May 8: 2:00 p.m. - 2:45 p.m.
Data mesh is evolving due to changes in data architecture and technological advances.
There is no doubt that data mesh principles resonate with many data professionals, particularly those looking to move beyond brittle, monolithic architecture. However, adopting data mesh can seem daunting, due to both a scarce but improving ecosystem of tools and the demands of organizational change management. Luckily, data mesh lends itself to evolutionary adoption, helping organizations to leverage existing platform investment and gain incremental value. Cordo reviews architectures and best practices from real-world experience, grounded by the stories of two organizations.
Elliott Cordo, CEO/Founder/Builder, Data Futures, LLC
Wednesday, May 8: 3:15 p.m. - 4:00 p.m.
New technologies can solve organizations’ operational problems.
In this session, Bagnall explores ETL's role in seamlessly integrating with data fabric architectures to empower organizations with the ability to efficiently manage, integrate, and analyze their data from diverse sources. He delves into real-world use cases, best practices, and the key features that make any ETL process a valuable ally in your journey toward a more agile and unified data ecosystem.
John Bagnall, Senior Product Manager, Matillion
Wednesday, May 8: 4:15 p.m. - 5:00 p.m.
Data mesh plays a pivotal role within modern cloud architecture, while a semantic layer acts as a cohesive force within the data mesh framework.
Data mesh is swiftly gaining traction as an innovative strategy for expediting data and analytics advancements. It achieves this by distributing data product development through domain-oriented, self-service methods. Crucial to the success of this approach is the emergence of the semantic layer, serving as a foundational catalyst supporting composable model design, enhanced collaboration, and decentralized ownership. This enlightening session delves into the integral role of the semantic layer within a contemporary analytics architecture, elucidating its interconnectedness with the data mesh concept.
Kieran O'Driscoll, Director of Business Development, AtScale
Wednesday, May 8: 5:00 p.m. - 6:00 p.m.
Wednesday, May 8: 10:45 a.m. - 11:45 a.m.
Generative AI (GenAI) is all the rage these days, but finding effective and realistic uses for it is still elusive.
The vast majority of current GenAI projects will fail, not because of inherent flaws in large language models (LLMs), but because of misconceptions about how to use them and the lack of capabilities needed to successfully design, develop, and operationalize GenAI-driven applications. Carlsson debunks the most harmful myths that set up projects for failure and looks at case studies of how advanced AI teams in industries ranging from pharma to food delivery are shattering these myths and delivering transformative outcomes.
Kjell Carlsson, Head of AI Strategy, Domino Data Lab
In this era where AI is reshaping industries, the integration of large language models (LLMs) like ChatGPT with private knowledge platforms is a groundbreaking development. Datavid shares experiences and lessons learned from both internal R&D and the benchmarking of several LLMs with customers and subsequent integration with existing KM platforms. Deep dive into the synergistic potential of combining the advanced natural language processing capabilities of LLMs with the rich, domain-specific data housed in private knowledge platforms. Come explore how this integration can revolutionize AI applications in your industry!
Clive Smith, Chief Revenue Officer, Sales, Datavid Limited
Tim Padilla, Director, Sales & Consulting North America, Datavid Limited
Wednesday, May 8: 12:00 p.m. - 12:45 p.m.
Chen explores the inherent connection among logistic regression, neural networks, and computer vision using mathematical structures as a lens. Drawing parallels between the construction of logistic regression functions and mathematical representations uncovers the foundational role of abstract mathematical concepts in shaping these methodologies. In logistic regression, the linear function, dynamically shaped by a combination of various features, emerges as a visual metaphor—a plane in the mathematical fabric. In neural networks, weights and nodes form a space surrounded by multidimensional planes, aligning closely with mathematical principles. In computer vision, filters function as weighted combinations of pixel features, extending the mathematical concept to image processing. This presentation illuminates the harmony and shared essence of mathematical principles across diverse machine learning and computer vision paradigms.
Liliang Chen, Financial Analytic Manager, Freddie Mac
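The parallel described in this abstract can be made concrete with a small sketch. It is an illustration only, not material from the talk: the function names, weights, and toy image are assumptions. The same weighted sum (a plane in feature space) sits inside a logistic regression, inside a single neural-network node, and inside an image filter applied to a patch of pixels.

```python
import math

def linear(weights, bias, features):
    """Weighted combination of features: the plane w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(weights, bias, features):
    """Logistic regression (and a single sigmoid neuron): sigmoid of the linear function."""
    return sigmoid(linear(weights, bias, features))

def apply_filter(image, kernel):
    """Slide a kernel over an image; each output is a weighted sum of a pixel patch."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [k for kernel_row in kernel for k in kernel_row]
    output = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            patch = [image[r + i][c + j] for i in range(kh) for j in range(kw)]
            # Same linear function as above, with the pixels as features.
            row.append(linear(flat_kernel, 0.0, patch))
        output.append(row)
    return output

# A point lying exactly on the plane 2*x1 - x2 = 0 gets probability 0.5.
probability = logistic_predict([2.0, -1.0], 0.0, [1.0, 2.0])

# Hypothetical 3x3 image with a vertical edge, and a 2x2 edge-detecting filter.
image = [[0, 0, 1],
         [0, 0, 1],
         [0, 0, 1]]
kernel = [[-1, 1],
          [-1, 1]]
edges = apply_filter(image, kernel)
print(probability, edges)
```

The filter responds only where the pixel values change from left to right, which is the weighted-combination view of image features the abstract refers to.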
In the rapidly evolving landscape of AI, the ability to efficiently handle and process vast amounts of complex data is paramount. Vector databases and vector search have emerged as critical components in this domain, offering a specialized approach to managing multidimensional datapoints, or vectors, that are essential for advanced AI applications. Agarwal gives a comprehensive exploration of vector databases, their role in AI solutions, and the emerging trends and technologies that are shaping their development.
Michael Agarwal, Director and Global Practice Leader, Site Reliability Engineering, Cloud & NoSQL Databases, Datavail
Wednesday, May 8: 2:00 p.m. - 2:45 p.m.
A well-known drawback to using generative AI (GenAI) is its tendency to produce false information.
A crucial aspect of constructing and applying GenAI for enterprise-level applications is mitigating hallucinations. The generation of factually inaccurate information can occur both during the initial development of large language models (LLMs) and the subsequent refinement of existing model responses through prompt engineering. Bhattacharya explores diverse approaches to mitigate these issues, including the introduction of new decoding strategies, optimizations based on knowledge graphs, the incorporation of innovative components in loss functions, and supervised fine-tuning. She also addresses methods such as retrieval augmentation, feedback-based strategies, and prompt tuning, which can be implemented during the prompt engineering phase.
Ranjeeta Bhattacharya, Senior Data Scientist, BNY Mellon
AI has the power to help your organization disrupt, innovate, generate faster insights, cut costs, and increase productivity. But responsible and successful AI use demands high-quality, trusted data and transparent, observed, and accessible data intelligence. See firsthand how taking a model-to-marketplace approach to managing and leveraging your organization's data can help you gain the footing needed to get the AI results you desire.
Yetkin Ozkucur, Director for Professional Services and Presales, Quest
Wednesday, May 8: 3:15 p.m. - 4:00 p.m.
The vector database has fast emerged as a preferred platform for GenAI applications.
While companies have long used vector databases to recognize patterns and support machine learning recommendation engines, now they are using them to support GenAI initiatives by storing, modeling, and searching tokenized data documents. Vector databases feed relevant content to language models (LMs), helping enrich prompts, fine-tune models, and govern outputs. Petrie defines vector databases and how they help companies boost productivity and gain competitive advantage with domain-specific GenAI initiatives. He looks at market requirements, adoption trends, challenges, benefits, use cases, and architectural approaches.
Kevin Petrie, VP Research, BARC
Wednesday, May 8: 4:15 p.m. - 5:00 p.m.
Models can be structured and designed in a variety of ways to enable them to provide valuable insights.
Time series analysis plays a crucial role in enhancing the capabilities of AI by providing valuable insights into temporal patterns, trends, and dependencies within datasets. Oad explores the synergies between time series analysis and AI, showcasing how the integration of temporal data can significantly improve the performance and accuracy of AI models. Key points to cover include temporal context in data, enhanced predictive modeling, improved anomaly detection, dynamic feature engineering, optimizing AI for time-varying data, and forecasting and trend analysis.
Salochina Oad, ML Engineer, U.S. Xpress, Inc.
Wednesday, May 8: 5:00 p.m. - 6:00 p.m.
Thursday, May 9: 9:00 a.m. - 9:45 a.m.
Confronting the toughest data management challenge head-on, Rudden dissects the complexities of AI-driven versioning and presents a road map for navigating this intricate landscape. She delves into the strategic application of taxonomies and ontologies within the realm of graph modeling—heralding a new era of data structuring that boosts analytics, foresight, and decision making. Her approach provides attendees with the acumen to select, organize, and manage the right datasets, fortifying their data architecture against the rapid evolution of technology. Geared for a diverse array of data professionals, from strategists and scientists to engineers and BI experts, Rudden's insights are set to empower the audience with practical tools and methodologies. This keynote is your key to demystifying data management and embracing its future with confidence and expertise.
Beth Rudden, CEO and Chairwoman, Bast AI
Thursday, May 9: 9:45 a.m. - 10:00 a.m.
Today’s high-speed operational and AI-driven decision making requires ultra-fast analytics. Although typical data architectures are able to process streaming data, more often than not, the analytics are performed offline in batch mode. The real-time data is available for analysis, but the benefits of real time are lost the instant the data lands in a datastore or lakehouse for analysis. Ahuja delves into a modern data and analytics architecture—the Unified Real-Time Data Platform—that solves the real-time challenge. He shares details and use cases on how to process streaming data, enrich it with contextual historical data, and execute advanced analytical workloads—all at ultra-low latencies and massive scale.
Lalit Ahuja, Chief Technology Officer, GridGain Systems, Inc.
Thursday, May 9: 10:45 a.m. - 11:30 a.m.
Join our panel of industry experts as they discuss various aspects of data modernization. Representing a variety of approaches, the panel considers how emerging technologies, platforms, and architectures affect data management, organization, and analytics. What will the future bring? Our panel gives their opinions.
Sriram Vrinda, Director of Product Management, MySQL HeatWave and Cloud Observability, Oracle
Jason Russler, Technical Director of Alliances, VAST Data
Mark Kurtz, CEO, Neural Magic
Thursday, May 9: 11:45 a.m. - 12:30 p.m.
Data science is behind many different functions benefitting the enterprise.
Location data is a powerful tool. To help marketers understand and best meet holiday shoppers’ needs, Foursquare applied unsupervised learning methods to location data to derive meaningful segments of individuals based on their demographics and shopping behaviors. Rather than invest in reaching more general segments like moms or Millennials, marketers are now able to focus their efforts by targeting these data-driven segments. Dimensionality reduction methods such as principal component analysis (PCA), combined with clustering methods such as k-means, can isolate which features describe the most variance among users. Those features can then be used to group like users together in an unsupervised manner to analyze results. This information empowers marketers to determine which segments present the largest opportunity and what strategies to use to best target them.
Ali Rossi, Senior Data Scientist, Foursquare
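To make the PCA-plus-k-means idea in the abstract above concrete, here is a minimal sketch in plain Python. It is not Foursquare's pipeline: the feature names, the toy data, and the helper functions (`center`, `first_component`, `kmeans_1d`) are all assumptions made for illustration. It keeps only the first principal component, found by power iteration, and then clusters users along it.

```python
# Toy sketch: PCA (first component via power iteration) followed by 1-D k-means.
# All data and names here are hypothetical, not Foursquare's pipeline.

def center(rows):
    """Subtract each column's mean so PCA sees variance, not offsets."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def first_component(centered, iters=100):
    """Approximate the top eigenvector of the covariance matrix by power iteration."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on scalars; assumes k >= 2. Centroids start evenly spaced."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = sum(members) / len(members)
    return labels

# Hypothetical per-user features: [grocery visits, electronics-store visits] per month.
users = [[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
         [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]

centered = center(users)
component = first_component(centered)
# Project each user onto the direction of greatest variance.
projected = [sum(a * b for a, b in zip(row, component)) for row in centered]
labels = kmeans_1d(projected, k=2)
print(labels)
```

In practice a library such as scikit-learn would likely replace the hand-written helpers, and more than one component would usually be kept; the sketch only shows the order of operations: center, reduce, then cluster.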
Thursday, May 9: 2:00 p.m. - 2:45 p.m.
Data is recognized as a critical asset for all organizations, so leaders look to leverage generative AI (GenAI) capabilities.
GenAI cannot function without a bedrock of data, yet traditional data management is failing to meet businesses’ new demands, especially the need for real-time, consistent, and self-service data for applications, insights, and analytics. DataOps provides a set of strategies that allow organizations to harness data to enable solutions, develop data products, and activate data for business value across all technology tiers, from infrastructure to experience, evolving to help organizations move from analytics to real-time operational use cases such as GenAI.
Hershey Khan, Vice President, Data Strategy and Management, EXL
Thursday, May 9: 3:00 p.m. - 3:45 p.m.
Data analytics is rapidly evolving. Stay ahead of the curve with an understanding of the direction and evolution ahead.
Data exploration and preparation are core to gaining insights from data. In this session, attendees learn how to fast-track exploration and preparation efforts. Using existing skill sets in SQL and cloud-native tools, you can accelerate your time to insights with the framework delivered in this session. Learn how GenAI and advancements in platform tools and automation will streamline and supercharge your data analytics efforts.
Mike Kahn, Manager, Field Engineering, Databricks
Thursday, May 9: 10:45 a.m. - 11:30 a.m.
Are you taking a hard look at your cloud costs this budgeting season? Is your IT budget getting eaten up without a good ROI to show for it? Do you need to justify your cloud expenses and business cases to leadership? Join Agarwal for a practical discussion on cloud cost optimization strategies, tips, and best practices. Learn about right-sizing your cloud resources, monitoring and managing your cloud usage effectively, identifying and eliminating wasteful spending, leveraging automation for better cost control, exploring cloud pricing models and modernization opportunities, and achieving the right balance of price and performance for your business requirements.
Michael Agarwal, Director and Global Practice Leader, Site Reliability Engineering, Cloud & NoSQL Databases, Datavail
Thursday, May 9: 11:45 a.m. - 12:30 p.m.
As cloud computing increasingly dominates the data world, it’s good to pay attention to products that are cloud-based.
For any modernization and digital transformation initiative to be successful in today's rapidly changing landscape, enterprises need a lean engineered product model that empowers customers to easily build and manage cloud database products themselves while meeting all enterprise control objectives. In this presentation, Roy discusses how a hyper-automation framework could be utilized for both stateful and stateless database automation, thus enabling full automation of all aspects of a product management cycle. This framework, in addition to database and analytical products, can also be applied to cloud data governance, data movement, and data enablement products.
Thursday, May 9: 2:00 p.m. - 2:45 p.m.
Key elements to consider when strategizing about putting data in the cloud include meeting business objectives.
It is a formidable task to ensure business-critical applications meet business service level agreements within the given RTO/RPO. This requires simplifying, standardizing, and automating the deployment and configuration along with providing HA and DR to these critical applications to meet business objectives. VMware multi-cloud solutions, including VMware Cloud on AWS, Oracle Cloud VMware Solutions, and others, provide consistent and interoperable infrastructure and services between VMware-based data centers and the public cloud, which minimizes the complexity and associated risks of managing diverse environments.
Dean Bolton, Co-Founder, LicenseFortress
Michael Corey, Co-Founder/Chief Operating Officer, LicenseFortress
Thursday, May 9: 3:00 p.m. - 3:45 p.m.
When auditors come calling, it’s necessary to be prepared well ahead of time.
One year ago, Oracle dramatically changed how businesses can license Java moving forward. In effect, Oracle moved the goalposts such that companies can no longer license Java by the processor or named-user-plus model. Instead, Oracle now utilizes an employee-based licensing model in which every employee, full-time or part-time, must be licensed, regardless of whether they use Java. Unsurprisingly, Oracle’s definition of what constitutes an employee for licensing purposes is breathtakingly broad. Gartner has been quoted as saying 1 in 5 users will be audited in the next 3 years. With Oracle now aggressively pursuing companies for their Java usage, this session explores what you need to know when Oracle comes knocking on your door.
Dean Bolton, Co-Founder, LicenseFortress
Michael Corey, Co-Founder/Chief Operating Officer, LicenseFortress
Thursday, May 9: 10:45 a.m. - 11:30 a.m.
Move into the future with data management tips and techniques.
Developers in general are dealing with several challenges in managing data pipelines. Unpredictability and Inconsistency: Inconsistent tool usage leads to pipeline performance and reliability issues. Time to Market: Understanding of tools is needed for timely pipeline deployment while maintaining quality. Security: A least privilege model is required for strict access control to pipelines and data. Dependency Management: On-the-fly dependency installation causes compatibility issues and vulnerabilities. Scaling: Pipelines must scale to handle rapid data job submission without compromising performance or reliability. As MassMutual data pipelines move towards becoming centrally managed, owned, and versioned, the organization is working towards a secure, governed, reliable, and scalable run-time for all applications. Join Anand and Vijay as they talk about the Next-Gen Data Pipeline Solution at MassMutual and how they achieved a seamless, containerized solution for managing their workloads and execution environment.
Vijayakumar Gurusamy Raju, Lead Software Engineer, MassMutual
Anandakrishnan Ramakrishnan, Lead Software Engineer, MassMutual
Today's data-driven strategies for boosting revenue, reducing costs, and minimizing risk often begin with modernizing applications. Join this discussion to hear how companies like Visa, BNP Paribas, Standard Chartered, and Deutsche Bank gain the agility, scalability, and reliability they need through application modernization approaches.
Scott McMahon, Sr. Director Solution Architecture, Hazelcast
Thursday, May 9: 11:45 a.m. - 12:30 p.m.
One approach to digital transformation involves data analytics and real-time data.
Looking to transform your apps with real-time analytics? Kubernetes is an outstanding platform for operating high-performance databases, and ClickHouse runs well on it. This talk begins with the basics of Kubernetes and introduces an operator that enables you to stand up ClickHouse clusters. Hodges walks through the installation process and brings up a ClickHouse cluster in real time. He then shows how running on Kubernetes enables emergent behavior like independent scaling of compute and storage, server fault tolerance, cross-AZ high availability, and rolling upgrades. You'll have enough guidance from this talk to start your journey to real-time data on a robust, cloud-native architecture.
Robert Hodges, CEO, Altinity
Thursday, May 9: 2:00 p.m. - 2:45 p.m.
Databases and DevOps do not stand in isolated silos.
If you think metrics are all you need to build proper observability and monitoring, Furmanek has a differing view. Although we have graphs, charts, and diagrams in place, we later learn that metrics are not enough. We don’t know how to configure thresholds, and we can’t troubleshoot issues when they appear. In this talk, Furmanek shows how to build proper observability and what’s needed in the modern world.
Adam Furmanek, DevRel, Metis
Thursday, May 9: 3:00 p.m. - 3:45 p.m.
A knowledge graph is commonly understood to be a knowledge base that uses a graph database (a graph-structured data model) to integrate and link data in a form that is understood by both humans and machines.
Hedden explores all the components of an enterprise knowledge graph and provides further insight into the semantic layer or knowledge model component, which includes an ontology and controlled vocabularies, such as taxonomies, for controlled metadata. Further, she looks at the relationship knowledge graphs have with AI, including generative AI. While data experts tend to focus on the graph database components (RDF triple store or a labeled property graph), they should not overlook the importance of this semantic layer.
Heather Hedden, Taxonomy Consultant, Hedden Information Management and Author, The Accidental Taxonomist
Thursday, May 9: 10:45 a.m. - 11:30 a.m.
No more black box AI implementations—the technology needs to be ethical and explainable.
What datasets do you really need to be successful? The need is for consistent, clean, and curated datasets. Trusted data means acting on the data for critical business decisions. Bridging the gap between data and the real world empowers your data community to act on the data and provide monetary value from the data. How well is your organization providing trusted datasets to feed your AI and ChatGPT? How are datasets synthesized, scored, and shared? Find out how organizations can benefit from a data product, value scoring, and marketplace approach.
Bharath Vasudevan, Head of Product, erwin, Quest Software
Thursday, May 9: 11:45 a.m. - 12:30 p.m.
Take advantage of machine learning and NLP within the organization.
Generative AI (GenAI) is proving useful in the enterprise, but in many applications, it can't be used "off-the-shelf." For instance, deploying GenAI to answer business research questions from long text documents—primary and secondary market research reports, journal articles, thought leader white papers—requires several adaptations to make the process (and processing) efficient and effective. One of those adaptations is optimizing the document text with natural language processing (NLP) to accommodate the text capacity limitations of large language model APIs. Seuss explains and demonstrates how to use NLP to feed the GenAI only "summary worthy sentences" that are rich in meaning and help ensure the GenAI response is as accurate and meaningful as possible.
David Seuss, CEO, Northern Light
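One way to picture the idea of sending an LLM only "summary worthy sentences" that fit within its text limit is a simple extractive filter. The sketch below is an assumption for illustration, not Northern Light's method: it scores sentences by the average corpus frequency of their non-stopword terms and greedily keeps the best ones within a word budget. The function names, the tiny stopword list, and the word-count budget (a stand-in for a real token limit) are all hypothetical.

```python
import re
from collections import Counter

# Deliberately tiny, illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "in", "to", "of", "was", "from", "and", "during"}

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def select_sentences(text, max_words):
    """Keep the highest-scoring sentences that fit in max_words, in original order."""
    sentences = split_sentences(text)
    terms = [[w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]
             for s in sentences]
    freq = Counter(w for ws in terms for w in ws)
    # Score = mean document frequency of the sentence's content words.
    scores = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0 for ws in terms]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= max_words:  # greedy fill of the word budget
            chosen.append(i)
            used += n
    return [sentences[i] for i in sorted(chosen)]

report = ("Quarterly revenue grew in the retail segment. "
          "The weather was pleasant during the conference. "
          "Retail revenue growth came from new store openings. "
          "Analysts expect retail revenue to keep growing.")

picked = select_sentences(report, max_words=16)
print(picked)
```

The selected sentences would then be concatenated into the prompt in place of the full document. A production system would count tokens with the model's own tokenizer and would likely use a stronger relevance signal than raw term frequency.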
Thursday, May 9: 2:00 p.m. - 2:45 p.m.
Introducing AI into the enterprise is top of mind for many these days.
Probstein introduces the concept of metasearch as a transformative tool in the business world, akin to a master key unlocking various treasure chests. This analogy aptly describes the modern enterprise landscape, where numerous cloud-based applications, each with their unique datasets, are seamlessly accessible. He emphasizes the practicality of this approach, highlighting the efficiency of using metasearch over traditional methods that often involve heavy data amalgamation. By keeping the data in its original “chests” and using metasearch as the unifying tool, businesses can enjoy a more streamlined and agile data management process. The talk further delves into the synergy between metasearch and AI technologies like ChatGPT. AI, when applied to the rich and varied internal data of a company, can act as an intelligent guide, making sense of the vast information treasures. This approach not only simplifies data interaction, it also unlocks deeper insights, enhancing decision making and strategic planning.
Sid Probstein, Serial CTO & Creator, SWIRL
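The metasearch pattern described above, querying each system where the data lives and merging the results instead of copying everything into one store, can be sketched in a few lines. This is an illustrative toy, not SWIRL's implementation or API: the in-memory sources, the term-count scoring, and the names `make_source` and `metasearch` are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def make_source(name, documents):
    """Build a hypothetical search connector over an in-memory list of documents."""
    def search(query):
        terms = query.lower().split()
        hits = []
        for doc in documents:
            score = sum(doc.lower().count(t) for t in terms)
            if score:
                hits.append({"source": name, "text": doc, "score": score})
        return hits
    return search

def metasearch(query, sources):
    """Fan the query out to every source, normalize scores per source, and merge."""
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda search: search(query), sources))
    merged = []
    for hits in per_source:
        top = max((h["score"] for h in hits), default=0)
        for h in hits:
            # Scale to (0, 1] so sources with different scoring ranges are comparable.
            merged.append({**h, "score": h["score"] / top})
    return sorted(merged, key=lambda h: (-h["score"], h["source"], h["text"]))

# Two stand-in "chests" of data that stay where they are.
crm = make_source("crm", ["Acme renewal contract signed", "Globex renewal pending"])
wiki = make_source("wiki", ["Renewal policy for enterprise contracts", "Holiday schedule"])

results = metasearch("renewal contract", [crm, wiki])
for hit in results:
    print(hit["source"], round(hit["score"], 2), hit["text"])
```

Per-source normalization is one simple design choice for making results comparable; real metasearch systems typically re-rank merged results with a shared relevance model, and an AI layer such as the one the talk describes would operate on the merged list.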
Thursday, May 9: 3:00 p.m. - 3:45 p.m.
Knowledge graphs are now independent entities capable of continuous self-improvement.
Recent advancements in large language models (LLMs) have spearheaded the development of self-sustaining knowledge graphs. Aasman focuses on four important aspects essential for knowledge graphs to autonomously synthesize and manage information: Intuitive Query Primitives, which allow effortless extraction of data from LLMs; Natural Language to Structured Query Translation, which translates natural language queries into structured queries across various languages; Integrated Vector Store, which facilitates seamless interactions between internal, private data and external, public data; and Neuro-Symbolic Framework, which synergizes rule-driven logic, constraint-based reasoning, description logic, Graph Neural Networks (GNN), Machine Learning, and LLM inferences. The presentation showcases practical applications.
Jans Aasman, CEO, Franz Inc.
Thursday, May 9: 4:00 p.m. - 5:00 p.m.
Many companies have prioritized various data management trends this year to meet their increasing demands in data and AI initiatives. To help guide people, Radiant Advisors and Database Trends and Applications magazine conducted a market survey in Q1 2024, which analyzed what companies are doing beyond the hype. This survey focused on companies' perceptions, planning, and adoption of current data management practices, such as data fabric and active metadata, multi-domain master and reference data management, data quality, data observability, and data catalogs. The study covered a range of industries and company sizes. Following your participation at Data Summit 2024, you can compare your enlightened perspectives with the survey findings.
John O'Brien, Principal Advisor & Industry Analyst, Radiant Advisors