Machine learning is revolutionizing the process of complex decision-making by enabling the analysis of bigger, more complex datasets and the delivery of faster, more accurate results.
Although the technology is developing rapidly, many projects are still in their early phases while others have hit a wall because they can’t keep up with the volume and variety of data. From selecting data sets and data platforms, to architecting and optimizing data pipelines, to evaluating commercial and open source frameworks, there are many success factors to keep in mind.
At Data Summit 2022, Charna Parkey, VP of product, Kaskada, presented “The Basics of Machine Learning” during her workshop session.
The annual Data Summit conference returned in person to Boston, May 17-18, 2022, with pre-conference workshops on May 16.
Most machine learning (ML) models are trained over examples collected at different points in time, and often are trained to predict the future. Machine learning needs the ability to forget—to learn what’s relevant now.
Parkey used mobile gaming as an example. The system behind free-to-play gaming needs to find and retain customers to turn a profit. However, it turns out that only 5% of players who begin a game are still active several months down the line. So, the developer of the game needs to come up with a way to keep people engaged and predict user behavior. That’s where machine learning models come in.
Amazon relies on an ML model for personalization, Netflix relies on ML for recommendations, and FICO relies on ML for risk modeling, Parkey said.
Event-based data is structured data at a certain point in time. It can give context to behavior and behavior changes, she explained.
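As a rough illustration of the idea (not Kaskada's implementation), event-based data can be sketched as timestamped records, from which context such as the gap between a player's sessions can be derived. The `Event` type, its field names, and the sample values below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One structured record tied to a point in time (hypothetical schema)."""
    user_id: str
    kind: str
    at: datetime


def gaps_between_events(events, user_id):
    """Return the time gaps between consecutive events for one user."""
    times = sorted(e.at for e in events if e.user_id == user_id)
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Made-up sample data for illustration only.
events = [
    Event("u1", "session_start", datetime(2022, 5, 1, 9, 0)),
    Event("u1", "session_start", datetime(2022, 5, 2, 9, 0)),
    Event("u1", "session_start", datetime(2022, 5, 6, 9, 0)),
    Event("u2", "session_start", datetime(2022, 5, 1, 12, 0)),
]

gaps = gaps_between_events(events, "u1")
```

A widening gap between sessions is one example of a behavior change that only becomes visible when each record keeps its timestamp.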
The first step in using ML within an organization is to understand what’s possible with data inside the business. High quality models need high quality features. But the typical process for understanding and learning important behavior triggers is complex, manual, and frustrating, she said.
Kaskada offers iterative time-based feature engineering within machine learning. Feature engines support data science and production, she explained.
“We have to support experimentation,” Parkey said.
The goal is to create behavioral machine learning that can pay attention to a user’s behavior and the context, so that an outcome can be predicted. The first key feature has to define what to compute. Then time selection defines when to compute.
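The split between what to compute and when to compute it can be sketched as follows. This is a minimal illustration under assumed names (`count_events`, `feature_as_of`) and made-up data, not Kaskada's API: the feature function defines the computation, and the selected time restricts it to events that had already occurred by then.

```python
from datetime import datetime, timedelta

# Made-up (user_id, timestamp) events for illustration only.
EVENTS = [
    ("u1", datetime(2022, 5, 1)),
    ("u1", datetime(2022, 5, 3)),
    ("u1", datetime(2022, 5, 10)),
    ("u2", datetime(2022, 5, 2)),
]


def count_events(timestamps):
    """What to compute: here, simply the number of events."""
    return len(timestamps)


def feature_as_of(events, user_id, as_of, window, feature=count_events):
    """When to compute: evaluate `feature` over events in (as_of - window, as_of]."""
    start = as_of - window
    selected = [t for uid, t in events if uid == user_id and start < t <= as_of]
    return feature(selected)


week = timedelta(days=7)
early = feature_as_of(EVENTS, "u1", datetime(2022, 5, 4), week)
late = feature_as_of(EVENTS, "u1", datetime(2022, 5, 12), week)
```

Evaluating the same feature at two different times gives different values, and events after the selected time are never included, which keeps future information out of a training example.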
“The data itself contains information about the future and what may happen,” Parkey said.
Dr. Brian Godsey, data science lead, Kaskada, led attendees through an exercise examining machine learning models based on event data.
Many Data Summit 2022 presentations are available for review at https://www.dbta.com/DataSummit/2022/Presentations.aspx.