Today, the idea of picking up the phone to call a travel agent to plan out a vacation seems archaic. It used to be that everyone used a travel agent, but nowadays it’s an occupation that caters mainly to the wealthy. Yet there was something about it that was uniquely valuable, something that the legions of vacation-planning websites and travel-booking apps—powered by oceans of data and cutting-edge algorithms—haven’t been able to fully replicate.
Most of us aren’t lucky enough to plan vacations for ourselves every waking hour of the day. We might do it once or twice a year. Travel agents, by contrast, do this all the time. And because they make travel arrangements so frequently, they naturally develop heuristics (i.e., shortcuts) and gain skills for doing it well.
Experienced travel agents can plan out a trip across Europe very efficiently by stitching together data on ticket prices, transfers, local attractions, and hotel availability. They combine this with the wisdom they’ve accumulated from years of planning similar trips and then produce itineraries that have been expertly mapped out.
Vacation planning websites and travel booking apps have disintermediated the travel agent by delivering the information that they relied on directly into the hands of the masses. Now we all have as much information as travel agents once had (or perhaps even more), but what we lack is their accumulated wisdom. As a result, we’re left to assemble our itineraries as best we can, with all of the wasted hours and suboptimal routes that entails.
The Age of the Citizen Data Scientist
In the business world, we’ve seen a similar transition in the realm of analytics. We are living in the age of the citizen data scientist, an age in which a wide swath of employees is empowered to follow their curiosity and plumb the depths of their organizations’ data lakes using sophisticated machine learning algorithms.
This is great! The democratization of analytics capabilities has led to a much more rapid pace of analytic discovery within companies. But it also creates new challenges.
Machine learning is excellent at uncovering new information by discovering new correlations within the data. However, it cannot fully weigh the relative value of those correlations, or the trade-off between the cost of operationalizing a model and the lift that the model achieves. In short, machine learning lacks the wisdom to understand the trade-offs inherent in translating analytic insights into real-world decisions. On its own, it also lacks the rigor to ensure that the model is ethical in its use of the data.
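To make the cost-versus-lift trade-off concrete, here is a minimal sketch of the arithmetic. Every number and name in it is a hypothetical assumption chosen for illustration; none of them come from a real project.

```python
def net_annual_value(baseline_rate, model_rate, decisions_per_year,
                     value_per_success, annual_operating_cost):
    """Value of the extra successes a model produces, minus what it costs to run."""
    extra_successes = (model_rate - baseline_rate) * decisions_per_year
    return extra_successes * value_per_success - annual_operating_cost


# Hypothetical comparison: the complex model predicts slightly better,
# but costs more to operationalize and maintain.
simple_model = net_annual_value(0.05, 0.080, 10_000, 50.0, 12_000.0)
complex_model = net_annual_value(0.05, 0.085, 10_000, 50.0, 20_000.0)

print(f"simple model net value:  {simple_model:,.0f}")
print(f"complex model net value: {complex_model:,.0f}")
```

Under these assumed figures, the simpler model nets roughly 3,000 a year while the more predictive one loses roughly 2,500. That is exactly the kind of judgment an algorithm optimizing for accuracy alone will not make for you.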
A Practitioner’s Guide to Analytic Model Development
When developing new analytic models, what’s needed is the combination of information (produced efficiently by machine learning algorithms), explainability (also provided by technology), and wisdom (provided by experienced data scientists and decision management experts).
Many organizations specialize in helping companies do exactly that. In most situations, any good model development follows these steps:
Step 1. Start with the problem.
You can’t identify valuable datasets until you have clearly defined what problem they will be valuable in helping you solve. This brings us back to our travel agent analogy: make sure that your citizen data scientist understands the business context of what is being modeled.
Step 2. Define the behavior you are trying to predict.
For example, if a bank is trying to predict attrition, it will need to define exactly what types of customer behavior constitute attrition. These may include the account closing, the account becoming inactive, the account balance dropping below a certain level, or account spending dropping below a specified threshold.
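One way to pin that definition down is to write it as code that everyone can review. The sketch below is only an illustration: the field names and all three thresholds are assumptions, and a real bank would set its own.

```python
from dataclasses import dataclass

# Hypothetical thresholds; each organization would choose its own.
INACTIVE_DAYS = 180
MIN_BALANCE = 100.0
MIN_MONTHLY_SPEND = 25.0


@dataclass
class Account:
    is_closed: bool
    days_since_last_activity: int
    balance: float
    avg_monthly_spend: float


def attrition_reasons(acct: Account) -> list[str]:
    """Return every attrition criterion the account meets (empty list means retained)."""
    reasons = []
    if acct.is_closed:
        reasons.append("closed")
    if acct.days_since_last_activity >= INACTIVE_DAYS:
        reasons.append("inactive")
    if acct.balance < MIN_BALANCE:
        reasons.append("low_balance")
    if acct.avg_monthly_spend < MIN_MONTHLY_SPEND:
        reasons.append("low_spend")
    return reasons


def is_attrited(acct: Account) -> bool:
    """An account counts as attrited if it meets at least one criterion."""
    return bool(attrition_reasons(acct))
```

Returning the list of reasons, rather than a bare yes or no, keeps the target definition explainable: you can see which behavior put each account into the attrition group.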
Step 3. Evaluate all potentially relevant data sources.
Once you have defined what you are trying to predict, you will want to be fairly liberal in terms of the different types of data you consider. Machine learning is enormously helpful at this stage because it can automatically evaluate data, such as the following:
- Account data
- Transactional data—recency, frequency, transaction type, volume
- Call center data
- Collections data
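As an illustration of what evaluating transactional data can look like, the sketch below summarizes one customer's history into recency, frequency, transaction type, and volume. The tuple layout and the sample transactions are assumptions made up for this example.

```python
from datetime import date


def transaction_features(transactions, as_of):
    """Summarize one customer's transactions.

    `transactions` is assumed to be a list of (date, amount, type) tuples.
    """
    if not transactions:
        return {"recency_days": None, "frequency": 0, "volume": 0.0, "by_type": {}}

    last_date = max(t[0] for t in transactions)
    by_type = {}
    for _, _, kind in transactions:
        by_type[kind] = by_type.get(kind, 0) + 1

    return {
        "recency_days": (as_of - last_date).days,    # days since the latest transaction
        "frequency": len(transactions),              # number of transactions
        "volume": sum(t[1] for t in transactions),   # total amount transacted
        "by_type": by_type,                          # count per transaction type
    }


# Hypothetical history for a single customer.
history = [
    (date(2024, 1, 5), 40.0, "card"),
    (date(2024, 2, 20), 120.0, "transfer"),
    (date(2024, 3, 1), 15.5, "card"),
]
features = transaction_features(history, as_of=date(2024, 3, 31))
print(features)
```

Summaries like these give you comparable numbers per customer, which is what lets you judge whether a data source is worth carrying forward.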
Step 4. Plan for implementation.
Assess the implementation environment for limitations on data availability and model type compatibility. Understand what it will take in terms of time and effort to fill any gaps that may exist. Additionally, any approvals or governance processes for model signoff should be well understood at this stage.
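A simple readiness check can surface these gaps early. The following is a minimal sketch; the field names and supported model types are hypothetical stand-ins for whatever your production environment actually offers.

```python
def readiness_report(model_inputs, model_type, production_fields, supported_model_types):
    """Compare what the model needs with what the production environment provides."""
    missing = sorted(set(model_inputs) - set(production_fields))
    type_ok = model_type in supported_model_types
    return {
        "missing_inputs": missing,           # inputs with no production source yet
        "model_type_supported": type_ok,     # can the environment execute this model type?
        "ready": not missing and type_ok,
    }


# Hypothetical example values.
report = readiness_report(
    model_inputs=["balance", "recency_days", "call_center_contacts"],
    model_type="random_forest",
    production_fields=["balance", "recency_days"],
    supported_model_types={"scorecard", "random_forest"},
)
print(report)
```

Each entry under `missing_inputs` is a gap you will need to estimate time and effort for before committing to the model.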
Step 5. Wrangle the data.
Once you have the confidence that the data in question will deliver significant predictive value for a key business problem, you can then invest the necessary time and resources into wrangling that data and getting it ready for use in production systems. A key consideration at this step will be to understand and address any data biases that naturally reside in the data or that may be unintentionally manufactured through sampling.
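One basic check for sampling bias is to compare how each segment is represented in your sample against the full population. The sketch below assumes records are plain dictionaries and uses a hypothetical 5-point tolerance; it is a starting point, not a complete bias audit.

```python
from collections import Counter


def segment_shares(records, key):
    """Share of records falling into each value of `key`."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {segment: n / total for segment, n in counts.items()}


def sampling_gaps(population, sample, key, tolerance=0.05):
    """Segments whose sample share differs from the population share by more than `tolerance`."""
    pop = segment_shares(population, key)
    smp = segment_shares(sample, key)
    gaps = {}
    for segment in pop.keys() | smp.keys():
        diff = smp.get(segment, 0.0) - pop.get(segment, 0.0)
        if abs(diff) > tolerance:
            gaps[segment] = round(diff, 3)  # positive means over-represented in the sample
    return gaps


# Hypothetical data: the population is evenly split, the sample is not.
population = [{"region": "urban"}] * 50 + [{"region": "rural"}] * 50
sample = [{"region": "urban"}] * 8 + [{"region": "rural"}] * 2
print(sampling_gaps(population, sample, "region"))
```

A check like this only catches bias introduced by sampling. Bias that already resides in the source data needs a separate review by someone who understands how that data was collected.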
Step 6. Build models.
After you have wrangled the relevant data, you’ll then need to apply the right analytic techniques (e.g., feature generation, variable reduction, random forest) to build models and see how predictive that data is for the different behaviors you are focused on. Go back to Step 1 and make sure that the model itself is using the underlying data in an appropriate, ethical, and efficient way. If you don’t get significant predictive power from one of the data sources, don’t include it, because it will make the next step (Step 7) unnecessarily difficult.
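To illustrate the idea of dropping data that adds no predictive power, here is a deliberately simplified screen. It scores each variable on its own using AUC (the area under the ROC curve) and keeps only those that separate the outcomes noticeably better than chance. The variable names, the sample values, and the 0.1 cutoff are all assumptions, and a real evaluation would use hold-out data and look at variables in combination.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def screen_features(columns, labels, min_strength=0.1):
    """Keep variables whose single-variable AUC is at least `min_strength` away from 0.5."""
    kept = {}
    for name, values in columns.items():
        score = auc(values, labels)
        if abs(score - 0.5) >= min_strength:
            kept[name] = score
    return kept


# Hypothetical data: 1 = attrited, 0 = retained.
labels = [1, 1, 0, 0]
columns = {
    "days_inactive": [90, 60, 5, 10],  # separates the two groups cleanly
    "branch_id": [1, 2, 1, 2],         # carries no signal in this sample
}
print(screen_features(columns, labels))
```

The distance from 0.5 is used rather than the raw AUC because a variable that ranks outcomes in reverse is just as informative as one that ranks them forward.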
Step 7. Operationalize your model.
When the data is ready for use in production, the model can be incorporated into your organization’s existing decision strategies and business rules, and then operationalized for use in making better decisions.
Good Model Development
Many organizations are adept at helping companies implement new analytic models. What’s needed is that combination of information, explainability, and wisdom, brought together by people with expertise in building and operationalizing analytic models. Any good model development will likely follow these seven steps.