Four Lessons from Building Products Using Machine Learning

Machine learning has become ubiquitous, solving complex everyday problems such as recommending media, estimating wait times and even translating languages. It is undoubtedly a powerful tool, providing exciting experiences and saving businesses and consumers time and money.

At Whitepages, we use machine learning to build innovative product offerings such as the Identity Check Confidence Score—and we’ve learned some important lessons along the way. From my experience, companies interested in using machine learning to build successful products should keep the following in mind.

Understand the problem you want to solve with machine learning

While this might sound obvious, taking the time to really understand and articulate the problem and its surrounding context is a critical first step. Below are some important questions to answer:

  • What is the business problem that your customers are facing?
  • What options are currently available to solve the problem?
  • How accurate does your solution need to be?
  • What is the value of solving the problem?
  • What is the cost of a wrong answer?

Investing time on this front will pay dividends when you have to make decisions about model selection, feature engineering, and training data requirements. A clearly articulated customer need and business problem gives you a framework for evaluating such choices and making the right tradeoffs. For example, at Whitepages Pro, we spend hours understanding fraud patterns, the rules that are used to catch chargebacks and reduce reviews, and the economics of such workflows. This allows us to create a product that not only reliably flags fraudulent transactions, but also saves our customers time and money in implementation.
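To make the "cost of a wrong answer" question concrete, here is a minimal sketch of a break-even calculation. The function name and the dollar figures are hypothetical and chosen only for illustration; they are not taken from our products.

```python
def break_even_precision(cost_false_positive: float, cost_false_negative: float) -> float:
    """Lowest fraud probability at which flagging a transaction pays off.

    Flagging is worthwhile when the expected loss avoided,
    p * cost_false_negative, exceeds the expected loss caused,
    (1 - p) * cost_false_positive. Solving for p gives the ratio below.
    """
    if cost_false_positive < 0 or cost_false_negative <= 0:
        raise ValueError("costs must be non-negative, and a missed fraud must cost something")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Hypothetical economics: a wrongly declined order loses $20 of margin,
# while a missed fraudulent order costs $80 in chargeback and fees.
threshold = break_even_precision(cost_false_positive=20.0, cost_false_negative=80.0)
print(f"Flag a transaction only if its fraud probability exceeds {threshold:.0%}")
```

Even a rough number like this helps frame how accurate the solution needs to be before it creates value.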

Ensure you have quality data feeding your models

A machine learning model is only as good as the data that it ingests—garbage in is truly garbage out. Poor data runs the risk of confusing your model and optimizing it toward the wrong objectives. Furthermore, poor data complicates performance analysis because you’ll be left wondering if the root cause of an issue is the model or the underlying data.

To address this, document your data definitions and standards, and set up processes that check incoming data against them. This is particularly useful if multiple internal teams and/or customers are part of the data generation pipeline.
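As a rough illustration of what checking data against written definitions can look like, here is a small sketch. The field names, rules and the `check_record` helper are all hypothetical; a real pipeline would encode its own definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldRule:
    """One entry in a (hypothetical) data definition document."""
    name: str
    expected_type: type
    is_valid: Callable[[Any], bool]
    required: bool = True


# Hypothetical definitions, for illustration only.
RULES = [
    FieldRule("email", str, lambda v: "@" in v),
    FieldRule("order_amount", float, lambda v: v >= 0),
    FieldRule("country", str, lambda v: len(v) == 2 and v.isupper()),
]


def check_record(record: dict, rules: list[FieldRule] = RULES) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for rule in rules:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                problems.append(f"{rule.name}: missing")
            continue
        if not isinstance(value, rule.expected_type):
            problems.append(
                f"{rule.name}: expected {rule.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif not rule.is_valid(value):
            problems.append(f"{rule.name}: value {value!r} violates the definition")
    return problems


print(check_record({"email": "nope", "order_amount": -1.0}))
```

Running a check like this at the point where data enters the pipeline makes it easier to tell a data problem from a model problem later on.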

Machine learning is fast; analysis is slow

Modern machine learning models are amazing in that they can quickly process millions of transactions with hundreds of variables. However, it still takes time to truly understand the variables and the results, and how they relate to the real-world behavior that the model represents. Such work is relatively slow and painstaking, but necessary.

The goal should be to develop an intuitive feel for the model. Do the results make sense? Are they in line with general trends and beliefs in the industry? I’ve found it useful at this juncture to get a gut check from team members who are not in the weeds of building and optimizing the model. While building our Confidence Score, we shared the results of each model iteration with our sales engineering team, who work on the front lines with our customers. This provided a lot of direction and assurance that we were on the right track.

Employ both statistical and business metrics to measure success

As you develop and iterate on the machine learning solution, you will need to decide at what point it is good enough to launch. This question is often complicated, not only because there is always room for improvement and optimization, but also because there are so many metrics that could be used to judge performance.

What I’ve found effective is to pick a couple of key statistical or machine learning metrics and evaluate the solution’s performance on them alongside a core business metric. For example, we use metrics such as ROC AUC, precision and recall to evaluate the predictive power of our models. As a next step, we measure the incremental business value the model provides our customers (in reviews and/or chargebacks saved as well as net monthly dollars) and use that to decide if our models are ready to be deployed. With the Confidence Score, we leveraged our own consumer-facing business, Whitepages Premium, so that we had a rapid feedback cycle and real-world proof of success.
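Here is a minimal sketch of pairing statistical metrics with a business metric. It is not our production code: the sample data, the dollar values and the assumption that a higher score means "more likely fraudulent" are all hypothetical, and the functions are written in plain Python so the arithmetic is visible.

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when every score at or above the threshold is flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def net_savings(labels, scores, threshold, chargeback_cost, false_decline_cost):
    """Business metric: dollars saved on caught fraud minus dollars lost on wrongly flagged orders."""
    caught = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    wrongly_flagged = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return caught * chargeback_cost - wrongly_flagged * false_decline_cost


# Hypothetical labeled sample: 1 = fraud, 0 = legitimate.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print("AUC:", roc_auc(labels, scores))
print("precision, recall:", precision_recall(labels, scores, threshold=0.5))
print("net savings ($):", net_savings(labels, scores, 0.5, chargeback_cost=80.0, false_decline_cost=20.0))
```

Looking at the two kinds of numbers side by side makes the launch decision easier: a model can improve on AUC and still not move the dollar figure that customers care about.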

I hope these lessons prove useful as you assess the value of machine learning for your business. See our machine learning applied in our Identity Check Confidence Score.