5 Things to Remember when Using Identity Check in a Machine Learning Model

As a risk-modeling data scientist, you are the first line of defense in the battle against fraud. You hold the gate. It’s you who decides who gets welcomed inside, and who has to brave the alligators in the moat. Unfortunately, instead of quizzing intruders and delivering Monty Python-esque insults, you have to evaluate a new potential vendor. Are they offering the magic truth spell of your dreams, or are they just another hunchbacked oracle?

Here at Whitepages Pro, we understand your concerns and the importance of your time. We want to help you quickly understand whether and how the Identity Check API can help your business, so you can get back to your real task – cackling with glee while pouring boiling oil from the battlements. To that end, here are five things to remember as you evaluate the Identity Check API for your risk modeling:

1. There are options for every latency budget
Depending on your situation, you may not have time to inquire into everyone’s elderberry-scented parents, nor the seemingly anachronistic coconuts they carry. In other words, you may only be able to tolerate so much latency in third-party API calls. Rest assured that whatever your latency requirements are, Whitepages Pro has a solution to fit. But be sure to ask us first, so we can make sure you evaluate the appropriate product configuration.

2. Whitepages Pro data is real-time
It’s no use asking the wrinkled old fortune-teller what she would have recommended a year ago. Similarly, Whitepages Pro Identity Check data is all real-time. We can tell you who owns this phone today, whether this is a real email today, and so on, but we cannot tell you what we would have said a year ago. As such, we recommend evaluating us on relatively recent transactions. Data more than a year old may introduce a bias, because an old transaction would be checked against today's records rather than the records as they stood when it happened.
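
As a rough illustration of restricting an evaluation set to recent transactions, here is a minimal Python sketch. The record layout (an "id" and a "date" per transaction), the function name, and the 365-day cutoff are assumptions for the example, not part of the Identity Check API.

```python
from datetime import date, timedelta


def recent_transactions(transactions, as_of, max_age_days=365):
    """Keep only transactions dated within max_age_days before as_of.

    The record layout is hypothetical: each transaction is a dict
    with a "date" key holding a datetime.date.
    """
    cutoff = as_of - timedelta(days=max_age_days)
    return [t for t in transactions if t["date"] >= cutoff]


# Made-up sample records, purely for illustration.
sample = [
    {"id": "t1", "date": date(2023, 1, 15)},
    {"id": "t2", "date": date(2024, 3, 1)},
    {"id": "t3", "date": date(2024, 5, 20)},
]

recent = recent_transactions(sample, as_of=date(2024, 6, 1))
print([t["id"] for t in recent])  # t1 is older than the cutoff and is dropped
```

Pick the cutoff to suit your own data. The point is simply to leave out transactions old enough that today's answer may no longer match what was true at the time.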

3. Tree-based models work best
A rag-clothed orphan is low-risk, a merchant with a wagonload of goods is low-risk, but a rag-clothed orphan with a wagonload of goods is quite suspicious. Similarly, within Identity Check data it is often the combination of different signals that indicates particularly high or low risk. Tree-based models tend to identify these combinations of signals naturally, whereas techniques like linear or logistic regression require more feature engineering to provide the same value. That said, the Identity Check Confidence Score is based on the entire Identity Check response, and so it provides a valuable one-stop shop for identity verification in all types of models, including regression.
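
To make the point about combinations concrete, here is a toy Python sketch under stated assumptions. The two binary signals are hypothetical and are not Identity Check fields. The label is risky only when the signals disagree, a deliberately extreme pattern chosen for the demonstration. A brute-force search over linear threshold rules stands in for a regression model's decision boundary, and the tree is written by hand rather than learned.

```python
from itertools import product

# Hypothetical binary signals (signal_a, signal_b) and a toy label:
# 1 (risky) only when the two signals disagree.
ROWS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def best_linear_accuracy(rows):
    """Best accuracy of any linear threshold rule on a coarse weight grid."""
    grid = [x / 2 for x in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
    n_features = len(rows[0][0])
    best = 0.0
    for *weights, bias in product(grid, repeat=n_features + 1):
        correct = sum(
            int(sum(w * x for w, x in zip(weights, feats)) + bias > 0) == label
            for feats, label in rows
        )
        best = max(best, correct / len(rows))
    return best


def with_interaction(rows):
    """Feature engineering: append the product of the two signals."""
    return [((a, b, a * b), label) for (a, b), label in rows]


def two_level_tree(a, b):
    """A hand-written depth-2 tree: split on a, then on b."""
    if a == 0:
        return 1 if b == 1 else 0
    return 0 if b == 1 else 1


raw_acc = best_linear_accuracy(ROWS)
engineered_acc = best_linear_accuracy(with_interaction(ROWS))
tree_acc = sum(two_level_tree(*feats) == label for feats, label in ROWS) / len(ROWS)
print(raw_acc, engineered_acc, tree_acc)
```

On the raw signals, no linear rule classifies all four cases correctly. Adding the interaction feature by hand fixes that, while two nested splits capture the combination directly. Real data is noisier, so treat this as intuition rather than a benchmark.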

4. Future-proof against evolving fraud
Asking your guards to watch out for a green-cloaked scoundrel with a conspicuous mole may be a great idea today if that specific guy keeps trying to get in, but it will probably not be relevant a year from now, or even a few months from now. It’s the same with fraud modeling, where the type of fraud you see today may not be the same as the fraud you see tomorrow. We recommend evaluating our data on as broad a data set as possible to avoid overfitting, and retraining frequently to stay on top of changing fraud trends.
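
One possible way to combine a broad training set with frequent retraining is an expanding-window schedule: at each step, train on every period seen so far and evaluate on the next one. The Python sketch below is an illustration of that general technique, not a Whitepages Pro recommendation. The function name, the monthly granularity, and the parameters are all assumptions.

```python
def expanding_windows(periods, min_train, test_size=1):
    """Return (train, test) pairs where the training span grows each step.

    periods is an ordered list of period labels (for example months);
    min_train is the number of periods in the first training window.
    """
    windows = []
    for end in range(min_train, len(periods) - test_size + 1, test_size):
        windows.append((periods[:end], periods[end:end + test_size]))
    return windows


# Made-up month labels, purely for illustration.
months = ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"]

schedule = expanding_windows(months, min_train=3)
for train, test in schedule:
    print(f"train on {train[0]}..{train[-1]}, evaluate on {test[0]}")
```

Each evaluation period comes strictly after its training data, so a model that has only memorized this month's green-cloaked scoundrel will show its weakness on the following month.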

5. Your time is precious
You’re the master of the gate, and you can distinguish friendly visitors from hostile intruders better than anyone. That said, you only have so much time in your day to spend figuring out how some new piece of equipment works, and so hearing out the maker can save you a lot of time and effort. In the same way, we know you have a limited time budget for evaluation, and so our goal here at Whitepages Pro is to make the evaluation process as quick and easy as possible. We’re here to help with any questions or analysis we can provide – you can contact us here.

Thanks for reading!