The $200B Cost Every Business Sweeps Under the Rug

Companies of every kind build manual processes or automated algorithms on top of static data sets to verify consumer identities, and in many cases that data is light years behind one of the most sophisticated industries on the planet: fraud. Being two steps behind the bad actors leads most executives to settle for “acceptable fraud rates” that added up to a total annual cost to merchants of more than $190 billion as of 2011.

Understandably, business leaders have taken note of this high cost and are looking for the world’s best data scientists to help solve the problem. How do data scientists and huge, dynamic databases – like Whitepages Pro’s 5B record Identity Graph – help reduce the cost of fraud?

By sourcing as much rich, complete, and accurate identity data as they can, data scientists are able to build algorithms that process hundreds of data elements every second in order to make decisions about the veracity of a consumer’s identity. (Interested in learning more about the science? Check out Dr. Steve Hanks’ interview with Forbes.) With the advent of fast, RESTful APIs, cloud computing, and new models for storing enormous data sets, a new world of algorithmic identity verification using dynamic, non-personally identifiable information (non-PII) offers an elegant model for staying one step ahead of complex fraud attacks.

Historically, companies used Rules-Based logic at the core of their identity data processing models (i.e., IF…THEN, ELSE, and WHEN statements) to build an algorithm through which identity data was funneled stepwise. We’ve all seen comical flowcharts that walk through some decision that needs to be made. Charts like the one at right are a good visual example of how a Rules-Based system works:

Banks may use logic like this when deciding whether or not to run a pricey credit check on an application: does the name match the phone number? Does the email match the name or address? Does the IP address from which the application came match the address given, or fall within X miles of it?
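The bank example above can be sketched as a short chain of IF…THEN checks. This is a minimal, hypothetical illustration: the field names and the 50-mile cutoff are made up for the sketch, not taken from any real system.

```python
# Hypothetical Rules-Based identity check, in the spirit of the bank
# example above. Field names and thresholds are illustrative only.

def rules_based_check(application):
    """Return 'pass', 'review', or 'fail' for a credit application."""
    # Rule 1: the applicant's name must be linked to the phone number.
    if not application.get("name_matches_phone"):
        return "fail"
    # Rule 2: the email must match either the name or the address.
    if not (application.get("email_matches_name")
            or application.get("email_matches_address")):
        return "review"
    # Rule 3: the IP geolocation must fall within X miles of the
    # address given (X = 50 here, chosen arbitrarily).
    if application.get("ip_distance_miles", float("inf")) > 50:
        return "review"
    return "pass"

app = {
    "name_matches_phone": True,
    "email_matches_name": False,
    "email_matches_address": True,
    "ip_distance_miles": 12,
}
print(rules_based_check(app))  # → pass
```

Every branch here is hand-written: someone had to decide, up front, which checks matter and where each cutoff sits.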

There is a series of meta-decisions that must be made while building the logic tree, decisions that set the tipping point for each binary yes/no choice. In the example depicted above, we knew that if it was ‘not Hammertime’ it was time to ‘Collaborate’ and then later time to ‘Listen.’
But what feeds THOSE decisions when the choice isn’t an obvious binary, or when there must be some statistical tipping point at which you go with Collaborate vs. Hammertime?

Enter Machine Learning Algorithms.

To fully explain a Machine Learning algorithm over a beer would do a disservice to the genius and complexity of the engineers who build them. The gist, however, is quite accessible. In general, a Machine Learning process means that a model, or set of rules, ingests data to fine-tune the logic of those rules. The model continuously takes in new data, adjusts itself according to the statistical relevance of what it sees, and ‘learns’ from the steady stream of new information.

‘Supervised Learning’ algorithms actually look and feel a lot like Rules-Based systems, but differ in that the tunings (the decisions we make at each Stop in the flow, choosing either Collaborate or Hammertime) are derived by running enough labeled data through the model until the desired tuning has been achieved.
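Here is a toy sketch of what “deriving the tuning from data” can mean in the simplest case: instead of a human hand-picking the tipping point on an identity-match score, the learner scans labeled historical outcomes and picks the cutoff that best separates known-good from known-fraud applications. The score values and labels below are invented for illustration.

```python
# Hypothetical supervised-learning sketch: learn a decision threshold
# from labeled history instead of hand-writing it into a rule.

def learn_threshold(scores, labels):
    """Pick the cutoff on an identity-match score that best separates
    known-good (label 1) from known-fraud (label 0) applications."""
    best_cut, best_acc = 0.0, 0.0
    for cut in sorted(set(scores)):
        preds = [1 if s >= cut else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

# Made-up historical identity-match scores with known outcomes.
scores = [0.1, 0.2, 0.35, 0.6, 0.7, 0.9]
labels = [0,   0,   0,    1,   1,   1]
cut = learn_threshold(scores, labels)
print(cut)  # → 0.6
```

A real system tunes thousands of such parameters at once, but the principle is the same: the data, not the engineer, decides where Collaborate ends and Hammertime begins.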

In Reinforcement Learning algorithms, for example, the machine is trained to make certain decisions. Using trial and error on a steady flow of new information, the machine weighs its aggregated experience against as much new raw input as possible to make accurate, statistically relevant decisions. Rinse and repeat forever. Adding new, interesting data elements to feed the model therefore gives ‘lift’ to its performance when tested against results that were previously passed through.
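“Lift” can be made concrete with a small, entirely hypothetical experiment: score the same held-out examples with and without a new data element and compare accuracy. The toy model, field names, and data below are all invented; a real lift test would use a properly trained model and a much larger hold-out set.

```python
# Hypothetical lift test: does adding a new data element improve
# accuracy on a held-out set? All names and data are illustrative.

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def classify(rows, features):
    # Toy model: predict good (1) when the average of the chosen
    # feature values crosses 0.5 -- a stand-in for a trained model.
    return [1 if sum(r[f] for f in features) / len(features) >= 0.5 else 0
            for r in rows]

test_rows = [
    {"phone_match": 1.0, "ip_distance_ok": 1.0},
    {"phone_match": 0.6, "ip_distance_ok": 0.0},
    {"phone_match": 0.4, "ip_distance_ok": 1.0},
    {"phone_match": 0.2, "ip_distance_ok": 0.0},
]
labels = [1, 0, 1, 0]

base = accuracy(classify(test_rows, ["phone_match"]), labels)
plus = accuracy(classify(test_rows, ["phone_match", "ip_distance_ok"]), labels)
print(base, plus, plus - base)  # → 0.5 1.0 0.5
```

When the accuracy gap is positive, the new element earned its place in the model; when it is zero or negative, the element is noise and can be dropped.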

Whitepages Pro’s largest API response, from our Identity Check 5-in-1 query, returns 50+ data elements: phone metadata, email metadata, address metadata, matches between the elements, etc. It is common for teams running Machine Learning systems to say, “Go ahead, give me all you’ve got. I’ll test the lift.” For Rules-Based models that rely on the binary yes/no of a decision tree, the ability to ingest mass data is quite different. Teams using static Rules-Based models may say, “You know, THAT one flag is the one I’m interested in for this piece of my flow,” or, “Just THOSE three fields are what I want to key off of in my model,” rather than consuming the valuable broader set of records and metadata digestible by a Machine Learning algorithm.
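The two consumption styles can be sketched side by side. Note the field names below are placeholders, not actual Identity Check response keys; the point is only the shape of each approach.

```python
# Hypothetical sketch of the two consumption styles described above.
# Keys are illustrative, not actual Identity Check response fields.

response = {
    "phone.is_valid": 1, "phone.line_type_mobile": 1,
    "email.is_valid": 1, "email.first_seen_days": 240,
    "address.is_valid": 1, "name_to_phone_match": 1,
    # ...dozens more elements in a real 5-in-1 response
}

# Rules-Based consumer: keys off a handful of hand-picked flags
# and ignores everything else in the payload.
selected = {k: response[k]
            for k in ("phone.is_valid", "email.is_valid",
                      "name_to_phone_match")}

# Machine Learning consumer: vectorizes every element and lets
# training decide which ones carry signal.
feature_names = sorted(response)
feature_vector = [float(response[k]) for k in feature_names]
print(len(selected), len(feature_vector))
```

The rules-based side discards most of the payload at the door; the ML side keeps it all and lets the lift test decide what stays.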

When it comes to your business, you can’t afford to accept static models that aren’t calibrated against relevant data on a regular basis. Leveraging dynamic, non-PII data to build rules, or to tweak and regularly feed your Machine, keeps your company’s processes up to speed, protected, and efficient in the face of an ever-changing landscape of identity, fraud, theft, and noise. Let’s roll back that $200B number one consumer at a time.

We need to STOP using static data and COLLABORATE with Whitepages Pro to LISTEN to the dynamic data as we build the best possible Identity solutions. Or, you know, Hammertime – up to you.

Interested in learning more about Whitepages Pro’s API data solutions? Check out the Developer Center.

