
Part 1: Know Your Data, Measure Your Data, Judge Your Data

Organizations want absolute proof of identity but must learn to live with shades of uncertainty. Fraud, security, and business managers need a layered identity assessment approach that relies more on dynamic data and less on static, regulated PII.

Whitepages does data differently. Our approach to sourcing, aggregating, and synthesizing identity data is what makes the Whitepages Identity Graph™ unique. One of the questions we get most often is, “How good is your data?” And that’s a great question. We ask our own vendors the same question, and it always surprises me how often the answer, and even the meaning of the question, is unclear. It is even more shocking when a data vendor doesn’t have a good answer at all.

From the beginning, we have been committed to improving and adding value to our data constantly. This means we need to have an answer to what “good data” means, measure how good it is, and constantly try to make it even better. In that order!

We measure our data on three criteria: richness, completeness, and accuracy. For each one, it is a challenge to define what the term means and how to measure it precisely. I’ll address each of these criteria in turn in this blog series.

You too should be measuring data quality in some way. If you’re buying data, you should always ask the question and expect a good answer. Be very nervous if you don’t get one. If you’re selling data you should anticipate this question; even if a customer doesn’t ask, at least you’ll know you’re making your data better.

We organize our data in a graph-structured database and manage over 5 billion global records in real time. This allows us to create 1 million new links a day within the Identity Graph, constantly improving the quality of our data.

Let’s explore the first criterion.

Richness

This is pretty easy to assess: what kind of data you have, what attributes, and so on. Since we use a graph-structured database, we can ask about the richness of the entities, links, and attributes. Data richness is important both because it inherently adds value and because it creates more opportunities to draw insights and value from the data. Data elements we have include:

Entities

  • People, businesses, addresses, phone numbers, emails, social profiles, URLs

Links (Relationships)

  • Person-to-address, phone-to-person, person-to-historical, person-to-email, person-to-phone

Attributes

  • Phone Data – 7 line types, 3,000+ carriers, pre-paid, SMS capable, etc.
  • Person Data – name, sex, age range, related persons, etc.
  • Address Data – 30+ years of address history, receiving mail, geolocation
  • Email Data – registered name, first seen, auto-generated, etc.
  • Relational Data – start and stop date, type of relationship
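To make the entity, link, and attribute breakdown concrete, here is a minimal Python sketch of how richness could be summarized for a small graph. The names `Entity`, `Link`, and `richness_summary`, along with the sample records, are hypothetical and chosen only for illustration; they are not the Whitepages Identity Graph schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node in the graph, e.g. a person, phone, or address (hypothetical model)."""
    entity_id: str
    kind: str  # e.g. "person", "phone", "address"
    attributes: dict = field(default_factory=dict)


@dataclass
class Link:
    """A typed relationship between two entities (hypothetical model)."""
    source_id: str
    target_id: str
    kind: str  # e.g. "person-to-phone"
    attributes: dict = field(default_factory=dict)


def richness_summary(entities, links):
    """Report which entity kinds, link kinds, and attribute names are present."""
    attrs_by_kind = defaultdict(set)
    for entity in entities:
        # Updating a set from a dict adds the dict's keys (the attribute names).
        attrs_by_kind[entity.kind].update(entity.attributes)
    return {
        "entity_kinds": sorted({e.kind for e in entities}),
        "link_kinds": sorted({link.kind for link in links}),
        "attributes_by_entity_kind": {
            kind: sorted(names) for kind, names in sorted(attrs_by_kind.items())
        },
    }


# Made-up sample data, for illustration only.
entities = [
    Entity("p1", "person", {"name": "Alex Doe", "age_range": "30-34"}),
    Entity("ph1", "phone", {"line_type": "mobile", "prepaid": False}),
]
links = [Link("p1", "ph1", "person-to-phone", {"start_date": "2015-01-01"})]

summary = richness_summary(entities, links)
print(summary)
```

A summary like this only says what kinds of data exist, not how often they are populated or whether they are correct; those are the separate questions of completeness and accuracy.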

Having rich data is useful for companies doing identity verification. For example, if we need to answer whether two people with similar names are indeed the same person, having good date of birth and sex information is important. When processing a potentially fraudulent transaction or assessing a lending application, additional attributes about the users are crucial to make an informed decision. Data is a powerful ally when it’s good.
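The same-person question can be sketched in a few lines of Python. This is a simplified illustration, not how Whitepages resolves identities: the function name `compare_people`, the record fields, the 0.85 name-similarity threshold, and the three outcome labels are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def compare_people(a, b, name_threshold=0.85):
    """Classify two person records as "same", "different", or "uncertain".

    Assumed rule for this sketch: names must be similar, and any extra
    attribute present on both records must agree. Missing attributes
    leave the answer uncertain rather than forcing a decision.
    """
    name_score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if name_score < name_threshold:
        return "different"

    agreements = 0
    for key in ("date_of_birth", "sex"):
        value_a, value_b = a.get(key), b.get(key)
        if value_a is None or value_b is None:
            continue  # attribute missing on one side: no evidence either way
        if value_a != value_b:
            return "different"
        agreements += 1

    return "same" if agreements == 2 else "uncertain"


# Made-up records, for illustration only.
rich_a = {"name": "Jon Smith", "date_of_birth": "1980-05-01", "sex": "M"}
rich_b = {"name": "John Smith", "date_of_birth": "1980-05-01", "sex": "M"}
sparse_b = {"name": "John Smith"}

print(compare_people(rich_a, rich_b))    # both attributes agree
print(compare_people(rich_a, sparse_b))  # names alone cannot settle it
```

With only names available, the comparison can say no more than "uncertain"; the additional attributes are what allow a firmer answer.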

Stay tuned: the next article in this series will focus on data completeness. In the meantime, you can learn more about our data story.

 
