Part 2: Know Your Data, Measure Your Data, Judge Your Data

In my first article, I outlined the three key components of good identity verification data and discussed why the “richness” of data matters. In this article we will cover the second key, “completeness”. In the end, it always comes down to metrics: the success of using dynamic data is measured by your return on investment and, ultimately, your customer satisfaction scores. Data that falls short on attribute completeness, or coverage, will diminish your returns and hurt your customer experience.

We value and judge our data on completeness.

Completeness can mean a number of things. For us, the obvious measurement is coverage, or “attribute fill rate”: the frequency with which we can provide a given data attribute for a given entity, for example how often we can provide an age range for a person. It sounds simple, but completeness is tricky. Measuring the completeness of entities and their relationships requires comparing our data set to the actual, ever-changing real world. This can be difficult because, most of the time, the only way to do it is to gather statistics from external sources.
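As a rough illustration of attribute fill rate, here is a minimal Python sketch. The function name, the record layout, and the sample people are all made up for this example; they are not taken from the Whitepages Identity Graph™ or any real API.

```python
from typing import Iterable, Mapping


def attribute_fill_rate(records: Iterable[Mapping], attribute: str) -> float:
    """Share of records that carry a non-empty value for `attribute`."""
    records = list(records)
    if not records:
        return 0.0
    # A missing key, None, or an empty string all count as "not filled".
    filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return filled / len(records)


# Hypothetical sample data, for illustration only.
people = [
    {"name": "A. Smith", "age_range": "25-34"},
    {"name": "B. Jones", "age_range": None},
    {"name": "C. Lee", "age_range": "45-54"},
    {"name": "D. Kim"},
]

rate = attribute_fill_rate(people, "age_range")
print(f"age_range fill rate: {rate:.0%}")  # 2 of 4 records filled -> 50%
```

Note that this measures fill rate only against the records we already hold; it says nothing about people who are missing from the data set entirely.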

For example, what percentage of the US adult population do we cover? What percentage of US mobile phones have we linked to a person? We can’t begin to answer these questions without external data sources that tell us the actual population of the US or the total number of US mobile phones, let alone measure how complete our data is when managing such enormous amounts of it.
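The dependence on an external benchmark can be sketched in a few lines of Python. The function name and both numbers below are hypothetical placeholders, not real statistics; the point is only that the denominator has to come from outside our own data.

```python
def coverage(entities_in_graph: int, external_total: int) -> float:
    """Coverage of a real-world population, given an externally sourced total."""
    if external_total <= 0:
        raise ValueError("external_total must be a positive count")
    return entities_in_graph / external_total


# Placeholder figures for illustration only.
linked_mobile_phones = 900    # phones we have linked to a person (assumed)
all_mobile_phones = 1_000     # total from an external source (assumed)

print(f"mobile phone coverage: {coverage(linked_mobile_phones, all_mobile_phones):.0%}")
```

If the external total is stale or wrong, the coverage figure is wrong too, which is why the quality of the benchmark matters as much as our own counts.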

Here are some numbers from the Whitepages Identity Graph™ that we track to measure the completeness of our data entities and relationships.

  • We have over 1.5B phone numbers worldwide
  • Our coverage is 96% for all US businesses
  • We have 99% of US addresses
  • We are linking over 600 million person-to-phone relationships
  • There are over 2.7 billion unique person-to-address relationships

We constantly strive to improve the completeness of data entities and relationships, but not at the expense of accuracy. A big challenge is comparing the number of person entities we have to a given population. We know duplicates can get created in our graph, so simply adding more people will, by this definition, boost our completeness KPI. However, it will also hurt accuracy, which is not helpful for identity verification of applicants, orders, and customers. We see this a lot from data providers that make inflated claims about the number of records they hold: the count is high because of duplication, so it’s just plain garbage.
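To see how duplicates inflate a completeness KPI, consider this small Python sketch. The matching key (normalized name plus date of birth) is a deliberately naive stand-in for real entity resolution, and the records, field names, and population size are invented for the example.

```python
def dedupe_key(record: dict) -> tuple:
    """Naive identity key: normalized name plus date of birth (an assumption)."""
    return (record["name"].strip().lower(), record["dob"])


def person_coverage(records: list, population: int, dedupe: bool = True) -> float:
    """Person entities divided by an externally sourced population count."""
    if dedupe:
        count = len({dedupe_key(r) for r in records})
    else:
        count = len(records)
    return count / population


# Hypothetical records; the second is a duplicate of the first.
records = [
    {"name": "Ann Lee", "dob": "1980-01-01"},
    {"name": "ann lee ", "dob": "1980-01-01"},
    {"name": "Bo Chan", "dob": "1975-06-30"},
]
population = 4  # placeholder external count

raw = person_coverage(records, population, dedupe=False)
clean = person_coverage(records, population)
print(f"raw: {raw:.0%}, deduplicated: {clean:.0%}")  # raw: 75%, deduplicated: 50%
```

The raw count looks better on paper, but the deduplicated figure is the one that reflects how many real people the data can actually help verify.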

Absolute certainty about an identity’s legitimacy is no longer possible to achieve with PII data alone, so we use cutting-edge data science to synthesize and corroborate our data from a variety of proprietary and non-proprietary sources and provide customers with the best dynamic identity data. To mitigate risk, organizations need to re-evaluate how they verify the identity of customers, and having all key components of good identity data is a must. We’ve covered richness and completeness, so stay tuned for the final article, on the third key: accuracy.

Interested in seeing our identity data in action?

Get a free demo of Whitepages Pro Identity Check™ from one of our experts to learn more.
