Personal University: Indexing Forms and DataRank

This post was originally published under the same title on the Personal blog, A Personal Stand.

The science behind making data reusable on forms

In our latest video for Personal University, we explain the new data science that powers Fill It and automated form filling. Fill It, and any data stored in a Data Vault, rests on five primary components:

1. Open Ontology

By necessity, we had to design and build an open ontology covering the different fields of structured data and meta-data, which currently number over 1,200. Earlier efforts to organize data, such as the semantic web, informed our work, but we had to plant a flag to make a user-centric system work. We have published the ontology, which you are free to adopt or help us improve. You can search and interact with the published ontology, which also features DataRank (see the next section).
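
To make the idea of an ontology field more concrete, here is a minimal Python sketch of how such fields might be represented and indexed. The `OntologyField` class, dotted field IDs such as `person.name.given`, and the `parent` and `synonyms` attributes are hypothetical illustrations, not the structure of Personal's published ontology.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyField:
    """One field in a hypothetical open ontology of structured data."""
    field_id: str                   # stable ID, e.g. "person.name.given"
    label: str                      # human-readable name
    data_type: str                  # e.g. "string", "date", "phone"
    parent: str | None = None       # broader field this one refines, if any
    synonyms: tuple[str, ...] = ()  # alternative labels seen on forms


def build_ontology(fields: list[OntologyField]) -> dict[str, OntologyField]:
    """Index fields by ID, rejecting duplicate IDs and unknown parents."""
    index: dict[str, OntologyField] = {}
    for f in fields:
        if f.field_id in index:
            raise ValueError(f"duplicate field id: {f.field_id}")
        index[f.field_id] = f
    for f in index.values():
        if f.parent is not None and f.parent not in index:
            raise ValueError(f"{f.field_id} has unknown parent {f.parent}")
    return index


ontology = build_ontology([
    OntologyField("person.name", "Name", "string"),
    OntologyField("person.name.given", "First name", "string",
                  parent="person.name", synonyms=("given name", "forename")),
    OntologyField("person.birth_date", "Date of birth", "date",
                  synonyms=("dob", "birthday")),
])
print(len(ontology))  # 3
```

A `parent` link of this kind might let a specific field, such as a first name, be related to a broader one, which is part of what makes an ontology more useful than a flat list of labels.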

2. DataRank

We analyzed hundreds of thousands of real forms and millions of data fields to learn which fields are most commonly used. We call the resulting ranking “DataRank”: it ranks every field of data we support by popularity and frequency of use. We continually add fields to the ontology and refine the DataRank algorithm. If there are data fields you would like us to add, email us at forms@personal.com.
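
As a rough illustration, a frequency-based ranking like DataRank could start from simple counts over indexed forms. This sketch assumes each form has already been reduced to a list of matched field IDs, and it ranks fields by how many forms use them; the `data_rank` function and its tie-breaking rule are hypothetical, and the real DataRank algorithm is not described in this post.

```python
from collections import Counter


def data_rank(forms: list[list[str]]) -> list[tuple[str, int]]:
    """Rank field IDs by the number of forms that use them.

    Each form is a list of matched field IDs (a field may repeat on a form).
    Ties on form count are broken by total occurrences, then by field ID.
    """
    form_count: Counter = Counter()
    total_count: Counter = Counter()
    for form in forms:
        total_count.update(form)
        form_count.update(set(form))  # count each field once per form
    ranked = sorted(form_count,
                    key=lambda f: (-form_count[f], -total_count[f], f))
    return [(f, form_count[f]) for f in ranked]


forms = [
    ["person.name.given", "person.name.family", "contact.email"],
    ["contact.email", "contact.phone", "contact.phone"],
    ["person.name.given", "contact.email"],
]
print(data_rank(forms))
# [('contact.email', 3), ('person.name.given', 2),
#  ('contact.phone', 1), ('person.name.family', 1)]
```

Counting each field once per form measures how widely a field is used, while the total-occurrence tie-breaker captures how often it repeats; a production ranking would likely weigh many more signals.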

3. Indexing Forms

We have indexed over 300,000 forms, all of which are searchable and rated on a 4-star scale that shows how well each form supports auto-fill.
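
The post does not say how the 4-star rating is computed. One plausible approach, shown here purely as an assumption, is to rate a form by the share of its fields that map to supported ontology fields; the `star_rating` function and its thresholds are invented for illustration.

```python
def star_rating(total_fields: int, supported_fields: int) -> int:
    """Map the share of a form's fields we can fill to 1-4 stars.

    The thresholds below are illustrative assumptions only.
    """
    if total_fields <= 0:
        raise ValueError("a form needs at least one field")
    if not 0 <= supported_fields <= total_fields:
        raise ValueError("supported_fields must be in [0, total_fields]")
    coverage = supported_fields / total_fields
    for stars, threshold in ((4, 0.9), (3, 0.7), (2, 0.4)):
        if coverage >= threshold:
            return stars
    return 1


print(star_rating(20, 19))  # 4 (95% of fields supported)
print(star_rating(12, 6))   # 2 (50% of fields supported)
```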

We start by matching supported fields from our ontology against the specific fields on a form. A combination of machine learning and review ensures that the matching is correct. Once a field is matched, your data can be delivered securely from your vault to that field, or to a company’s back end, in the required format. We call this network of matches our “semantic graph”. The video highlights several examples of how messy and complicated this problem is for autofill.
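
Personal describes the matching step as machine learning plus review. The sketch below swaps in something much simpler, normalized exact matching with a fuzzy fallback from Python's standard `difflib` module, to show the general shape of the problem: map messy form labels to ontology field IDs and queue anything uncertain for human review. The vocabulary, field IDs, and `match_fields` function are hypothetical.

```python
import difflib
import re


def normalize(label: str) -> str:
    """Lower-case a label, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", label.lower())
    return " ".join(cleaned.split())


def match_fields(form_labels, vocabulary, cutoff=0.8):
    """Map form labels to ontology field IDs.

    `vocabulary` maps normalized labels and synonyms to field IDs. An exact
    match wins; otherwise the closest fuzzy match scoring at least `cutoff`
    is used. Labels with no acceptable match are queued for human review.
    """
    matched, needs_review = {}, []
    keys = list(vocabulary)
    for label in form_labels:
        key = normalize(label)
        if key in vocabulary:
            matched[label] = vocabulary[key]
            continue
        close = difflib.get_close_matches(key, keys, n=1, cutoff=cutoff)
        if close:
            matched[label] = vocabulary[close[0]]
        else:
            needs_review.append(label)
    return matched, needs_review


vocabulary = {
    "first name": "person.name.given",
    "given name": "person.name.given",
    "email": "contact.email",
    "email address": "contact.email",
    "date of birth": "person.birth_date",
    "dob": "person.birth_date",
}
matched, review = match_fields(
    ["First Name:", "E-mail", "D.O.B.", "Email adress", "Emergency contact"],
    vocabulary,
)
print(matched)
print(review)  # ['Emergency contact']
```

Even this toy version shows why review matters: “Emergency contact” is a perfectly ordinary label, yet no string similarity can tell which of your contacts it should refer to.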

4. Structured Meta-data for Notes and Files

Personal’s ontology also supports structured meta-data for notes, descriptions, and other unstructured information and files, which makes all of them reusable. This works especially well for frequently used information such as company descriptions, personal bios, or step-by-step directions to your office or home.
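
A minimal sketch of the idea, assuming each note is stored alongside the ontology field it fills plus a few key/value tags. The `Note` class, field IDs, tag names, and sample notes are all hypothetical; the point is that structured meta-data lets free text be looked up and reused like any other field.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Note:
    """Free text plus structured meta-data that makes it reusable."""
    text: str
    field_id: str                                   # ontology field it fills
    tags: dict[str, str] = field(default_factory=dict)


def find_notes(notes: list[Note], field_id: str, **tags: str) -> list[Note]:
    """Return notes for `field_id` whose tags include every given key/value."""
    return [
        n for n in notes
        if n.field_id == field_id
        and all(n.tags.get(k) == v for k, v in tags.items())
    ]


vault = [
    Note("Acme Corp builds accounting tools for small businesses.",
         "org.description", {"organization": "Acme Corp", "length": "short"}),
    Note("Jane Doe has ten years of experience in logistics.",
         "person.bio", {"person": "Jane Doe", "length": "short"}),
    Note("Take exit 12, turn left, and we are the second building.",
         "place.directions", {"place": "office"}),
]

for note in find_notes(vault, "place.directions", place="office"):
    print(note.text)
```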

5. Correlations Graph

Our last layer of logic, the Correlations Graph, is where some of the coolest magic happens. It analyzes which instances of data you and others use to complete a form, all while keeping your actual information private.

For example, you may have several different names and addresses in your vault. Without looking at the data itself, the Correlations Graph works out which name and address to use based on the context of the form. By learning which pieces of data belong together, it can more accurately predict the right data to use on a form for a child, spouse, doctor, car, or anything else you manage in your vault. The Correlations Graph gets smarter the more you use Fill It.
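
The Correlations Graph is described only at a high level, so the following is a hypothetical sketch of one way to correlate data without reading it: record which opaque instance IDs (such as `name#child`, never the actual values) are used together on the same form, then prefer the candidate most often used alongside data already chosen. The `CorrelationsGraph` class, its methods, and the ID format are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


class CorrelationsGraph:
    """Co-usage counts between opaque data-instance IDs.

    Only IDs such as "name#child" are stored, never the underlying values.
    """

    def __init__(self) -> None:
        self.weights = defaultdict(int)  # (id_a, id_b) with id_a < id_b -> count

    def record_form(self, instance_ids: list[str]) -> None:
        """Strengthen the edge between every pair of instances used together."""
        for a, b in combinations(sorted(set(instance_ids)), 2):
            self.weights[(a, b)] += 1

    def weight(self, a: str, b: str) -> int:
        """How many recorded forms used both instances."""
        key = (a, b) if a < b else (b, a)
        return self.weights.get(key, 0)

    def best_match(self, chosen: str, candidates: list[str]) -> str:
        """Pick the candidate most often used alongside `chosen`."""
        return max(candidates, key=lambda c: self.weight(chosen, c))


graph = CorrelationsGraph()
graph.record_form(["name#child", "address#home", "doctor#pediatrician"])
graph.record_form(["name#child", "address#home"])
graph.record_form(["name#self", "address#work"])
print(graph.best_match("name#child", ["address#home", "address#work"]))
# address#home
```

Because the graph only ever sees identifiers, it can learn that a child’s name tends to go with the home address without knowing what either one says.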

Check out the video and let us know what you think.

Author: Shane Green

Empowering people with their data. CEO (US) digi.me. Co-founder & CEO of Personal (merged with digi.me) and TeamData. Previously co-founder & CEO of The Map Network (acquired by Nokia/NAVTEQ), Carnegie Endowment for International Peace. rshanegreen.com and @shanegreen