From Homeless at 20 to Becoming a Leader in Entity Resolution:
Interview with Senzing Founder and Chief Scientist Jeff Jonas
1. Can you provide some background about your upbringing?
I was raised in Northern California, and I was not really great at school. I was a beach bum for a while. I started playing guitar when I was around 11 to give me something productive to do. My mom introduced me to computers when I was 14 and then I became obsessed and had a real purpose.
2. I see that the Ironman triathlon is a very impressive and unique hobby of yours. Can you describe your path to being an athlete at these triathlons, and any more casual hobbies that you enjoy?
When I was 31 my mom talked me into doing a marathon with her, and I trained for it for five weeks, and that was how I got into athletics. I did a short triathlon with friends, and then I started doing the full-distance Ironman triathlon.
Soon enough, I started doing triathlons around the world, and one day I suddenly realized I was very close to doing every single Ironman in the world. That happened, and today I am the only American to have done that. Other than that, my job is my hobby because I love my work.
3. Why did you choose to pursue a career in data science?
It was not really called data science at the time. I just loved writing software. Back in the day, I was just building data management systems. I learned to love data only after working on a hundred different software projects, and every project had different kinds of data.
4. What made you want to be an entrepreneur?
I was too young and inexperienced to be employed as a programmer, so the only way to really work on software was to start my own business. My first company was a custom software consultancy. After a couple of years we had just over 20 employees before it went bankrupt when I was 20 years old. I was homeless, bankrupt, and living in my car at 20.
I figured out that the reason custom software was so hard to build was that people were building the software without a blueprint. So I decided to focus my software development on blueprints first, so I could really understand my roadmap before I started, and that led to some really successful projects.
I figured out how to analyze people's data (matching records) in a way that would reduce the risk that the data would be stolen, and today, that method is used to help modernize voter registration in America. It has done more for voter registration in America than many other projects. The fact that over half the country benefits from this system makes me feel that I made an impact.
6. Which of your 11 publications are you most proud of?
I wrote a chapter called “Data Finds Data” in a Tim O’Reilly book called Beautiful Data. The chapter talked about how many systems wait for humans to ask the questions, but that means people would have to think of and ask smart questions every day. I envision a different future where data finds data, and things that are relevant find you.
When we talk about “Data Finds Data, Relevance Finds You,” one question might be what is relevant? Sometimes such systems can be pre-coded. For example, William and Bill are synonyms.
In other cases, some things can be learned by machines over time. A problem with many machine learning methods is that you have to periodically retrain them, which can be time-consuming and can lead to late discoveries. Because of this, I have been more interested in systems that do real-time learning.
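As a rough illustration of the pre-coded case mentioned above, a minimal sketch in Python might look like the following. The nickname table and the function names are hypothetical and invented for this example; they are not taken from any Senzing product.

```python
# Hypothetical, hand-written nickname table (an assumption for illustration).
# A real system would need a far larger and more carefully curated list.
NICKNAMES = {
    "bill": "william",
    "will": "william",
    "bob": "robert",
}


def canonical_first_name(name: str) -> str:
    """Map a first name to a canonical form using the pre-coded table."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def same_first_name(a: str, b: str) -> bool:
    """Treat two first names as equivalent if they share a canonical form."""
    return canonical_first_name(a) == canonical_first_name(b)
```

With this table, `same_first_name("William", "Bill")` is true because both names map to the same canonical form. The rule is fixed in advance, which is the limitation that learned or real-time approaches try to address.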
7. What was your favorite project as a Chief Scientist of Context Computing at IBM?
It would still be my voter registration project. My next favorite was helping Singapore triage risk in the Malacca Strait, which half the world’s supplies and one third of the world’s commodities go through. They wanted to figure out which vessel on a given day was the most interesting to visit.
8. What has been your biggest challenge in your career? What advice would you give to yourself at your first job?
I think persistence really matters. When I went bankrupt at 20, I still wanted to be a computer programmer, even though I never finished high school. When I was 23, I broke my neck and became paralyzed, but I still wanted to be a computer programmer even if it meant attaching a stick to my nose to type. This is persistence. One lesson I learned along the way was: if you are not winning, you should be learning.
9. Why did you start Senzing? How did it emerge from the IBM G2 team?
I worked for myself my whole life, until IBM bought my company in 2005. I was 41 back then. I worked for IBM for eleven and a half years, instead of just retiring after selling my company.
At Senzing, we started building a new type of AI that could learn in real time and change its mind about the past in real time. The project was code-named G2, which in 2016 became a one-of-a-kind IBM spinout. As such, Senzing is not much of a startup; it is more of a reincarnation, since it only has a new birthdate and a new name.
10. Senzing is “focused on democratizing entity resolution.” How can you explain entity resolution for a person without that much understanding of data science?
An entity could be a person or a company, or a plane or a boat. To resolve them means to determine their similarity.
In your address book, if there are duplicate contacts, that is an entity resolution problem. Banks with hundreds of agents can accidentally add a new customer without realizing it might be the same person they cancelled last month for money laundering!
Entity resolution is all about matching fuzzy data, and at Senzing we figured out a means of making the process easy, accurate, and fast. We also made it affordable, so no one has to try to build it as they can download it and try it for free.
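To make the address book example concrete, here is a minimal sketch of fuzzy duplicate detection in Python. It is only an illustration using the standard library's `difflib`, not how Senzing works; the field names, the thresholds, and the helper functions are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def digits(phone: str) -> str:
    """Keep only the digits of a phone number."""
    return "".join(ch for ch in phone if ch.isdigit())


def name_similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1 for two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same(c1: dict, c2: dict, strong: float = 0.85, weak: float = 0.5) -> bool:
    """Guess whether two contacts describe the same person.

    Thresholds are arbitrary assumptions: a very similar name is enough on
    its own, and a matching phone number lowers the bar for the name.
    """
    score = name_similarity(c1["name"], c2["name"])
    p1, p2 = digits(c1.get("phone", "")), digits(c2.get("phone", ""))
    same_phone = bool(p1) and p1 == p2
    return score >= strong or (same_phone and score >= weak)


def find_duplicates(contacts: list) -> list:
    """Return index pairs of contacts that look like duplicates."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(contacts), 2)
        if likely_same(a, b)
    ]
```

For example, calling `find_duplicates` on contacts named "Jon Smith" and "John Smith" that share a phone number written in two different formats would pair them, while an unrelated contact would be left alone. Comparing every pair like this does not scale to millions of records, which is part of why entity resolution is hard to build well.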
11. What impacts has entity resolution had in financial services, the public sector, and information services?
Financial services often have an obligation to make sure they are not transacting with people on watch lists. Entity resolution allows them to reduce their false positives, screen their millions of transactions, and save a lot of money. Right now, they have analysts who have to look through an incredible number of false matches (aka false positives). Senzing reduces false positives, saving banks millions.
For the public sector, after Hurricane Katrina happened, 50 databases popped up with around a million and a half people registered as missing and found. But people would report the same person missing or found multiple times. Entity resolution helped figure out how many people were actually missing and where they were, and that helped reunite loved ones.
Similarly, for information services, it is important to associate the right derogatory data with the right person. Missing a match is a false negative. This is an entity resolution problem.
12. How do you hope your company and its technologies can improve?
Senzing is just trying to make it easier for people to access entity resolution software. Right now it is expensive or cumbersome and accessible mainly to the elite. We believe that entity resolution should be a mundane, fuzzy, data matching problem. We currently have the speed and accuracy, and so today we are most focused on making it easier to use and understandable.
13. Can you speak more about your role on the Board of the US Geospatial Intelligence Foundation? What has been the most meaningful initiative you have overseen or pursued?
As a member of the board, I have mostly worked on strategy. My favorite part of my role with the USGIF is our big conference called GEOINT, which brings geospatial people together to advance their tradecraft, see what new products are available, and network with other amazing geospatial professionals.
14. One of your interests is privacy, such as General Data Protection Regulation (GDPR) and the California CCPA law. Given your role on the Board of the Electronic Privacy Information Center (EPIC.org) as well, what work have you done in privacy, and how important are laws like GDPR & CCPA, in light of the recent rise of video communication services like Zoom?
Before GDPR, there was a law called the Fair Credit Reporting Act (FCRA). Since GDPR, we are going to see more states coming out with privacy laws. It would be ideal if the U.S. government created a Data Protection Agency.
Because products are so easy to use today and people do not care much about privacy, they tend to overlook their own privacy interests. Now that people are working more than ever with these (often free) tools, it is more important than ever that these tools make privacy claims that they can keep. A Data Protection Agency would champion these often overlooked consumer rights.
15. On a similar note, what technologies in data science have been and can be developed to combat the spread of coronavirus, especially when the country begins to reopen? How will you weigh national security in stopping the virus with personal privacy?
I have been thinking a lot about this. A highly accurate coronavirus contact tracing system is in fact a surveillance tool. As such, great care must be taken in its design and use. I am optimistic about the partnership between Google and Apple where they are working on a joint contact tracing capability. We do have to figure out how to get people back to work because if unemployment continues to dramatically rise, the future will be pretty dark.
Written by Michael Ding & Edited by Alexander Fleiss