My Startup: Hazy

September 10, 2019

Hazy generates smart synthetic data that’s safe to use, allowing companies to innovate with data without using anything sensitive or real-life.

It was spun out of UCL just two years ago, but has come a long way since then. In 2018, Hazy won the $1 million Microsoft Innovate.AI prize for the best AI startup in Europe.

It helps organisations such as financial services providers balance data innovation with responsible data management, especially since the advent of GDPR and global headline-making breaches.

Founders: Harry Keen & James Arthur

Founded: 2017

Website: hazy.com

We talked to Harry (above, front row, second from the left) to see behind the scenes at Hazy.

Why did you start Hazy?

Hazy was founded in 2017 when we identified a growing trend of consumer awareness around data privacy. That awareness peaked a year later, following the Cambridge Analytica scandal, a series of high-profile data breaches and the introduction of GDPR, the EU-wide data protection regulation.

With Hazy, we aim to provide organisations with a way to continue developing innovative products and services using AI and machine learning, whilst treating customer data responsibly and adhering to data regulation. 

Can you tell us more about the tech behind the product?

Our platform uses AI tools to turn customer data into smart “synthetic datasets” that look and act just like real data but contain zero personal or identifiable information. An easy way to visualise this is the website This Person Does Not Exist, which is built on Nvidia’s StyleGAN. It uses a machine learning technique called a generative adversarial network (GAN), which is trained on a dataset of real faces to produce hyper-realistic images of people who don’t exist.

This is, in essence, visual synthetic data.

When we apply GANs and similar technology to sensitive data (e.g. payments, transaction history, credit card details), organisations can then use new synthetic datasets for advanced machine learning without having to worry about data privacy.
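As a rough illustration of the idea of generating synthetic records from real ones, here is a minimal Python sketch. It is not a GAN and not Hazy’s method: it swaps in a much simpler technique, fitting a two-column Gaussian to toy data and sampling new rows from it. The column meanings (income, spend), the function names and the “real” rows are all made up for this example, and the sketch offers no privacy guarantee by itself.

```python
import math
import random
import statistics

def make_toy_real_rows(n, seed=1):
    """Stand-in for sensitive data: invented (income, spend) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.gauss(3000, 500)
        spend = 0.6 * income + rng.gauss(0, 150)
        rows.append((income, spend))
    return rows

def fit_gaussian_2d(rows):
    """Estimate means, standard deviations and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)

def sample_synthetic(params, n, seed=0):
    """Draw new rows from the fitted model; no real row is copied."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows

real_rows = make_toy_real_rows(500)
model = fit_gaussian_2d(real_rows)
synthetic_rows = sample_synthetic(model, 500)
```

The synthetic rows keep the overall relationship between the two columns (higher income, higher spend) without repeating any individual toy record. A GAN plays the same role for far richer data, learning the patterns with neural networks rather than a fixed formula.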

Where are you at right now?

We raised $2.8 million in seed funding last year, which we used to develop and launch our current synthetic data solution. We are now working with a number of banks and financial institutions, including Nationwide.

What are your aims for the next year?

Over the next 12 months, we’ll be looking to onboard more customers in both the financial sector and the many other industries in which sensitive data should now be treated ethically and responsibly. We are also looking towards our next funding round, which should happen in the near future.

What’s been the hardest thing about getting Hazy off the ground?

One of the difficulties we faced early on concerned the data privacy technology itself.

Initially, we were looking to use a combination of machine learning-assisted masking techniques to anonymise datasets, but we had concerns that individuals within these datasets could be re-identified through increasingly popular hacking practices, like linkage attacks.

With this in mind, we pivoted towards synthetic data generation. Technically, synthetic data is a subset of anonymisation – but unlike traditional anonymisation (which has come under fire recently), individuals cannot be re-identified in synthetic datasets.
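To make the linkage attack mentioned above concrete, here is a small Python sketch. All records, field names and the `linkage_attack` function are invented for illustration. It assumes a “masked” dataset with names removed but postcode, birth year and gender left in, and a separate public list that contains the same fields alongside names.

```python
def linkage_attack(masked_rows, public_rows, keys):
    """Re-identify masked records by joining on shared quasi-identifiers."""
    # Index the public records by their quasi-identifier values.
    index = {}
    for person in public_rows:
        signature = tuple(person[k] for k in keys)
        index.setdefault(signature, []).append(person["name"])

    matches = {}
    for row in masked_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        # Only a unique match pins the record to one named person.
        if len(names) == 1:
            matches[row["record_id"]] = names[0]
    return matches

# Hypothetical masked dataset: names removed, other fields kept.
masked = [
    {"record_id": 1, "postcode": "N1", "birth_year": 1980, "gender": "F", "balance": 1200},
    {"record_id": 2, "postcode": "E2", "birth_year": 1975, "gender": "M", "balance": 560},
    {"record_id": 3, "postcode": "SW9", "birth_year": 1990, "gender": "F", "balance": 9800},
]

# Hypothetical public list, for example a leaked membership roll.
public = [
    {"name": "Alice Example", "postcode": "N1", "birth_year": 1980, "gender": "F"},
    {"name": "Bob Example", "postcode": "E2", "birth_year": 1975, "gender": "M"},
    {"name": "Carl Example", "postcode": "E2", "birth_year": 1975, "gender": "M"},
]

reidentified = linkage_attack(masked, public, ["postcode", "birth_year", "gender"])
print(reidentified)
```

In this toy data, record 1 is tied back to a single named person even though the masked dataset never contained her name. Record 2 matches two people and record 3 matches nobody, so neither is re-identified.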

Why should more people be using Hazy?

Over the past couple of years, there has been a seismic cultural shift around data. Consumer awareness of data privacy is at an all-time high, and we’ve already seen fines of more than $100 million for companies found in breach of GDPR.

Despite this, organisations shouldn’t be afraid of using their data for advanced machine learning. Instead of locking their customer data up and severely restricting who has access to it, they should generate synthetic datasets and work from these instead. This way, companies are free to innovate with zero privacy risk.

Why is Hazy worth the investment?

The cost of privacy-enhancing technology is always going to be significantly less than the financial repercussions of being found in breach of data regulation, even before you consider the damage that this will do to your brand’s reputation.

While it’s essential that companies are able to utilise their customer data, it is equally important that this data is handled responsibly.