Deep Learning can tackle the problem of sparse data by augmenting available data

TL;DR: Data is sparse but techniques like Generative Adversarial Networks (GANs) can imitate this limited data and create variations of it to train neural networks. The result? A compendium of applications in local and state security, weather forecasting, and a variety of industries.

Deep learning is considered the coolest thing around. Many liken it to a magic wand whose every swish can solve pressing problems. But the reality is more grounded.

Deep learning, a subset of machine learning, is actually like a hungry lion that needs to be constantly fed with food or data. But it’s also a petulant, choosy lion. It wants the data to be labelled and in particular formats for the right training.

As a result of this need, deep learning today faces the challenge of sparse data: a situation where you have great rockets but not enough fuel, or not the right fuel. Many of the problems deep learning is trying to solve today, such as the recognition or classification of images, video and audio, are sparse data problems. Either the data does not exist or it has not been pre-processed in the right manner. This creates an imminent bottleneck for the AI growth story.

But instead of trying to acquire more and more data, we’re now finding alternative solutions. One way is to meet this voracious data demand artificially. Generative Adversarial Networks (GANs), as we previously discussed, can be used to generate new data by learning from existing data.

Imagine a criminal is on the loose. We have one image of this ruthless criminal, but that’s it. The data, in deep learning language, is sparse for training purposes. What GANs can do here is generate variations of that image, say, different angles of the face or how it might change with age, so that a recognition model can be trained to detect the face no matter how it changes. This has far-reaching effects, especially in automated recognition through surveillance cameras.

Security is just one of the many areas where we can now utilize previously unusable sparse data. Satellite imagery of, say, climate conditions is another. Previously, even if you had a satellite image, it would often be of such low resolution that its important features appeared as blurred points. We can now augment this sparse data using GANs and other sparse data enhancement techniques so that it becomes useful for training AI models.

Unpacking GANs

To better understand how GANs work, let’s think of the same application in satellite imagery.

GANs have a generative part called the generator and an adversarial part called the discriminator. Think of the discriminator and generator as a really strict teacher and a sincere student.

The student (or the generator) is trying to imitate existing satellite imagery by coming up with newer options. It goes to the teacher (or the discriminator) who will analyse whether it works or not. At first, the teacher will say no, it’s nothing like the existing image. Re-do. The student will return and try to fix the errors. The teacher will again say, no. Not yet. And the iterations will continue until the student acquires expertise and the teacher is satisfied.
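The teacher-and-student loop above can be sketched in a few lines of code. This is only a toy illustration, not how a satellite-imagery GAN would actually be built: the "images" are single numbers, the student is a straight line, the teacher is a logistic classifier, and every name here (`train_toy_gan`, `a`, `b`, `w`, `c`) is made up for the example. One detail the analogy leaves out is that the teacher learns too, getting better at spotting fakes as the student improves.

```python
import math
import random


def sigmoid(t):
    # Logistic function written so that math.exp never overflows.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_samples, steps=2000, lr=0.01, seed=0):
    """Alternate teacher and student updates on 1-D data (illustrative only)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # student / generator: fake = a * z + b
    w, c = 0.0, 0.0  # teacher / discriminator: P(real) = sigmoid(w * x + c)

    for _ in range(steps):
        x_real = rng.choice(real_samples)
        z = rng.gauss(0.0, 1.0)  # random noise the student starts from
        x_fake = a * z + b

        # Teacher step: minimise -log D(real) - log(1 - D(fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * (-(1.0 - d_real) * x_real + d_fake * x_fake)
        c -= lr * (-(1.0 - d_real) + d_fake)

        # Student step: minimise -log D(fake), i.e. try to fool the teacher.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - d_fake) * w  # gradient of the loss w.r.t. x_fake
        a -= lr * grad_x * z
        b -= lr * grad_x

    return (a, b), (w, c)


# Hypothetical "real" data: 200 numbers clustered around 4.0.
data_rng = random.Random(42)
real = [data_rng.gauss(4.0, 0.5) for _ in range(200)]
(gen_a, gen_b), (disc_w, disc_c) = train_toy_gan(real)
print("student parameters:", gen_a, gen_b)
print("teacher parameters:", disc_w, disc_c)
```

Even in this tiny setting the two sides can chase each other in circles rather than settle, which gives a feel for why GANs have a reputation for being hard to train.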

Once this is done, the student’s images can be labelled and assembled into a relevant dataset to train the neural network, so that when a new image comes in, its features can be automatically identified and labelled.

Images are merely one form of sparse data. Industries like steel or pharmaceuticals have numerical and other forms of data for each stage of production: from raw materials and particle sizes to the pressure imposed on semi-finished materials and the temperature at each stage. This data, however, is often insufficient to make sound and purposeful judgments. Previously, we would wait until an entire production cycle had passed to acquire and then make sense of the data, but now sparse data enhancement can speed things up.

The AI-driven generative model takes existing data, extracts patterns from it, and generates new data. It learns patterns from the sparse data sets and bridges a wide gap by creating data that looks as though it belongs to that very data set.
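To make "learn the pattern, then create data that fits it" concrete, here is a deliberately simple stand-in for a GAN: it fits a two-variable Gaussian to a handful of readings and samples new rows from it. The six (temperature, pressure) pairs are invented for the illustration, and the function names are hypothetical. A real industrial model would need to capture far richer patterns than a mean, a spread and a correlation.

```python
import math
import random
import statistics


def fit_gaussian_2d(rows):
    """Learn the 'pattern': means, spreads and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    rho = cov / (sx * sy)
    return mx, my, sx, sy, rho


def sample_synthetic(params, n, seed=0):
    """Generate n new rows that follow the fitted pattern."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    # Clamp guards against tiny negative values caused by rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)  # keeps the x-y correlation
        rows.append((x, y))
    return rows


# Made-up sparse data: (furnace temperature in C, pressure in bar).
measured = [(1480, 2.1), (1495, 2.3), (1502, 2.2),
            (1510, 2.5), (1525, 2.6), (1531, 2.8)]
params = fit_gaussian_2d(measured)
synthetic = sample_synthetic(params, n=500)
print("first synthetic rows:", synthetic[:3])
```

Six measured rows become five hundred synthetic ones that share the same averages and the same tendency for pressure to rise with temperature. The synthetic rows can only be as trustworthy as the pattern estimated from the original few.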

Gopichand Katragadda, Founder and CEO of Myelin Foundry, explains, “GANs are difficult to train but they hold the promise of generating industry specific simulated data for events which are episodic — for example, identifying tumors in medical images such as CT and MR scans.”

Thus, what used to be difficult-to-use sparse data can now be expanded and used to train neural networks. Simply put, it may not be a magic wand but it is transforming how data is processed, understood and utilized, to benefit a variety of industries.
