What are Autoencoders, and how can I use them?

Photo by Tara Winstead: https://www.pexels.com/photo/robot-pointing-on-a-wall-8386440/

My first reaction to learning about autoencoders was: that's so cool.

In this article, I want to describe, at a high(ish) level, what autoencoders are, how they work, and how you can use them.

Hopefully, you'll feel the same way I do by the end.

The basic idea behind an autoencoder

Imagine an autoencoder as a box.

You'll feed that box some data (an image, some words, etc.), and that box will return the same data.

The basic setup of an autoencoder

So far, this is not that impressive. Why would I want a box that gave me back what I gave it?

Well, inside that box is an autoencoder doing something very interesting.

Inside the autoencoder is a model with a bottleneck. This bottleneck forces the model to learn a lower-dimensional representation of the original data.

The two key concepts in the sentence above are the bottleneck and the lower-dimensional representation.

The bottleneck is built into the architecture of the model. Imagine you summarised the novel War and Peace by Leo Tolstoy. To create this summary, you would use fewer words than the original. The fewer-words rule would be the bottleneck: it forces you to transform the work in some way.

But what are you doing by summarising? You create a lower-dimensional representation of the work. You might say you are capturing the essence or describing the themes. Whatever you want to call it, you somehow represent the original work in a lower dimension or a condensed form.

Ok, and what do we do once we've summarised the text?

We take the summarised version and reconstruct the original.

Pictorially, it looks like this:

Picture of an autoencoder

Here I've added the labels encoder and decoder. These represent the two parts of an autoencoder.

The encoder is the part that summarises (or compresses).

The latent space is the lower-dimensional representation.

The decoder is the part that reconstructs the original (or decompresses).
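Sketched in code, the three parts line up one-to-one. Here's a minimal numpy sketch with made-up sizes (a 784-dimensional input, say a 28×28 image, and a 32-dimensional latent space) and random, untrained weights — a real autoencoder would learn these weights from data:

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, latent_dim = 784, 32          # hypothetical sizes, for illustration

# Untrained weights: these only demonstrate the shapes involved.
W_enc = rng.normal(0, 0.01, size=(input_dim, latent_dim))
W_dec = rng.normal(0, 0.01, size=(latent_dim, input_dim))

def encode(x):
    """Encoder: compress the input into the latent space."""
    return np.tanh(x @ W_enc)

def decode(z):
    """Decoder: reconstruct the input from the latent representation."""
    return z @ W_dec

x = rng.normal(size=(1, input_dim))      # one fake "image"
z = encode(x)                            # the bottleneck: 784 numbers -> 32
x_hat = decode(z)                        # ... and back to 784
print(z.shape, x_hat.shape)
```

The bottleneck is simply the fact that `latent_dim` is much smaller than `input_dim`: everything the decoder knows about the input has to squeeze through those 32 numbers.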

Wait a second, isn't this a bit like principal component analysis? Doesn't that reduce the dimensionality of the original data?

Well spotted.

Yes, it does. But principal component analysis only captures linear relationships in the data, whereas autoencoders can also capture non-linear relationships.
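The comparison can be made concrete: PCA is, in effect, a linear autoencoder. A small numpy sketch on toy data (the data and the choice of 3 components are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # toy data: 100 samples, 10 features
X = X - X.mean(axis=0)                  # PCA assumes centred data

# "Encode" by projecting onto the top-k principal directions,
# "decode" by projecting back. Both steps are purely linear maps.
k = 3
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T                        # latent representation: k dims
X_hat = Z @ Vt[:k]                      # linear reconstruction: back to 10 dims
print(Z.shape, X_hat.shape)
```

An autoencoder replaces those two matrix multiplications with (possibly deep) networks containing non-linear activations, which is what lets it capture structure this linear encode/decode step cannot.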

Ok, I understand how an autoencoder works, and I admit, it's pretty cool. But why would you compress and then decompress data?!

What can you do with Autoencoders?

Now things get interesting.

Firstly, it's essential to realise that you don't use autoencoders to compress and decompress the same data. That would be pointless.

However, we can use the three components of an autoencoder (the encoder, the latent space, and the decoder) to do some pretty exciting things. For example, we can use autoencoders to:

  1. De-noise
  2. Generate

De-noising

De-noising just means removing the noise from something.

Imagine you have a crackly old record and would like to remove the crackles. Or imagine you want to update old films (like this: https://www.youtube.com/watch?v=6FN06Hf1iFk). These are both examples of de-noising.

To de-noise, you must train a variant of the autoencoder described above. It looks like this (assuming we're trying to de-noise an old film):

Autoencoder for de-noising

In this variant, you take a high-quality film and add noise to create a noisy film.

You then take the noisy film as the input to your model and train it to output the original film.

What happens is that the encoder learns to extract the important features of the film, and the decoder learns to take those features and reconstruct the original film.

When you've trained the model on lots of noisy inputs, it learns to produce what we want to see: the original film, cleaned up.

To use this model to clean up an old film, you simply pass the old film in, and the model will compress it into its latent space and then decompress it to a clean version.

The output is a clean version of the original film.
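The training recipe above — add noise to clean data, feed in the noisy version, and train the model to output the clean version — can be sketched end-to-end in numpy. This is a deliberately tiny, linear stand-in (toy data instead of film frames, a single linear layer each way, and hand-rolled gradient descent), not a realistic de-noiser, which would use a deep non-linear network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for clean data: 200 samples of 20 features with genuine
# low-dimensional (rank-5) structure, so there is something to learn.
clean = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 20))
noisy = clean + 0.5 * rng.normal(size=clean.shape)   # add the noise ourselves

latent_dim, lr = 5, 0.01
W1 = rng.normal(0, 0.1, size=(20, latent_dim))       # encoder weights
W2 = rng.normal(0, 0.1, size=(latent_dim, 20))       # decoder weights

def loss(W1, W2):
    """Mean squared error: noisy input in, clean target out."""
    return np.mean((noisy @ W1 @ W2 - clean) ** 2)

initial = loss(W1, W2)
for _ in range(2000):
    z = noisy @ W1                           # encode the NOISY input
    x_hat = z @ W2                           # decode
    g = 2 * (x_hat - clean) / clean.size     # d(MSE)/d(x_hat), CLEAN target
    g_W2 = z.T @ g                           # gradient through the decoder
    g_W1 = noisy.T @ (g @ W2.T)              # gradient through the encoder
    W1 -= lr * g_W1
    W2 -= lr * g_W2

print(initial, loss(W1, W2))                 # reconstruction error before/after
```

The key detail is the asymmetry in the loss: the input is `noisy`, but the target is `clean`, so the model never learns to reproduce the noise.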

Generate

Another way to use autoencoders is to generate new data.

For example, suppose you took a photo of yourself and wanted to see how that photo would look if you were smiling. In this example, you would be generating a new, realistic image.

But how do we do this?

Imagine we take a photo and map it onto a point in a two-dimensional grid. We both know we can't capture the variation in a photo with a two-dimensional vector, but let's imagine that we can for illustration.

Suppose further that one component of the vector is 'what your face looks like', and the other is 'how much you are smiling' (I repeat, this is not realistic, but is fine for illustration).

We train this autoencoder on faces with different levels of smiling. The encoder learns to map each face onto our two-dimensional space (what the face looks like and how much it is smiling), and the decoder learns to map that point back to a face.

Autoencoder of smiling faces

Now imagine that we feed the encoder an original image of you not smiling.

The encoder will represent that in our two-dimensional space.

Before we decode this image, imagine now that we alter the latent space by moving the point along the axis that encodes the smiling.

Smiling component moved

When we decode the image, it will be a picture of you, only smiling.
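In code, the whole trick is a one-line edit to the latent vector. This sketch is purely illustrative: the two latent components and their meanings are the made-up ones from above, and the numbers are invented — a trained encoder would produce `z` for us:

```python
import numpy as np

# By assumption (from the toy example above): component 0 is "what the
# face looks like", component 1 is "how much the face is smiling".
z = np.array([0.7, -0.8])        # hypothetical encoding of a non-smiling photo

z_smiling = z.copy()
z_smiling[1] = 0.9               # move the point along the "smile" axis only

# Decoding z_smiling with the trained decoder would then yield the same
# face, but smiling, because the identity component is untouched.
print(z, z_smiling)
```

Real generative models work in latent spaces with hundreds of dimensions, and the "smile" direction has to be discovered rather than assumed, but the edit-then-decode idea is the same.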

Conclusion

If you aren't impressed by now, maybe you'll never be.

Of course, there's more to autoencoders than described here. But, as promised, I've kept things at a highish level. High enough to sound clever at a party but probably not high enough to build an autoencoder and do something useful with it.

If you want to learn more, here's a useful resource.