
# Understanding Transformers, the machine learning model behind GPT-3

You know that expression, "When you have a hammer, everything looks like a nail"? Well, in machine learning, it seems like we really have discovered a magical hammer for which everything is, in fact, a nail, and it's called the Transformer. Transformers are models that can be designed to translate text, write poems and op-eds, and even generate computer code. In fact, lots of the amazing research I write about on daleonai.com is built on Transformers, like AlphaFold 2, the model that predicts the structures of proteins from their genetic sequences, as well as powerful natural language processing (NLP) models like GPT-3, BERT, T5, Switch, Meena, and others. You might say they're more than meets the… ugh, forget it.

If you want to stay hip in machine learning, and especially in NLP, you have to know at least a bit about Transformers. So in this post, we'll talk about what they are, how they work, and why they've been so impactful.

A Transformer is a type of neural network architecture. To recap, neural networks are a very effective type of model for analyzing complex data types like images, videos, audio, and text. But different neural network architectures are optimized for different kinds of data. For example, for analyzing images, we'll typically use convolutional neural networks, or "CNNs." Roughly speaking, they mimic the way the human brain processes visual information.

[Figure: A typical convolutional neural network. Credit: Renanar2 / Wikicommons]
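To make the idea concrete, here's a minimal sketch of what a small CNN looks like in code, assuming PyTorch. The `TinyCNN` class and its layer sizes are illustrative choices of mine, not anything from this post or a specific library:

```python
# A minimal sketch of a convolutional neural network (assumes PyTorch).
# Layer counts and sizes are arbitrary, for illustration only.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolutional layers slide small filters across the image,
        # loosely mimicking how the visual system detects local patterns.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3-channel RGB input
            nn.ReLU(),
            nn.MaxPool2d(2),                              # downsample 2x
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # downsample 2x again
        )
        # After two 2x poolings, a 32x32 image becomes 8x8 with 32 channels.
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)               # (batch, 32, 8, 8) for 32x32 inputs
        return self.classifier(x.flatten(1))

# Usage: classify a batch of four random 32x32 RGB "images".
logits = TinyCNN()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])
```

The point isn't the specifics; it's that the convolution-and-pool structure is specialized for grid-like data such as images, which is exactly why a different architecture was needed for language.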