[Illustration: a futuristic white and blue robot pouring golden AI coins, with the text: Transformers, what are they? Decepticons or GenAI's little rascals?]

Transformers: What are they? Decepticons or GenAI’s little rascals?

Once upon a time, there were AIs and words.

Here, we are sailing in the deep waters of Natural Language Processing (NLP), the branch of Artificial Intelligence that teaches machines to speak our language. Today, the spotlight is on Generative AI and Large Language Models (LLMs), models that belong to the Transformer family. But to understand their reign, we must look at what they dethroned.

The predecessors of Transformers, Recurrent Neural Networks (RNNs), were slow and nearsighted readers. They deciphered a sentence word by word, step by step. The first word influenced the second, which influenced the third. By the time it reached the end of a long paragraph, the model suffered from contextual amnesia; it had forgotten the beginning. Information evaporated over the distance, making the generation of long texts incoherent, hazy, or even impossible.
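To see the problem concretely, here is a toy recurrent loop (a minimal sketch with random, untrained weights, purely illustrative): each word must wait for the previous one, and a single hidden state gets overwritten at every step, which is why early words fade.

```python
import numpy as np

# Toy RNN: one hidden state, updated word by word. Random weights,
# untrained -- this only illustrates the sequential bottleneck.
rng = np.random.default_rng(0)
hidden_size, vocab = 8, 5
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # state-to-state weights
W_x = rng.normal(scale=0.5, size=(hidden_size, vocab))        # word-to-state weights

def rnn_read(token_ids):
    h = np.zeros(hidden_size)
    for t in token_ids:                 # strictly sequential: step t needs step t-1
        x = np.eye(vocab)[t]            # one-hot encoding of the current word
        h = np.tanh(W_h @ h + W_x @ x)  # the new state overwrites the old one
    return h                            # everything the model "remembers" at the end

print(rnn_read([0, 3, 1, 4, 2]))
```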

2017: The paper that sent everything off the rails

Then came the year 2017 (the symbolic year of Transformers 5: The Last Knight). A team of Google researchers published a paper with a title worthy of a TV show punchline: “Attention Is All You Need” (https://arxiv.org/pdf/1706.03762).

They introduced the Attention mechanism, breaking the chains of sequential reading. No more word-by-word deciphering. Now, the model embraces the entire sentence in a single glance, instantly. It weighs the importance of each word relative to all others, regardless of their distance. It weaves invisible links, semantic bridges between a subject and a verb even if they are three lines apart.
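In code, the idea fits in a few lines. Here is a minimal numpy sketch of the paper's scaled dot-product attention, with random vectors standing in for real word embeddings:

```python
import numpy as np

# Scaled dot-product attention: every word looks at every other word
# at once and blends them according to learned relevance.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # similarity of each word pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: importance of each word
    return weights @ V                               # weighted blend of all words

# 4 words, 8-dimensional vectors; in a real model Q, K, V come from
# learned projections of the word embeddings.
rng = np.random.default_rng(42)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8): one context-aware vector per word
```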

A quick example to understand how it works with words

Take the word “crane.” If it sits next to “construction,” its mathematical signature transforms, leaning toward a piece of heavy machinery on a building site 🪝. If it sits next to “bird,” it changes radically and becomes an elegant long-legged creature (🍓🤦).
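Here is a toy illustration of that shift, with made-up 2-D vectors (real embeddings have hundreds of dimensions): the very same “crane” vector ends up somewhere different depending on its neighbor.

```python
import numpy as np

# Made-up 2-D "embeddings", purely illustrative.
crane        = np.array([0.5, 0.5])   # ambiguous on its own
construction = np.array([1.0, 0.0])   # "machinery" direction
bird         = np.array([0.0, 1.0])   # "animal" direction

def contextualize(word, neighbour):
    # attention weights from the dot-product similarity of the pair
    scores = np.array([word @ word, word @ neighbour])
    w = np.exp(scores) / np.exp(scores).sum()
    return w[0] * word + w[1] * neighbour   # blended, context-aware vector

print(contextualize(crane, construction))  # leans toward machinery
print(contextualize(crane, bird))          # leans toward the bird
```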

The Transformer changes the nature of information: from isolated words, it gives birth to nuanced meaning.

Why the name, “Transformers”?

Because they… … …transform… an input sequence (for example, what you type into the interface of your favorite LLM) into a numerical representation, the famous TOKENS! And they RE-transform these tokens into an output sequence, the response they have “generated.” A true chaos of numbers for mere mortals, but formidably efficient.
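You can peek at those tokens yourself. A quick sketch using the Hugging Face transformers library (assuming it is installed; the first run downloads GPT-2's public tokenizer):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")        # GPT-2's tokenizer
ids = tok.encode("Transformers are little rascals")

print(ids)                             # the numerical token IDs the model actually sees
print(tok.convert_ids_to_tokens(ids))  # the text pieces those IDs stand for
print(tok.decode(ids))                 # RE-transformed back into readable text
```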

The giants of transformation

This architecture has given rise to giants, each exploiting the power of the Transformer for a noble task:

BERT (The Scholar)

It uses the architecture to understand. By reading text in both directions (bidirectional), it grasps the most subtle nuances. It is the champion of classification and information retrieval. Its value? It understands the context hidden behind words.
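A quick way to watch BERT read in both directions, via the Hugging Face fill-mask pipeline (assuming the library is installed; the public bert-base-uncased checkpoint is downloaded on first run). It uses the words on BOTH sides of the blank to guess the hidden word:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT must use "crane", "lifted" AND "onto the roof" to fill the blank.
for guess in fill("The crane lifted the [MASK] onto the roof."):
    print(f"{guess['token_str']:>10}  {guess['score']:.2f}")
```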

GPT (The Poet)

It is a master of the art of generation. It is a Transformer that tirelessly predicts the next word, building worlds sentence after sentence. Its strength lies in its ability to maintain coherence over miles of text, where its ancestors would run out of breath after just a few yards.
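The same library exposes the small original GPT-2 as exactly that: a next-word machine extending the prompt one predicted token at a time (a sketch; sampling makes the output different on every run):

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")
out = generate("Once upon a time, there were AIs and words.",
               max_new_tokens=30, do_sample=True)
print(out[0]["generated_text"])
```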

ViT (Vision Transformer – The Eye)

It proves that the architecture goes beyond language. It slices an image into visual “words” (patches) and analyzes them with the same attention, sometimes surpassing classic convolutional networks.
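Slicing an image into visual “words” needs no deep learning library at all. A minimal numpy sketch, using ViT-Base's 16×16 patches on a standard 224×224 input:

```python
import numpy as np

image = np.zeros((224, 224, 3))   # a standard ViT input size (here, a blank image)
patch = 16                        # ViT-Base uses 16x16 patches

h, w, c = image.shape
patches = (image
           .reshape(h // patch, patch, w // patch, patch, c)
           .transpose(0, 2, 1, 3, 4)       # group the pixels of each patch together
           .reshape(-1, patch * patch * c))  # flatten each patch into one vector

print(patches.shape)  # (196, 768): 196 visual "words" of 768 numbers each
```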

Their added value is universal: parallelization. They process mountains of data simultaneously, allowing for learning at a titanic scale.

What the Transformer is NOT

However, the golden hammer shouldn’t see nails everywhere. The Transformer is a genius, particularly a literary one, but it is not an infallible mathematician, nor a rigorous accountant.

  • Tabular Uselessness: For structured, cold data, like Excel accounting tables, simpler models (like decision trees) remain kings; see the sketch after this list. Using a Transformer here would be like using a rocket to cross the street: costly, energy-intensive, and inefficient.
  • Pure Logic: The Transformer does not “reason” like a logical algorithm; it mimics reasoning through probability. It can write a convincing but false mathematical proof. It is an artist of the plausible, not the guardian of absolute truth.
  • Frugality: not their strong suit. Transformers are energy monsters. For embedded applications on small connected objects requiring a microsecond response with little battery, the Transformer is often a giant too heavy to carry.
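To make the tabular point concrete, here is a scikit-learn sketch (assuming scikit-learn is installed): a three-level decision tree on a toy built-in dataset, trained in milliseconds on a CPU, no GPU in sight.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small, structured, tabular dataset: exactly where trees shine.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(f"accuracy: {tree.score(X_test, y_test):.2f}")
```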

The Transformer has redrawn the AI horizon, but it remains a tool: sublime for language and complexity, superfluous for the simple and the structured. And above all, even if it allows us to “talk” naturally to machines, it must transform everything into its own language to get there.

We hope this gives you a better understanding of how your tokens are used in the future!

Serine Darmouni