London Futurists

AI Transformers in context, with Aleksa Gordić

London Futurists Season 1 Episode 5

Welcome to episode 5 of the London Futurists podcast, with your co-hosts David Wood and Calum Chace.

We’re attempting something rather ambitious in episodes 5 and 6. We try to explain how today’s cutting-edge artificial intelligence systems work, using language familiar to lay people rather than to people with maths or computer science degrees.

Understanding how Transformers and Generative Adversarial Networks (GANs) work means getting to grips with concepts like matrix transformations, vectors, and landscapes with 500 dimensions.

This is challenging stuff, but do persevere. These AI systems are already having a profound impact, and that impact will only grow. Even at the level of pure self-interest, it is often said that in the short term, AIs won’t take all the jobs, but people who understand AI will take the best jobs.

We are extremely fortunate to have as our guide for these episodes a brilliant AI researcher at DeepMind, Aleksa Gordić.

Note that Aleksa is speaking in a personal capacity and is not representing DeepMind.

Aleksa's YouTube channel is https://www.youtube.com/c/TheAIEpiphany

00.03 An ambitious couple of episodes
01.22 Introducing Aleksa, a double rising star
02.15 Keeping it simple
02.50 Aleksa's current research, and previous work on Microsoft's HoloLens
03.40 Self-taught in AI. Not representing DeepMind
04.20 The narrative of the "Big Bang" of 2012, when machine learning started to work in AI
05.15 What machine learning is
05.45 AlexNet. Bigger data sets and more powerful computers
06.40 Deep learning: a subset of machine learning, and a re-branding of artificial neural networks
07.27 2017 and the arrival of Transformers
07.40 "Attention Is All You Need"
08.16 Before this there were LSTMs (Long Short-Term Memory networks)
08.40 Why Transformers beat LSTMs
09.58 Tokenisation. Splitting text into smaller units and mapping them onto vectors in high-dimensional spaces
10.30 3D space is defined by three numbers
10.55 Humans cannot envisage multi-dimensional spaces with hundreds of dimensions, but it's OK to imagine them as 3D spaces
11.55 Some dimensions of the word "princess"
12.30 Black boxes
13.05 People are trying to understand how machines handle the dimensions
13.50 "Man is to king as woman is to queen." Using mathematical operators on this kind of relationship
14.35 Not everything is explainable
14.45 Machines discover the relationships themselves
15.15 Supervised and self-supervised learning. Rewarding or penalising the machine according to how well it predicts labels
16.25 Vectors are best viewed as arrows in 3D space, although that is over-simplifying
17.20 For instance, the relationship between "queen" and "woman" is a vector
17.50 Self-supervised systems do their own labelling
18.30 The labels and relationships have probability distributions
19.20 For instance, a princess is far more likely to wear a slipper than a dog is
19.35 Large numbers of parameters
19.40 BERT, an early Transformer-based model, had a hundred million or so parameters
20.04 Now models have hundreds of billions, or even trillions, of parameters
20.24 A parameter is analogous to a synapse in the human brain
21.19 Synapses can have different weights
22.10 The more parameters, the lower the loss
22.35 Not just text, but images too, because images can also be represented as tokens
23.00 In late 2020 Google released the first vision Transformer
23.29 DALL-E 2 and Midjourney are diffusion models, which have largely replaced GANs
24.15 What are GANs, or Generative Adversarial Networks?
24.45 Two types of model: Generators and Discriminators. The first tries to fool the second
26.20 Simple text can produce photorealistic images
27.10 Aleksa's YouTube videos are available at "The AI Epiphany"
27.40 Close
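The word-vector ideas discussed from 10.30 to 17.20 can be illustrated with a minimal Python sketch. It assumes hand-picked three-dimensional vectors whose dimension labels (royalty, maleness, femaleness) are invented for illustration; real models learn hundreds of dimensions from data, and those dimensions rarely have such tidy meanings. The function `analogy` is a hypothetical helper, not a library API: it computes king - man + woman and returns the closest remaining word by cosine similarity.

```python
import math

# Hypothetical, hand-picked 3D "embeddings" for illustration only.
# Assumed dimension meanings: [royalty, maleness, femaleness]
EMBEDDINGS = {
    "king":     [0.9, 0.9, 0.1],
    "queen":    [0.9, 0.1, 0.9],
    "man":      [0.1, 0.9, 0.1],
    "woman":    [0.1, 0.1, 0.9],
    "princess": [0.8, 0.1, 0.8],
    "dog":      [0.0, 0.5, 0.5],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine of the angle between two arrows: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Solve 'a is to b as c is to ?' by computing b - a + c and
    returning the nearest other word by cosine similarity."""
    target = add(sub(embeddings[b], embeddings[a]), embeddings[c])
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

print(analogy("man", "king", "woman"))  # prints: queen
```

With learned embeddings the result vector rarely lands exactly on "queen"; choosing the nearest word by cosine similarity is what makes the trick tolerant of that noise.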
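The point at 18.30, that labels and relationships come with probability distributions, can also be sketched in Python. Language models commonly turn raw scores into probabilities with the softmax function; the context sentence, candidate words and scores below are invented for illustration and do not come from any real model.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max so math.exp cannot overflow
    exps = {word: math.exp(s - m) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores for the next word in "The princess wore a ___"
# (invented numbers, not taken from any trained model).
scores = {"slipper": 4.0, "crown": 2.5, "dog": -1.0}
probs = softmax(scores)

# Print candidates from most to least likely.
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {p:.3f}")
```

Higher scores always map to higher probabilities, and even an unlikely word such as "dog" keeps a small non-zero probability.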
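The comparison between parameters and synapses (19.35 to 22.10) can be made concrete with a small sketch. It assumes a simple fully connected layer, where each input-to-output connection carries one learned weight and each output neuron has one bias, and all of these count as parameters. The helper names and layer sizes below are illustrative only.

```python
def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs plus a bias.
    Each weight plays the role the episode compares to a synapse strength."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def dense_layer_param_count(n_in, n_out):
    """Parameters in a fully connected layer: one weight per
    input-output connection plus one bias per output neuron."""
    return n_in * n_out + n_out

print(neuron([1.0, 0.5, -2.0], [0.2, 0.4, 0.1], bias=0.05))
print(dense_layer_param_count(768, 3072))  # prints: 2362368
```

A single layer of this size already holds over two million parameters; stacking many such layers is how models reach hundreds of millions and beyond.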
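The idea at 22.35 that images can also be represented as tokens can be sketched by cutting an image into square patches, as vision Transformers do. This minimal Python sketch assumes a tiny grayscale image stored as a list of rows and a 2x2 patch size (the 2020 vision Transformer paper used 16x16 patches). In a real model each flattened patch would then be multiplied by a learned matrix and combined with position information; those steps are omitted here, and `image_to_patch_tokens` is a hypothetical helper.

```python
def image_to_patch_tokens(image, patch):
    """Split a 2D grayscale image (list of rows) into non-overlapping
    patch x patch squares, reading left-to-right, top-to-bottom, and
    flatten each square into one 'token' vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

# Toy 4x4 image whose pixel values are simply 0..15, row by row.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(image, 2)
print(len(tokens))  # prints: 4
print(tokens[0])    # prints: [0, 1, 4, 5]
```

Once an image is a sequence of vectors like this, the same Transformer machinery used for text tokens can process it.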

Music: Spike Protein, by Koi Discovery, available under CC0 1.0 Public Domain Declaration

People on this episode