TL;DR: train an "energy" model that scores whether a candidate output is correct (rather than just outputting something), then run gradient descent on the output to find good ones. It is built on transformers.
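To make the idea concrete, here is a toy sketch of "score an output, then descend on it". This is not the paper's method: the energy below is a hand-written quadratic standing in for a trained transformer, and the names energy, energy_grad_y and refine are made up for illustration.

```python
def energy(x, y):
    # Hypothetical stand-in for a learned energy model:
    # low energy means y is a good output for input x.
    # In this toy, the "correct" output is y = 2 * x.
    return (y - 2.0 * x) ** 2


def energy_grad_y(x, y):
    # Analytic gradient of the toy energy with respect to the candidate output y.
    return 2.0 * (y - 2.0 * x)


def refine(x, y0=0.0, step=0.1, n_steps=50):
    # Start from an initial guess and repeatedly step against the gradient,
    # so the candidate output moves toward lower energy.
    y = y0
    for _ in range(n_steps):
        y -= step * energy_grad_y(x, y)
    return y


x = 3.0
y = refine(x)
print("energy of initial guess:", energy(x, 0.0))
print("refined output:", y, "energy:", energy(x, y))
```

In a real system the gradient would presumably come from autodiff through the model rather than a closed-form expression, and more refinement steps would mean more compute spent per prediction.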
I've seen some of that channel's videos before, and many of them contain errors. I haven't read the Energy-Based Transformers paper yet, so I can't say for sure if this video contains any errors, but be careful.
I would read the blog post by the lead author instead of watching this video:
https://alexiglad.github.io/blog/2025/ebt/
Also, see:
https://www.reddit.com/r/MachineLearning/comments/1lu1ia0/r_...