In this post, we'll implement a GPT from scratch in just 60 lines of NumPy. We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text.
Note:
- This post assumes familiarity with Python, NumPy, and some basic experience training neural networks.
- This implementation intentionally leaves out tons of features to keep it as simple as possible while remaining complete. The goal is to provide a simple yet complete technical introduction to the GPT as an educational tool.
- The GPT architecture is just one small part of what makes LLMs what they are today.[1]
- All the code for this blog post can be found at github.com/jaymody/picoGPT.
- Hacker News thread
- Chinese translation
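
As a rough preview of the shape of the code we'll end up with (this is not the post's actual implementation), autoregressive text generation with a GPT boils down to repeatedly feeding the model its own output. In the sketch below, `toy_gpt` is a hypothetical stand-in that returns random logits instead of running a real forward pass; only the loop structure is meant to be representative.

```python
import numpy as np

def toy_gpt(inputs, vocab_size=50257):
    # Hypothetical stand-in for the real model: returns random logits
    # with shape [n_seq, n_vocab] instead of doing a real forward pass.
    return np.random.randn(len(inputs), vocab_size)

def generate(inputs, n_tokens_to_generate):
    for _ in range(n_tokens_to_generate):
        logits = toy_gpt(inputs)               # "forward pass" over the current sequence
        next_id = int(np.argmax(logits[-1]))   # greedy decoding: most likely next token id
        inputs.append(next_id)                 # feed the prediction back in as input
    return inputs[-n_tokens_to_generate:]      # return only the newly generated token ids

# Usage: token ids in, token ids out (a tokenizer maps these to/from text).
print(generate([464, 1893, 286, 262], n_tokens_to_generate=8))
```

The real version swaps `toy_gpt` for an actual GPT-2 forward pass using the OpenAI weights; that's what the rest of the post builds up.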