In this post, we'll implement a GPT from scratch in just 60 lines of NumPy. We'll then load the trained GPT-2 model weights released by OpenAI into our implementation and generate some text.
Note:
- This post assumes familiarity with Python, NumPy, and some basic experience training neural networks.
- This implementation intentionally leaves out tons of features to keep it as simple as possible while remaining complete. The goal is to provide a simple yet complete technical introduction to the GPT as an educational tool.
- The GPT architecture is just one small part of what makes LLMs what they are today.[1]
- All the code for this blog post can be found at github.com/jaymody/picoGPT.
- Hacker News thread
- Chinese translation
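
As a rough preview of the shape of the code we'll end up with (this is not the post's actual implementation), autoregressive text generation with a GPT boils down to repeatedly feeding the model its own output. In the sketch below, `toy_gpt` is a hypothetical stand-in that returns random logits instead of running a real forward pass; only the loop structure is meant to be representative.

```python
import numpy as np

def toy_gpt(inputs, vocab_size=50257):
    # Hypothetical stand-in for the real model: returns random logits
    # with shape [n_seq, n_vocab] instead of doing a real forward pass.
    return np.random.randn(len(inputs), vocab_size)

def generate(inputs, n_tokens_to_generate):
    for _ in range(n_tokens_to_generate):
        logits = toy_gpt(inputs)               # "forward pass" over the current sequence
        next_id = int(np.argmax(logits[-1]))   # greedy decoding: most likely next token id
        inputs.append(next_id)                 # feed the prediction back in as input
    return inputs[-n_tokens_to_generate:]      # return only the newly generated token ids

# Usage: token ids in, token ids out (a tokenizer maps these to/from text).
print(generate([464, 1893, 286, 262], n_tokens_to_generate=8))
```

The real version swaps `toy_gpt` for an actual GPT-2 forward pass using the OpenAI weights; that's what the rest of the post builds up.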