A visualization and walkthrough of the LLM algorithm that backs OpenAI’s ChatGPT. Explore the algorithm down to every add & multiply, seeing the whole process in action.
Category archives: LLM
Complete guide to Llama
If you want to get started with Llama, this is the definitive place. Just some of the areas covered:
Fine Tuning
Quantization
Prompting
Inferencing
Validation
Integration Guides
Code Llama
Integration with LangChain
Integration with LlamaIndex
QA-LoRA: Fine-Tune a Quantized Large Language Model on Your GPU
State-of-the-art large language models (LLMs) are pre-trained with billions of parameters. While pre-trained LLMs can perform many tasks, they can become much better once fine-tuned.
Thanks to LoRA, fine-tuning costs can be dramatically reduced. LoRA adds low-rank tensors, i.e., a small number of parameters (millions), on top of the frozen original parameters. Only the parameters in the added tensors are trained during fine-tuning.
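To make the idea concrete, here is a minimal pure-Python sketch of a single linear layer with a LoRA adapter. It is an illustration under assumptions, not the implementation used in the article: the class name `LoRALinear`, the `alpha / rank` scaling, the layer sizes, and the zero initialization of `B` are choices made for this example.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (illustrative only)."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen pre-trained weight: never updated during fine-tuning.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors. B starts at zero (an assumption of this
        # sketch), so the adapter initially leaves the base output unchanged.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                      # frozen path
        update = matvec(self.B, matvec(self.A, x))    # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, rank=4)
print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
```

With these toy sizes the adapter trains `rank * (d_in + d_out)` values instead of `d_in * d_out`, which is where the savings come from as the layers get larger.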
LoRA still requires the model to be loaded in memory. To reduce the memory cost and speed up fine-tuning, a new approach proposes quantization-aware LoRA (QA-LoRA) fine-tuning.
In this article, I explain QA-LoRA and review its performance compared with previous work (especially QLoRA). I also show how to use QA-LoRA to fine-tune your own quantization-aware LoRA for Llama 2.
Meta-CoT prompting
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
Meta-CoT is a generalizable CoT prompting method for mixed-task scenarios where the type of input question is unknown. It consists of three phases: (i) scenario identification: categorizes the scenario of the input question; (ii) demonstration selection: fetches the ICL demonstrations for the categorized scenario; (iii) answer derivation: infers the answer by feeding the LLM a prompt comprising the fetched ICL demonstrations and the input question.
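The three phases can be sketched as a small pipeline. Everything named here is hypothetical and chosen for illustration: the demonstration pool `DEMO_POOL`, the keyword-based `identify_scenario` (a stand-in for whatever identifier the paper actually uses), and the `llm` callable, which is assumed to take a prompt string and return a string.

```python
from typing import Callable, List, Tuple

# Hypothetical demonstration pool: scenario name -> (question, worked answer) pairs.
DEMO_POOL = {
    "arithmetic": [("What is 2 + 3?", "2 + 3 = 5. The answer is 5.")],
    "commonsense": [
        ("Can fish climb trees?", "Fish live in water and have no limbs. The answer is no.")
    ],
}


def identify_scenario(question: str) -> str:
    """Phase (i): toy classifier; any digit in the question means 'arithmetic'."""
    return "arithmetic" if any(ch.isdigit() for ch in question) else "commonsense"


def select_demonstrations(scenario: str) -> List[Tuple[str, str]]:
    """Phase (ii): fetch the in-context demonstrations for that scenario."""
    return DEMO_POOL[scenario]


def build_prompt(demos: List[Tuple[str, str]], question: str) -> str:
    """Join the demonstrations and the new question into one prompt."""
    parts = [f"Q: {q}\nA: {a}" for q, a in demos]
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


def derive_answer(question: str, llm: Callable[[str], str]) -> str:
    """Phase (iii): send demonstrations plus the question to the model."""
    scenario = identify_scenario(question)
    demos = select_demonstrations(scenario)
    return llm(build_prompt(demos, question))


# Usage with an echo function in place of a real model, to inspect the prompt.
print(derive_answer("What is 7 + 8?", lambda prompt: prompt))
```

Passing the model in as a callable keeps the sketch independent of any particular LLM client.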
Mitigating LLM Hallucinations: a multifaceted approach
https://amatriain.net/blog/hallucinations#advancedprompting
Ever curious about the challenges of embedding large language models in products? A notable issue is ‘hallucinations’ where AI outputs misleading data. This blog offers a guide on tackling these issues in user-facing products, giving a snapshot of current best practices.
Graph-Based Prompting and Reasoning with Language Models
- Advanced prompting techniques (e.g., chain of thought and tree of thought) improve the problem-solving capabilities of large language models (LLMs).
- These techniques require LLMs to construct step-by-step responses.
- They assume linear reasoning, whereas human reasoning often follows multiple chains of thought and combines insights across them.
- This overview focuses on prompting techniques using a graph structure to capture non-linear problem-solving patterns.
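As a rough illustration of the non-linear pattern described above, the sketch below stores intermediate thoughts as nodes in a directed acyclic graph, so that one thought can build on several earlier ones. The example problem, the thought labels, and the use of the standard-library `graphlib` module are assumptions made for this example, not something taken from the overview.

```python
from graphlib import TopologicalSorter

# Hypothetical thought graph: each key is a thought, and each value is the
# set of earlier thoughts it builds on.
thought_graph = {
    "split list in halves": set(),
    "sort left half": {"split list in halves"},
    "sort right half": {"split list in halves"},
    "merge sorted halves": {"sort left half", "sort right half"},
}

# An order in which the thoughts can be generated: every thought appears
# after all the thoughts it depends on.
order = list(TopologicalSorter(thought_graph).static_order())

# Thoughts with more than one parent are where separate chains are combined,
# which a single linear chain of thought cannot express.
merge_points = [t for t, parents in thought_graph.items() if len(parents) > 1]

print(order)
print(merge_points)
```

In a real system each node would be produced by an LLM call whose prompt includes the text of its parent thoughts.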
The Novice’s LLM Training Guide
https://rentry.org/llm-training
A modern Large Language Model (LLM) is trained using the Transformers library, which leverages the power of the Transformer network architecture. This architecture has revolutionized the field of natural language processing and is widely adopted for training LLMs. Python, a high-level programming language, is commonly used for implementing LLMs, making them more accessible and easier to comprehend compared to lower-level frameworks such as OpenXLA’s IREE or GGML. The intuitive nature of Python allows researchers and developers to focus on the logic and algorithms of the model without getting caught up in intricate implementation details.
This rentry won’t go over pre-training LLMs (training from scratch), but rather fine-tuning and low-rank adaptation (LoRA) methods. Pre-training is prohibitively expensive, and if you have the compute for it, you’re likely smart enough not to need this rentry at all.
The complete guide to LLM fine-tuning
Pre-trained large language models (LLMs) offer impressive capabilities like text generation, summarization, and coding out of the box. However, they aren’t universally suitable for all tasks. Sometimes, your LLM might struggle with a specific task. In such cases, one option is to fine-tune the LLM, which involves retraining the base model on new data. Although fine-tuning can be complex, costly, and not the initial solution, it’s a potent technique that organizations using LLMs should consider. Understanding the mechanics of fine-tuning, even if you’re not an expert, can guide you in making informed decisions.
Natural Language Understanding
A free Stanford course
XCS224U
Stanford School of Engineering
This project-oriented course focuses on building efficient and reliable models for understanding human language, drawing from linguistics, natural language processing, and machine learning. It covers tasks like contextual language representation, information retrieval, and NLU model evaluation. The course involves hands-on work to build baseline models and develop original models for class-wide competitions. The second half of the course is dedicated to an individual project in natural language understanding, following best practices in the field and incorporating topics like evaluations, semantic parsing, and grounded language understanding.
https://youtube.com/playlist?list=PLoROMvodv4rOwvldxftJTmoR3kRcWkJBp&si=XsWOdyJY7KhEhDJG
ELI5: FlashAttention
The goal of this blog post is to explain flash attention in such a way that hopefully anyone who already understands attention will ask themselves:
“Why didn’t I think of this before?” followed by “It’s so easy”.
https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad