The complete guide to LLM fine-tuning

Publié le 26 août 2023 par loic

Pre-trained large language models (LLMs) offer impressive capabilities like text generation, summarization, and coding out of the box. However, they aren’t universally suitable for all tasks. Sometimes, your LLM might struggle with a specific task. In such cases, one option is to fine-tune the LLM, which involves retraining the base model on new data. Although fine-tuning can be complex, costly, and not the initial solution, it’s a potent technique that organizations using LLMs should consider. Understanding the mechanics of fine-tuning, even if you’re not an expert, can guide you in making informed decisions.

https://bdtechtalks.com/2023/07/10/llm-fine-tuning/amp/

Natural Language Understanding

Publié le 26 août 2023 par loic

A free Stanford course

XCS224U

Stanford School of Engineering

This project-oriented course focuses on building efficient and reliable models for understanding human language, drawing from linguistics, natural language processing, and machine learning. It covers tasks like contextual language representation, information retrieval, and NLU model evaluation. The course involves hands-on work to build baseline models and develop original models for class-wide competitions. The second half of the course is dedicated to an individual project in natural language understanding, following best practices in the field and incorporating topics like evaluations, semantic parsing, and grounded language understanding.

https://youtube.com/playlist?list=PLoROMvodv4rOwvldxftJTmoR3kRcWkJBp&si=XsWOdyJY7KhEhDJG

ELI5: FlashAttention

Publié le 26 août 2023 par loic

The goal of this blog post is to explain flash attention in such a way that hopefully anyone who already understands attention will ask themselves:

“Why didn’t I think of this before?” followed by “It’s so easy”.

https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad

Fine-Tuning Embedding for RAG with Synthetic Data

Publié le 26 août 2023 par loic

This repo shows you how to fine-tune an embedding model to improve RAG performance even if you don’t have labelled data (i.e. positive pairs of query/relevant documents).

We walkthrough step-by-step the process of generating a synthetic dataset with LLM, finetuning an opensource embedding model, and finally evaluating the finetuned model.

We experiment with a small scale dataset of financial PDF documents, and show that finetuning the embedding model can substantially improve retrieval performance.

https://github.com/run-llama/finetune-embedding

Optimizing LLM latency

Publié le 2 août 2023 par loic

Fastest Inference: mlc stands out as the fastest, prompting a need to assess its quality despite its impressive speed.
Favorite Tool: CTranslate2 is the preferred choice due to its speed and user-friendliness, supported by excellent documentation. It lacks distributed inference unlike vLLM.
vLLM Performance: vLLM is also fast but CTranslate outperforms it in speed. However, vLLM supports distributed inference, making it suitable for larger models.
Text Generation Inference (TGI): An acceptable choice for deploying HuggingFace LLMs traditionally, but not as swift as vLLM. Offers features like telemetry and HF ecosystem integration. Note that TGI’s licensing has become more restrictive as of 7/28/2023, potentially limiting certain commercial uses.

Advanced Prompt Engineering

Publié le 2 août 2023 par loic

https://cameronrwolfe.substack.com/p/advanced-prompt-engineering

The emergence of large language models (LLMs) has revolutionized problem-solving approaches. In the past, tasks like document reformatting or sentence classification necessitated creating specific computer programs. LLMs have transformed this process, enabling tasks to be accomplished through textual prompts. For instance, reformatting documents can be achieved by instructing an LLM. This shift was exemplified by GPT-3’s ability to achieve accurate results with minimal guidance.

As LLM research progressed, more sophisticated techniques emerged beyond basic prompting methods like zero/few-shot learning. Instruction-following LLMs (e.g., InstructGPT, ChatGPT) prompted investigations into tackling complex tasks. The goal was to extend LLMs beyond simple problems, requiring them to comprehend intricate instructions and execute multi-step reasoning. However, such challenges demand advanced prompting strategies due to their complexity.

Large Transformer Model Inference Optimization

Publié le 2 août 2023 par loic

https://lilianweng.github.io/posts/2023-01-10-inference-optimization/#quantization

Large transformer models are mainstream nowadays, creating SoTA results for a variety of tasks. They are powerful but very expensive to train and use. The extremely high inference cost, in both time and memory, is a big bottleneck for adopting a powerful transformer for solving real-world tasks at scale.

Why is it hard to run inference for large transformer models? Besides the increasing size of SoTA models, there are two main factors contributing to the inference challenge (Pope et al. 2022):

Large memory footprint. Both model parameters and intermediate states are needed in memory at inference time. For example,
- The KV cache should be stored in memory during decoding time; E.g. For a batch size of 512 and context length of 2048, the KV cache totals 3TB, that is 3x the model size (!).
- Inference cost from the attention mechanism scales quadratically with input sequence length.
Low parallelizability. Inference generation is executed in an autoregressive fashion, making the decoding process hard to parallel.

In this post, we will look into several approaches for making transformer inference more efficient. Some are general network compression methods, while others are specific to transformer architecture.

Universal and Transferable Adversarial Attacks on Aligned Language Models

Publié le 2 août 2023 par loic

https://llm-attacks.org/

This research examines the safety of large language models (LLMs) such as ChatGPT, Bard, and Claude. It demonstrates the potential for automated creation of adversarial attacks, using character sequences added to user queries that manipulate the LLM into following harmful commands. Unlike traditional « jailbreaks, » these attacks are automated and can affect both open-source and closed-source chatbots. The study raises concerns about the effectiveness of mitigation measures and suggests that the challenges posed by adversarial behavior might persist due to the nature of deep learning models. The findings highlight the need for careful consideration of the safety implications as LLMs become more integrated into various applications.

L	M	M	J	V	S	D
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Deeplearning.fr

You have to learn the rules of the game. And then you have to play better than anyone else

Archives de catégorie : LLM

The complete guide to LLM fine-tuning

Natural Language Understanding

ELI5: FlashAttention

Fine-Tuning Embedding for RAG with Synthetic Data

Optimizing LLM latency

Advanced Prompt Engineering

Large Transformer Model Inference Optimization

Universal and Transferable Adversarial Attacks on Aligned Language Models