Optimize open LLMs using GPTQ and Hugging Face Optimum


  • The Hugging Face Optimum team collaborated with the AutoGPTQ library to provide a simple API for applying GPTQ quantization to language models.
  • GPTQ quantization compresses open LLMs to 8, 4, 3, or 2 bits, enabling them to run on smaller hardware with minimal performance loss.
  • The blog covers:
  1. Setting up the development environment.
  2. Preparing the quantization dataset.
  3. Loading and quantizing the model.
  4. Testing performance and inference speed.
  5. Bonus: Running inference with text generation.
  • GPTQ’s purpose is explained before diving into the tutorial.
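To make the bit-width trade-off concrete, here is a minimal sketch of plain round-to-nearest uniform quantization in pure Python. It is not GPTQ itself: GPTQ additionally uses a calibration dataset to compensate for rounding error, which this toy version skips. The function names and the random toy weights are assumptions made for illustration only.

```python
import random

def quantize(weights, bits):
    """Round-to-nearest uniform quantization to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:  # constant tensor: any positive scale works
        scale = 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min

def dequantize(codes, scale, w_min):
    """Map integer codes back to approximate float weights."""
    return [c * scale + w_min for c in codes]

random.seed(0)
# Toy stand-in for one weight tensor (assumption, not real model weights).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {}
for bits in (8, 4, 3, 2):
    codes, scale, w_min = quantize(weights, bits)
    restored = dequantize(codes, scale, w_min)
    errors[bits] = sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)
    # The size ratio ignores the small overhead of storing scale and offset.
    print(f"{bits}-bit: {16 / bits:.1f}x smaller than fp16, mean abs error {errors[bits]:.6f}")
```

Running it shows the reconstruction error growing as the bit width shrinks, which is the loss that calibration-based methods such as GPTQ aim to keep small.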

PromptNER: Prompting for Named Entity Recognition

  • Large Language Models (LLMs) and prompt-based heuristics are being used for off-the-shelf solutions to various NLP problems.
  • LLM-based few-shot methods have shown promise but still lag behind other methods on Named Entity Recognition (NER).
  • "PromptNER" is introduced as a new algorithm for few-shot and cross-domain NER.
  • To adapt to a new NER task, PromptNER needs a set of entity definitions in addition to few-shot examples.
  • Given a sentence, PromptNER prompts an LLM to list potential entities, each with an explanation of whether it fits the entity type definitions.
  • PromptNER achieves state-of-the-art performance in few-shot NER on the CoNLL, GENIA, and FewNERD datasets.
  • It also outperforms previous methods in cross-domain NER, setting new records on 3 out of 5 CrossNER domains with an average F1 gain of 3%.
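The prompting scheme described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact template: the `candidate | True/False | explanation` line format, the function names, and the sample response are all assumptions, and no LLM is actually called.

```python
def build_ner_prompt(definitions, examples, sentence):
    """Assemble entity definitions, few-shot examples, and the target sentence."""
    lines = ["Entity type definitions:"]
    for label, text in definitions.items():
        lines.append(f"- {label}: {text}")
    lines.append("")
    for ex_sentence, ex_answers in examples:
        lines.append(f"Sentence: {ex_sentence}")
        lines.append("Answer:")
        for candidate, is_entity, reason in ex_answers:
            lines.append(f"{candidate} | {is_entity} | {reason}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append("Answer:")
    return "\n".join(lines)

def parse_ner_response(response):
    """Keep only candidates the model marked as entities, with their explanations."""
    entities = []
    for line in response.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3 and parts[1] == "True":
            entities.append((parts[0], parts[2]))
    return entities

definitions = {"LOCATION": "a named place such as a city or country"}
examples = [(
    "She moved to Berlin in May.",
    [("Berlin", True, "a city, which matches LOCATION"),
     ("May", False, "a month, not covered by any definition")],
)]
prompt = build_ner_prompt(definitions, examples, "We landed in Paris yesterday.")

# Hard-coded stand-in for what an LLM might return (assumption).
sample_response = (
    "Paris | True | a city, which matches LOCATION\n"
    "yesterday | False | a date, not covered by any definition"
)
print(parse_ner_response(sample_response))
```

Asking for an explanation next to every candidate, including rejected ones, is what lets the entity definitions steer the model toward a new domain without retraining.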



The complete guide to LLM fine-tuning

Pre-trained large language models (LLMs) offer impressive capabilities like text generation, summarization, and coding out of the box. However, they aren’t suited to every task, and sometimes your LLM will struggle with a specific one. In such cases, one option is to fine-tune the LLM, which involves retraining the base model on new data. Fine-tuning can be complex and costly, and it is rarely the first thing to try, but it is a potent technique that organizations using LLMs should consider. Understanding the mechanics of fine-tuning, even if you’re not an expert, can help you make informed decisions.


Published in LLM

Natural Language Understanding

A free Stanford course


Stanford School of Engineering

This project-oriented course focuses on building efficient and reliable models for understanding human language, drawing from linguistics, natural language processing, and machine learning. It covers tasks like contextual language representation, information retrieval, and NLU model evaluation. The course involves hands-on work to build baseline models and develop original models for class-wide competitions. The second half of the course is dedicated to an individual project in natural language understanding, following best practices in the field and incorporating topics like evaluations, semantic parsing, and grounded language understanding.



Fine-Tuning Embedding for RAG with Synthetic Data

This repo shows you how to fine-tune an embedding model to improve RAG performance even if you don’t have labelled data (i.e. positive pairs of query/relevant documents).

We walk through, step by step, the process of generating a synthetic dataset with an LLM, fine-tuning an open-source embedding model, and finally evaluating the fine-tuned model.

We experiment with a small-scale dataset of financial PDF documents, and show that fine-tuning the embedding model can substantially improve retrieval performance.
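As a rough sketch of the data-generation and evaluation steps, the snippet below builds synthetic (query, chunk) pairs and scores retrieval with hit rate at k. Everything here is a stand-in chosen for illustration: `fake_llm_question` replaces a real LLM call, `embed` is a toy bag-of-words vector rather than an embedding model, and hit rate is one common retrieval metric, not necessarily the one the repo reports.

```python
import math
from collections import Counter

def fake_llm_question(chunk):
    """Stand-in for an LLM call that writes a question answerable from the chunk."""
    return "What does the report say about " + " ".join(chunk.split()[:3]) + "?"

def embed(text):
    """Toy bag-of-words 'embedding'; a real setup would call an embedding model."""
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hit_rate(pairs, corpus, k=1):
    """Fraction of queries whose source chunk appears in the top-k retrieved chunks."""
    hits = 0
    for query, relevant_id in pairs:
        q = embed(query)
        ranked = sorted(corpus, key=lambda cid: cosine(q, embed(corpus[cid])), reverse=True)
        hits += relevant_id in ranked[:k]
    return hits / len(pairs)

# Toy corpus standing in for chunks extracted from financial PDFs (assumption).
corpus = {
    "c1": "Revenue grew twelve percent driven by cloud subscriptions",
    "c2": "Operating expenses increased because of higher headcount",
    "c3": "Liquidity risk remains low given current cash reserves",
}

# Each chunk yields one synthetic positive pair: (generated query, source chunk id).
pairs = [(fake_llm_question(text), cid) for cid, text in corpus.items()]
print(hit_rate(pairs, corpus, k=1))
```

In a real workflow the same pairs would serve twice: a training split as positive examples for fine-tuning, and a held-out split to compare the metric before and after fine-tuning.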



Optimizing LLM latency

  • Fastest Inference: MLC stands out as the fastest, by a margin large enough that its output quality should be checked before relying on it.
  • Favorite Tool: CTranslate2 is the preferred choice for its speed and ease of use, backed by excellent documentation. Unlike vLLM, it does not support distributed inference.
  • vLLM Performance: vLLM is also fast, though CTranslate2 is faster. vLLM does support distributed inference, which makes it a better fit for larger models.
  • Text Generation Inference (TGI): An acceptable choice for deploying Hugging Face LLMs in the standard way, though not as fast as vLLM. It offers features such as telemetry and integration with the HF ecosystem. Note that TGI’s license became more restrictive as of 7/28/2023, which may limit certain commercial uses.
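Comparisons like the ones above rest on timing each backend under the same prompt and output length. The following is a minimal timing harness, offered as a sketch: `benchmark` and `dummy_generate` are hypothetical names, and the dummy backend just sleeps instead of calling any of the tools listed. A real comparison would wrap each tool's own generation call in a function with the same signature.

```python
import statistics
import time

def benchmark(generate, prompt, max_new_tokens, runs=5, warmup=1):
    """Time a generate(prompt, max_new_tokens) callable that returns a list of tokens."""
    for _ in range(warmup):  # warm-up runs are excluded from the statistics
        generate(prompt, max_new_tokens)
    latencies, throughputs = [], []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        latencies.append(elapsed)
        throughputs.append(len(tokens) / elapsed)
    return {
        "median_latency_s": statistics.median(latencies),
        "median_tokens_per_s": statistics.median(throughputs),
    }

def dummy_generate(prompt, max_new_tokens):
    """Stand-in backend (assumption): pretends each token takes about 1 ms."""
    time.sleep(0.001 * max_new_tokens)
    return ["tok"] * max_new_tokens

result = benchmark(dummy_generate, "Write a haiku about GPUs.", max_new_tokens=20, runs=3)
print(result)
```

Reporting medians over several runs, after a warm-up, keeps one-off costs such as model loading or cache misses from skewing the comparison.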

The History of Open-Source LLMs: Better Base Models (Part Two)


  • Value of Open-source LLM Research: Aims to democratize influential technology; despite initial struggles and criticism, open-source LLMs gained popularity and significance.
  • Early Challenges: Initial open-source LLMs performed poorly and faced criticism, posing difficulties for advancement.
  • Transformative Research Line: Focuses on enhancing open-source LLMs, leading to high-performing pre-trained models accessible to all.
  • Significance of High-Performing Models: Creation of powerful, cost-effective pre-trained LLMs revolutionized research accessibility.
  • Series Overview: Part two of a three-part series on open-source LLM history. The first part explored initial open-source LLM attempts.
  • Study Focus: This overview delves into the most popular open-source base models, emphasizing pre-trained models not yet fine-tuned or aligned.
  • Future Exploration: The next installment will discuss fine-tuning and alignment of models for diverse practical applications.

Advanced Prompt Engineering


The emergence of large language models (LLMs) has revolutionized problem-solving approaches. In the past, tasks like document reformatting or sentence classification necessitated creating specific computer programs. LLMs have transformed this process, enabling tasks to be accomplished through textual prompts. For instance, reformatting documents can be achieved by instructing an LLM. This shift was exemplified by GPT-3’s ability to achieve accurate results with minimal guidance.

As LLM research progressed, more sophisticated techniques emerged beyond basic prompting methods like zero/few-shot learning. Instruction-following LLMs (e.g., InstructGPT, ChatGPT) prompted investigations into tackling complex tasks. The goal was to extend LLMs beyond simple problems, requiring them to comprehend intricate instructions and execute multi-step reasoning. However, such challenges demand advanced prompting strategies due to their complexity.


Practical Prompt Engineering


  • Prompt engineering: An empirical science focused on optimizing LLM (Large Language Model) performance through various prompting strategies.
  • Aims to understand prompting mechanics and employs techniques to enhance LLM capabilities.
  • Zero/few-shot learning: A fundamental technique where LLMs perform tasks with minimal or no training examples, showcasing their remarkable adaptability.
  • Instruction prompting: Another vital technique involving explicit instructions in prompts to guide LLM behavior.
  • The overview aims to give practical insights and strategies for effective prompt engineering and LLM use.
  • Provides actionable tricks and takeaways for prompt engineers and LLM practitioners to enhance their effectiveness.
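The difference between zero-shot, few-shot, and instruction prompting comes down to what text is placed in front of the query. Here is a small sketch of a prompt builder that covers all three; the `Input:`/`Output:` layout and the function name are assumptions for illustration, not a format prescribed by the article.

```python
def build_prompt(instruction, query, examples=()):
    """Instruction first, then optional worked examples, then the query to complete."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

instruction = "Classify the sentiment of the input as positive or negative."
query = "The battery died after a day."

# Zero-shot: the instruction alone guides the model.
zero_shot = build_prompt(instruction, query)

# Few-shot: the same instruction plus a couple of solved examples.
few_shot = build_prompt(
    instruction,
    query,
    examples=[
        ("Great screen and fast shipping.", "positive"),
        ("It stopped working within a week.", "negative"),
    ],
)
print(few_shot)
```

Keeping the instruction and the examples as separate arguments makes it easy to test empirically which combination works best for a given model, in line with the trial-and-error nature of prompt engineering.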