RLHF: Reinforcement Learning from Human Feedback

In discussions of why ChatGPT has captured so much of our imagination, I often come across two narratives:

  1. Scale: throwing more data and compute at it.
  2. UX: moving from a prompt interface to a more natural chat interface.

One narrative that is often glossed over is the incredible technical creativity that went into making models like ChatGPT work. One such cool idea is RLHF (Reinforcement Learning from Human Feedback): incorporating reinforcement learning and human feedback into NLP.

RL has been notoriously difficult to work with, and therefore, mostly confined to gaming and simulated environments like Atari or MuJoCo. Just five years ago, both RL and NLP were progressing pretty much orthogonally – different stacks, different techniques, and different experimentation setups. It’s impressive to see it work in a new domain at a massive scale.

So, how exactly does RLHF work? Why does it work? This post will discuss the answers to those questions.


Advanced Prompting Techniques for Enhancing Large Language Model Performance: A Comprehensive Guide

Below is a list of prompting techniques, including Proactive Chain-of-Thought Prompting (ProCoT) from recent research:

  1. Zero-Shot Prompting:
  • Description: The model is given a task without any prior examples.
  • Goal: To provide a concise explanation or answer based on general knowledge.
  • Example 1: « Describe the process of photosynthesis. »
  • Example 2: « Explain the significance of the Treaty of Versailles. »
  2. Few-Shot Prompting:
  • Description: The model is provided with a few examples to better understand the task.
  • Goal: To generate responses that follow the pattern shown in the examples.
  • Example 1: « Text: Neil Armstrong landed on the moon in 1969. Event: Moon Landing, 1969. Text: The first iPhone was released in 2007. Event: »
  • Example 2: « Problem: If I have 5 apples and give away 2, how many do I have left? Solution: 3 apples. Problem: If a train travels 100 miles in 1 hour, how far will it travel in 3 hours? Solution: »
  3. Chain of Thought (CoT) Prompting:
  • Description: Guides the model to decompose the problem into intermediate steps before providing the final answer.
  • Goal: To facilitate complex problem-solving by breaking down the process.
  • Example 1: « To solve the math problem ‘8 divided by 2(2+2)’, let’s think step by step. »
  • Example 2: « To determine the capital of France, let’s consider the major cities in France and identify the one that is the political and cultural center. »
  4. Proactive Chain-of-Thought Prompting (ProCoT):
  • Description: Involves planning and taking initiative towards a conversational goal, enhancing proactivity in dialogue systems.
  • Goal: To develop a proactive and strategic response to a situation or problem.
  • Example 1: « A customer complains about a late delivery. Let’s plan the steps to address this issue. »
  • Example 2: « To decide on a marketing strategy for a new product, let’s first analyze the target market and then determine the most effective approach. »
  5. Contrastive Chain of Thoughts:
  • Description: Uses contrasting explanations (correct and incorrect) to enhance understanding.
  • Goal: To clarify a concept by differentiating correct information from misconceptions.
  • Example 1: « Correct: Plants release oxygen during photosynthesis. Incorrect: Plants consume oxygen during photosynthesis. Now, explain photosynthesis. »
  • Example 2: « Correct: The Earth orbits the Sun. Incorrect: The Sun orbits the Earth. Describe the solar system’s structure. »
  6. Self-Reflection Prompting:
  • Description: Adds a verification layer to the generated response to detect errors or inconsistencies.
  • Goal: To ensure accuracy and completeness in the response.
  • Example 1: « After summarizing the article, review the summary for accuracy and completeness. »
  • Example 2: « Translate this text and then evaluate the translation for any possible errors or improvements. »
  7. Decomposed Prompting:
  • Description: Breaks down a complex prompt into sub-prompts, each addressing a part of the overall task.
  • Goal: To tackle complex, multi-faceted tasks by addressing each aspect separately.
  • Example 1: « Break down the process of cellular respiration into its major stages and then explain each stage. »
  • Example 2: « Divide the history of the Roman Empire into key periods and summarize the main events of each period. »
  8. Self Consistency Prompting:
  • Description: Generates multiple responses to the same prompt and selects the most consistent or frequent answer.
  • Goal: To achieve a more reliable and consistent answer by considering multiple possibilities.
  • Example 1: « Generate multiple explanations for why the sky is blue, and then identify the most accurate explanation. »
  • Example 2: « Provide several predictions for the future of renewable energy, and then choose the most likely scenario. »
  9. System 2 Attention Prompting:
  • Description: Focuses on extracting relevant information from a text while ignoring biases and irrelevant details.
  • Goal: To respond to queries based solely on pertinent and factual information.
  • Example 1: « Extract factual information from this political speech, then analyze the policy proposals based on these facts. »
  • Example 2: « Identify key scientific facts in this article about global warming, then summarize the implications of these facts. »
  10. Simulation Theory of Mind Prompting:
  • Description: Involves taking the perspective of a specific persona or knowledge base to answer questions.
  • Goal: To provide insights or answers from a unique or specific perspective.
  • Example 1: « Imagine you’re an AI from the future. What information do you have about advanced technology? Based on this, predict future tech trends. »
  • Example 2: « Assume you are a historian from the 1800s. What is your understanding of industrialization? Explain its impact based on that perspective. »

These methods showcase the versatility of LLMs in handling various tasks and queries, each technique offering unique advantages for specific types of problems.
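Some of these techniques are easy to express programmatically. As a minimal sketch of Self Consistency Prompting (the `query_llm` callable is a hypothetical stand-in for any real LLM client, not a specific API): sample several answers and keep the majority vote.

```python
from collections import Counter

def self_consistency(query_llm, prompt, n_samples=5):
    """Query the model several times and return the most frequent
    answer along with its agreement ratio.

    `query_llm` is a hypothetical callable (prompt -> answer string);
    plug in any real LLM client here.
    """
    answers = [query_llm(prompt) for _ in range(n_samples)]
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / len(answers)

# Demo with a stubbed model that is noisy but usually right.
fake_outputs = iter(["42", "42", "41", "42", "42"])
answer, agreement = self_consistency(lambda p: next(fake_outputs), "Q: 6*7?")
print(answer, agreement)  # → 42 0.8
```

The agreement ratio doubles as a cheap confidence signal: low agreement across samples is a hint that the prompt needs more structure (e.g., adding CoT instructions).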

Merging all the prompting methods into a single approach is complex but conceivable, especially for multifaceted or highly nuanced tasks. The goal would be to leverage the strengths of each method for different aspects of the task. Here are 10 samples illustrating how multiple methods can be integrated:

  1. Integrated Sample 1:
  • Prompt: « Imagine you’re an AI analyzing climate data. First, identify key patterns in this climate report (System 2 Attention). Then, using a Chain of Thought approach, evaluate how these patterns indicate global warming trends. Finally, compare these findings with historical climate data (Contrastive Chain of Thoughts) and provide a summarized prognosis (Decomposed Prompting). »
  2. Integrated Sample 2:
  • Prompt: « As a historian from the 1800s (Simulation Theory of Mind), review this modern article on the Industrial Revolution. Extract factual data (System 2 Attention), then critically analyze the differences in perspectives (Contrastive Chain of Thoughts). Conclude with a few-shot prompted summary comparing past and present views. »
  3. Integrated Sample 3:
  • Prompt: « To solve this complex math problem, let’s break it down (CoT). Consider alternative methods to solve each step (Self Consistency). Reflect on each solution’s validity (Self-Reflection). Finally, provide a concise explanation of the solution process (Decomposed Prompting). »
  4. Integrated Sample 4:
  • Prompt: « First, read these contrasting opinions on renewable energy (Contrastive Chain of Thoughts). Analyze their factual accuracy (System 2 Attention) and reasoning (CoT). Then, synthesize a proactive plan to increase renewable energy adoption, considering economic and environmental factors (ProCoT). »
  5. Integrated Sample 5:
  • Prompt: « Imagine you’re a 22nd-century environmentalist (Simulation Theory of Mind). Review these historical documents on deforestation (System 2 Attention), identify key changes over time (Contrastive Chain of Thoughts), and predict future trends (Decomposed Prompting). Summarize your findings with potential solutions (Few-Shot Prompting). »
  6. Integrated Sample 6:
  • Prompt: « As a medical AI, analyze these patient reports (System 2 Attention), identify symptoms (CoT), and diagnose (Decomposed Prompting). Compare your diagnosis with similar historical cases (Contrastive Chain of Thoughts) and suggest a treatment plan (ProCoT). »
  7. Integrated Sample 7:
  • Prompt: « Read this debate on AI ethics (System 2 Attention). Identify the main arguments (CoT), compare them with established ethical standards (Contrastive Chain of Thoughts), and propose a balanced ethical guideline for AI development (ProCoT). »
  8. Integrated Sample 8:
  • Prompt: « Review this economic report as a 20th-century economist (Simulation Theory of Mind), extract key economic indicators (System 2 Attention), compare with current economic data (Contrastive Chain of Thoughts), and predict future economic trends (Decomposed Prompting). »
  9. Integrated Sample 9:
  • Prompt: « Analyze this new technology from a future perspective (Simulation Theory of Mind). Break down its potential impacts (CoT), compare with past technological advancements (Contrastive Chain of Thoughts), and propose future applications (ProCoT). »
  10. Integrated Sample 10:
  • Prompt: « Examine this legal case file (System 2 Attention). Identify key legal precedents (CoT), compare with similar cases (Contrastive Chain of Thoughts), and predict the outcome (Self Consistency). Finally, draft a judgment summary (Decomposed Prompting). »

In each of these samples, multiple prompting methods are combined to tackle different parts of the task, creating a comprehensive approach that leverages the strengths of each technique. This integrated approach can be particularly useful for complex tasks requiring nuanced analysis, synthesis of different viewpoints, and strategic planning.
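Mechanically, integration can be as simple as concatenating per-technique instructions into one prompt. As a rough sketch (the snippet texts and template names below are illustrative, not a library API):

```python
# Illustrative instruction snippets, one per technique (my wording).
TECHNIQUE_SNIPPETS = {
    "system2": "First, extract only the factual, relevant information.",
    "cot": "Reason step by step before answering.",
    "contrastive": "Compare the findings against the provided counter-examples.",
    "decomposed": "Finish with a summary broken into labelled sub-answers.",
}

def build_integrated_prompt(task, techniques):
    """Assemble one prompt from the instruction snippet of each technique."""
    steps = [TECHNIQUE_SNIPPETS[t] for t in techniques]
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return f"{task}\n\nFollow these steps:\n{numbered}"

prompt = build_integrated_prompt(
    "Analyze this climate report.",
    ["system2", "cot", "contrastive", "decomposed"],
)
print(prompt)
```

Ordering matters in practice: extraction-style instructions (System 2 Attention) generally belong before reasoning-style ones (CoT), mirroring the integrated samples above.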

Mastering Time Series Forecasting: A Guide to Python’s Most Influential Libraries

The Python ecosystem offers a rich suite of libraries for time series forecasting. Each caters to different needs and comes with its own community and popularity, often reflected in GitHub stars. Here’s a rundown of the top libraries, their best use cases, and resources for learning more:

  1. Prophet (Facebook):
  2. pmdarima:
  3. Skforecast:
  4. Greykite (LinkedIn):
  5. Functime:
  6. Arch:

Nixtla’s Suite:

  • StatsForecast:
  • Best for: Rapid computations and high-performance univariate time series forecasting.
  • mlforecast:
  • Best for: Distributed computing environments needing feature engineering at scale.
  • NeuralForecast:
  • Best for: Leveraging neural networks for time series forecasting, suitable for non-experts.

Transformers for Time Series:

This curated guide aims to illuminate the path for those exploring the varied landscape of time series forecasting, providing a compass to the tools that resonate most with your project.
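Under the hood, every library above automates some version of "fit a pattern, extrapolate it". As a minimal, library-free illustration (not any package's API), here is a seasonal-naive baseline, often the first benchmark forecasting libraries compare against:

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast by repeating the last full season of observations."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

# Two weeks of daily sales with a weekly (season_length=7) pattern.
sales = [10, 12, 14, 13, 20, 30, 25] * 2
forecast = seasonal_naive_forecast(sales, season_length=7, horizon=7)
print(forecast)  # → [10, 12, 14, 13, 20, 30, 25]
```

If a library's model cannot beat this baseline on your data, the extra complexity is probably not paying for itself.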

NoGAN: Ultrafast Data Synthesizer

  • Introduces NoGAN, a faster alternative to traditional GAN-based tabular data synthesis.
  • Runs 1000x quicker than GAN, delivering superior results with a new, sophisticated evaluation metric.
  • A significant cost reducer, minimizing cloud/GPU time and training time.
  • Replaces manual fine-tuning parameters with auto-tuning.
  • Now available as open-source software.
  • Real-life case studies: synthesis in under 5 seconds (versus 10 minutes with a GAN).
  • Produces higher quality results, confirmed via cross-validation.
  • Fast implementation enables automatic, efficient hyperparameter fine-tuning.
  • Future improvements discussed: speed enhancement, data faithfulness, auto-tuning, Gaussian NoGAN, and broader applications.


Visually Understanding UMAP

This article explores dimensionality reduction, a valuable tool for machine learning practitioners analyzing vast, high-dimensional datasets. While t-SNE is a commonly used visualization technique, its efficacy diminishes with large datasets, and mastering its application can be challenging.

UMAP, introduced by McInnes et al., presents several advantages over t-SNE, including enhanced speed and better preservation of a dataset’s global structure. This article delves into the theory behind UMAP, providing insights into its functionality, effective usage, and a performance comparison with t-SNE.


Challenges of NLP in Dealing with Structured Documents: The Case of PDFs


  • NLP’s expanding real-world applications face a hurdle.
  • Most NLP tasks assume clean, raw text data.
  • In practice, many documents, especially legal ones, are visually structured, like PDFs.
  • Visual Structured Documents (VSDs) pose challenges for content extraction.
  • The discussion primarily focuses on text-only layered PDFs.
  • Although often treated as a solved problem, these PDFs still present NLP challenges.


RAG: Multi-Document Agents

  • Multi-Document Agents guide explains how to set up an agent that can answer different types of questions over a larger set of documents.
  • The questions include QA over a specific doc, QA comparing different docs, summaries over a specific doc, and comparing summaries between different docs.
  • The architecture sets up a « document agent » over each document, which can do QA/summarization within its document, plus a top-level agent over this set of document agents, which performs tool retrieval and then CoT over the retrieved tools to answer a question.
  • The guide provides code examples using the LlamaIndex and OpenAI libraries.
  • The document agent can dynamically choose to perform semantic search or summarization within a given document.
  • In the guide’s example dataset, a separate document agent is created for each city.
  • The top-level agent can orchestrate across the different document agents to answer any user query.
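The two-level architecture can be sketched in a few lines of plain Python. This is a toy approximation, not LlamaIndex code: the QA/summarization bodies are stubs (in the guide they are backed by per-document query engines), and the keyword-matching "tool retrieval" stands in for embedding-based retrieval.

```python
class DocumentAgent:
    """Per-document agent with two tools: QA and summarization (stubbed)."""
    def __init__(self, name, text):
        self.name, self.text = name, text

    def qa(self, question):
        return f"[{self.name}] answer to: {question}"

    def summarize(self):
        return f"[{self.name}] summary: {self.text}"

class TopLevelAgent:
    """Routes a query to the relevant document agents."""
    def __init__(self, agents):
        self.agents = {a.name: a for a in agents}

    def answer(self, query):
        # Naive tool retrieval: match agent names mentioned in the query.
        hits = [a for name, a in self.agents.items()
                if name.lower() in query.lower()]
        # Naive tool choice within each hit: summarize vs. QA.
        if "summar" in query.lower():
            return [a.summarize() for a in hits]
        return [a.qa(query) for a in hits]

agents = [DocumentAgent("Paris", "Paris is the capital of France."),
          DocumentAgent("Berlin", "Berlin is the capital of Germany.")]
top = TopLevelAgent(agents)
print(top.answer("Compare the summaries of Paris and Berlin"))
```

The key design point survives the simplification: the top-level agent never reads documents itself; it only decides which document agents (and which of their tools) to invoke.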



QA-LoRA: Fine-Tune a Quantized Large Language Model on Your GPU

State-of-the-art large language models (LLMs) are pre-trained with billions of parameters. While pre-trained LLMs can perform many tasks, they can become much better once fine-tuned.

Thanks to LoRA, fine-tuning costs can be dramatically reduced. LoRA adds low-rank tensors, i.e., a small number of parameters (millions), on top of the frozen original parameters. Only the parameters in the added tensors are trained during fine-tuning.

LoRA still requires the full model to be loaded in memory. To reduce the memory cost and speed up fine-tuning, a new approach proposes quantization-aware LoRA (QA-LoRA) fine-tuning.

In this article, I explain QA-LoRA and review its performance compared with previous work (especially QLoRA). I also show how to use QA-LoRA to fine-tune your own quantization-aware LoRA for Llama 2.
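The core LoRA idea can be shown with a dependency-free toy (plain Python lists rather than real tensors; the dimensions are mine, not the paper's): the frozen weight matrix W is augmented with a trainable low-rank product A·B, so only r·(d_in + d_out) parameters are updated instead of d_in·d_out.

```python
def matmul(A, B):
    """Plain-Python matrix multiply (toy stand-in for a tensor op)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_effective_weight(W, A, B, scale=1.0):
    """Effective weight W + scale * (A @ B); only A and B are trained."""
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Frozen 3x3 weight (9 params) plus a rank-1 adapter: A (3x1) and B (1x3)
# contribute only 6 trainable params.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
A = [[1.0], [0.0], [2.0]]
B = [[0.5, 0.0, 0.5]]
W_eff = lora_effective_weight(W, A, B)
print(W_eff[0])  # → [1.5, 0.0, 0.5]
```

QA-LoRA's extra twist, per the article, is keeping the base weights quantized during fine-tuning, so the memory saving applies while training, not only at inference time.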


Meta-CoT Prompting

Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models

Meta-CoT is a generalizable CoT prompting method for mixed-task scenarios where the type of input question is unknown. It consists of three phases: (i) scenario identification, which categorizes the scenario of the input question; (ii) demonstration selection, which fetches the in-context learning (ICL) demonstrations for that scenario; and (iii) answer derivation, which performs answer inference by feeding the LLM a prompt comprising the fetched ICL demonstrations and the input question.
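A toy sketch of those three phases (the keyword-based scenario classifier and the two-entry demonstration bank are my simplifications, not the paper's method):

```python
# Hypothetical demonstration bank keyed by scenario; in Meta-CoT these
# are ICL demonstrations per question category.
DEMOS = {
    "arithmetic": "Q: 2+3? Let's think step by step... A: 5",
    "commonsense": "Q: Can a pencil float? Let's think step by step... A: Yes",
}

def identify_scenario(question):
    """Phase (i): crude scenario identification via a digit heuristic."""
    return "arithmetic" if any(c.isdigit() for c in question) else "commonsense"

def meta_cot_prompt(question):
    """Phases (ii) and (iii): select demos, then assemble the final prompt."""
    scenario = identify_scenario(question)
    return f"{DEMOS[scenario]}\n\nQ: {question} Let's think step by step..."

prompt = meta_cot_prompt("What is 12 * 4?")
print(prompt.splitlines()[0])  # → Q: 2+3? Let's think step by step... A: 5
```

The point of the structure is that the prompt-building pipeline, not the user, decides which demonstrations the LLM sees, which is what lets one system handle mixed task types.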