Renumics Spotlight

Published on 17 February 2024 by loic

Spotlight helps you to understand unstructured datasets fast. You can create interactive visualizations from your dataframe with just a few lines of code. You can also leverage data enrichments (e.g. embeddings, prediction, uncertainties) to identify critical clusters in your data.

https://spotlight.renumics.com/

Revolutionizing AI Reading Comprehension: ReadAgent’s Breakthrough in Handling Documents with 20 Million Tokens

Published on 17 February 2024 by loic

Introduction to ReadAgent by Google DeepMind
Development of ReadAgent, an AI capable of understanding long texts beyond the limits of its language model.
Utilizes a human-like reading strategy to comprehend complex documents.
Challenges Faced by Language Models
Context length limitation: Fixed token processing capacity leading to performance decline.
Ineffective context usage: Decreased comprehension with increasing text length.
Features of ReadAgent
Mimics human reading by forming and using "gist memories" of texts.
Breaks down texts into smaller "episodes" and generates gist memories for each.
Looks up relevant episodes when needed for answering questions.
Performance Enhancements
Capable of understanding documents "20 times longer" than its base language model.
Shows improved performance on long document question answering datasets:
- QuALITY: Accuracy improved from 85.8% to 86.9%.
- NarrativeQA: Rating increased by 13-32% over baselines.
- QMSum: Rating improved from 44.96% to 49.58%.
Potential Applications
Legal contract review, scientific literature analysis, customer support, financial report summarization, automated online course creation.
Indicates the future potential of AI in mastering lengthy real-world documents through human-like reading strategies.
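
The episode, gist, and lookup steps listed above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: in ReadAgent each step is carried out by prompting a language model, whereas here `first_sentence` (a hypothetical gisting function) just keeps the first sentence of an episode and `look_up` ranks gists by word overlap with the question. All function names are made up for this sketch.

```python
import re
from typing import Callable, List, Set


def paginate(text: str, max_words: int) -> List[str]:
    """Split a text into consecutive episodes of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def first_sentence(episode: str) -> str:
    """Toy stand-in for an LLM gisting prompt: keep only the first sentence."""
    return episode.split(".")[0].strip() + "."


def build_gists(episodes: List[str], gist_fn: Callable[[str], str]) -> List[str]:
    """Compress every episode into a short gist memory."""
    return [gist_fn(episode) for episode in episodes]


def tokens(text: str) -> Set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def look_up(question: str, gists: List[str], episodes: List[str], top_k: int = 1) -> List[str]:
    """Return the full episodes whose gists overlap most with the question."""
    q = tokens(question)
    ranked = sorted(range(len(gists)), key=lambda i: len(q & tokens(gists[i])), reverse=True)
    # Keep the selected episodes in their original reading order.
    return [episodes[i] for i in sorted(ranked[:top_k])]


document = (
    "Alice planted a garden. She grew tomatoes and basil there. "
    "Bob repaired an old boat. He sailed across the lake."
)
episodes = paginate(document, max_words=10)
gists = build_gists(episodes, first_sentence)
relevant = look_up("Who repaired the boat?", gists, episodes)
print(gists)
print(relevant)
```

The point of the design is that only the short gists need to stay in context; the full text of an episode is fetched again only when a question calls for it.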

https://read-agent.github.io/

LoRAX

Published on 16 February 2024 by loic

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

LoRAX (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned models on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.
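
To make the idea concrete, here is a minimal sketch of why many fine-tuned variants can share one set of base weights: each request names an adapter, and only that adapter's small low-rank matrices are added on top of the shared layer. This is not the LoRAX API. The class `MultiAdapterLayer`, its methods, and the adapter names are hypothetical, and a real server works on GPU tensors and batches requests.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class MultiAdapterLayer:
    """One shared base weight plus many small, named low-rank adapters."""

    def __init__(self, base: Matrix) -> None:
        self.base = base
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def register(self, adapter_id: str, a: Matrix, b: Matrix) -> None:
        """Store the low-rank pair (A, B) for one fine-tuned variant."""
        self.adapters[adapter_id] = (a, b)

    def forward(self, x: Vector, adapter_id: Optional[str] = None) -> Vector:
        """Compute W x, plus B (A x) when the request names an adapter."""
        y = matvec(self.base, x)
        if adapter_id is None:
            return y
        a, b = self.adapters[adapter_id]
        delta = matvec(b, matvec(a, x))
        return [yi + di for yi, di in zip(y, delta)]


layer = MultiAdapterLayer(base=[[1.0, 0.0], [0.0, 1.0]])
layer.register("support-bot", a=[[1.0, 0.0]], b=[[0.0], [2.0]])
layer.register("legal-bot", a=[[0.0, 1.0]], b=[[1.0], [0.0]])

x = [3.0, 4.0]
print(layer.forward(x))                 # base model only
print(layer.forward(x, "support-bot"))  # base + first adapter
print(layer.forward(x, "legal-bot"))    # base + second adapter
```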

https://github.com/predibase/lorax

LLM Visualization

Published on 4 December 2023 by loic

A visualization and walkthrough of the LLM algorithm that backs OpenAI’s ChatGPT. Explore the algorithm down to every add & multiply, seeing the whole process in action.

Visualisation of LLM algorithm

Complete guide to Llama

Published on 28 October 2023 by loic

If you want to get started with Llama, this is the definitive place. Just some of the areas covered:

Fine Tuning
Quantization
Prompting
Inferencing
Validation
Integration Guides
Code Llama
Integration with LangChain
Integration with LlamaIndex

https://ai.meta.com/llama/get-started/

QA-LoRA: Fine-Tune a Quantized Large Language Model on Your GPU

Published on 14 October 2023 by loic

State-of-the-art large language models (LLMs) are pre-trained with billions of parameters. While pre-trained LLMs can perform many tasks, they can become much better once fine-tuned.

Thanks to LoRA, fine-tuning costs can be dramatically reduced. LoRA adds low-rank tensors, i.e., a small number of parameters (millions), on top of the frozen original parameters. Only the parameters in the added tensors are trained during fine-tuning.
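
A quick back-of-the-envelope calculation shows where the savings come from. For a frozen weight matrix W of shape d x k, LoRA trains two small matrices B (d x r) and A (r x k) and uses W + BA, so the trainable parameter count drops from d*k to r*(d + k). The layer size and rank below are illustrative assumptions, not figures from the article.

```python
def full_params(d: int, k: int) -> int:
    """Parameters in a dense d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters in the low-rank pair B (d x r) and A (r x k)."""
    return r * (d + k)


# Hypothetical layer size and rank, chosen only for illustration.
d, k, r = 4096, 4096, 8
dense = full_params(d, k)
low_rank = lora_params(d, k, r)
print(dense, low_rank, dense // low_rank)
```

With these assumed sizes the adapter holds 65,536 trainable parameters against 16,777,216 in the frozen matrix, a 256-fold reduction for that layer.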

LoRA still requires the model to be loaded in memory. To reduce the memory cost and speed up fine-tuning, a new approach proposes quantization-aware LoRA (QA-LoRA) fine-tuning.
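
For readers new to quantization, the sketch below shows the general idea of storing weights as low-bit integer codes plus a scale and an offset. It is a generic min-max uniform quantizer written for illustration, assuming 4-bit codes; it is not the specific scheme used by QA-LoRA or QLoRA, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize(values: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1] with a min-max scale."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.4, 0.0, 0.3, 1.0]
codes, scale, lo = quantize(weights, bits=4)
restored = dequantize(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(max_error <= scale / 2 + 1e-9)
```

Each weight now costs 4 bits instead of 16 or 32, at the price of a rounding error of at most half a quantization step.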

In this article, I explain QA-LoRA and review its performance compared with previous work (especially QLoRA). I also show how to use QA-LoRA to fine-tune your own quantization-aware LoRA for Llama 2.

https://towardsdatascience.com/qa-lora-fine-tune-a-quantized-large-language-model-on-your-gpu-c7291866706c

Meta-CoT prompting

Published on 14 October 2023 by loic

Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models

Meta-CoT is a generalizable CoT prompting method for mixed-task scenarios where the type of the input question is unknown. It consists of three phases: (i) scenario identification: categorizes the scenario of the input question; (ii) demonstration selection: fetches the ICL demonstrations for the categorized scenario; (iii) answer derivation: performs the answer inference by feeding the LLM with the prompt comprising the fetched ICL demonstrations and the input question.
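
The three phases can be pictured as a small pipeline. The sketch below is a toy, not the authors' code (see the repository linked below for that): the scenario names, the demonstration pool `DEMOS`, and the digit-based rule in `identify_scenario` are all assumptions standing in for the LLM-based categorization, and phase (iii) stops at building the prompt that would then be sent to the model.

```python
from typing import Dict, List

# Hypothetical demonstration pool, one list of worked examples per scenario.
DEMOS: Dict[str, List[str]] = {
    "arithmetic": [
        "Q: What is 2 + 3?\nA: Let's think step by step. 2 + 3 = 5. The answer is 5."
    ],
    "commonsense": [
        "Q: Can a fish climb a tree?\nA: Let's think step by step. "
        "Fish have no limbs for climbing. The answer is no."
    ],
}


def identify_scenario(question: str) -> str:
    """Phase (i): toy rule standing in for the LLM-based categorization."""
    return "arithmetic" if any(ch.isdigit() for ch in question) else "commonsense"


def select_demonstrations(scenario: str) -> List[str]:
    """Phase (ii): fetch the in-context demonstrations for that scenario."""
    return DEMOS[scenario]


def build_prompt(question: str) -> str:
    """Phase (iii), first half: assemble the prompt to send to the LLM."""
    demos = select_demonstrations(identify_scenario(question))
    return "\n\n".join(demos + [f"Q: {question}\nA: Let's think step by step."])


prompt = build_prompt("What is 7 * 6?")
print(prompt)
```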

https://arxiv.org/abs/2310.06692

https://github.com/Anni-Zou/Meta-CoT

Mitigating LLM Hallucinations: a multifaceted approach

Published on 2 October 2023 by loic

https://amatriain.net/blog/hallucinations#advancedprompting

Ever curious about the challenges of embedding large language models in products? A notable issue is ‘hallucinations’ where AI outputs misleading data. This blog offers a guide on tackling these issues in user-facing products, giving a snapshot of current best practices.

Graph-Based Prompting and Reasoning with Language Models

Published on 2 September 2023 by loic

Advanced prompting techniques (e.g., chain of thought and tree of thought) improve the problem-solving capabilities of large language models (LLMs).
These techniques require LLMs to construct step-by-step responses.
They assume linear reasoning, which differs from human reasoning, where several chains of thought run in parallel and their insights are combined.
This overview focuses on prompting techniques using a graph structure to capture non-linear problem-solving patterns.
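
As a rough picture of what "graph structure" means here, the sketch below stores thoughts as nodes of a directed acyclic graph in which one thought can depend on several earlier ones, something a single linear chain cannot express. The node names are invented for illustration, and no particular graph-prompting method from the overview is being reproduced; the ordering uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each thought maps to the set of earlier thoughts it builds on.
thought_graph: Dict[str, Set[str]] = {
    "problem": set(),
    "idea_a": {"problem"},
    "idea_b": {"problem"},
    "combined": {"idea_a", "idea_b"},  # merges two separate lines of reasoning
    "answer": {"combined"},
}


def reasoning_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every thought comes after the thoughts it depends on."""
    return list(TopologicalSorter(graph).static_order())


order = reasoning_order(thought_graph)
print(order)
```

In a chain of thought every node would have exactly one predecessor; the `combined` node, with two, is the non-linear step.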

Graph Prompts

The Novice’s LLM Training Guide

Published on 1 September 2023 by loic

https://rentry.org/llm-training

A modern Large Language Model (LLM) is trained using the Transformers library, which leverages the power of the Transformer network architecture. This architecture has revolutionized the field of natural language processing and is widely adopted for training LLMs. Python, a high-level programming language, is commonly used for implementing LLMs, making them more accessible and easier to comprehend compared to lower-level frameworks such as OpenXLA’s IREE or GGML. The intuitive nature of Python allows researchers and developers to focus on the logic and algorithms of the model without getting caught up in intricate implementation details.

This rentry won’t go over pre-training LLMs (training from scratch), but rather fine-tuning and low-rank adaptation (LoRA) methods. Pre-training is prohibitively expensive, and if you have the compute for it, you’re likely smart enough not to need this rentry at all.

Deeplearning.fr

You have to learn the rules of the game. And then you have to play better than anyone else

Category archives: LLM