- Fastest Inference: mlc stands out as the fastest of the bunch, which raises the question of whether its output quality holds up at that speed.
- Favorite Tool: CTranslate2 is the preferred choice thanks to its speed and user-friendliness, backed by excellent documentation (see the sketch after this list). Unlike vLLM, however, it does not support distributed inference.
- vLLM Performance: vLLM is also fast, though CTranslate2 outperforms it in raw speed. vLLM does support distributed inference, making it the better fit for larger models.
- Text Generation Inference (TGI): an acceptable choice for deploying HuggingFace LLMs in the traditional way, but not as fast as vLLM. It offers features such as telemetry and tight HF ecosystem integration. Note that TGI’s licensing became more restrictive as of 7/28/2023, potentially limiting certain commercial uses.
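To make the CTranslate2 recommendation concrete, here is a minimal text-generation sketch. It assumes a model that has already been converted with the `ct2-transformers-converter` CLI into a local `ct2_model` directory, with `gpt2` standing in as a placeholder tokenizer:

```python
import ctranslate2
import transformers

# Load the converted model and a matching tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("gpt2")
generator = ctranslate2.Generator("ct2_model", device="cpu")

# CTranslate2 operates on token strings, not raw text.
prompt = "The fastest way to serve an LLM is"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([tokens], max_length=64, sampling_topk=10)
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(results[0].sequences[0])))
```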
The History of Open-Source LLMs: Better Base Models (Part Two)
https://cameronrwolfe.substack.com/p/the-history-of-open-source-llms-better
- Value of Open-source LLM Research: Aims to democratize influential technology; despite initial struggles and criticism, open-source LLMs gained popularity and significance.
- Early Challenges: Initial open-source LLMs performed poorly and faced criticism, posing difficulties for advancement.
- Transformative Research Line: Focuses on enhancing open-source LLMs, leading to high-performing pre-trained models accessible to all.
- Significance of High-Performing Models: Creation of powerful, cost-effective pre-trained LLMs revolutionized research accessibility.
- Series Overview: Part two of a three-part series on open-source LLM history. The first part explored initial open-source LLM attempts.
- Study Focus: This overview delves into the most popular open-source base models, emphasizing pre-trained models not yet fine-tuned or aligned.
- Future Exploration: Subsequent installment will discuss fine-tuning and alignment of models for diverse practical applications.
Advanced Prompt Engineering
https://cameronrwolfe.substack.com/p/advanced-prompt-engineering
The emergence of large language models (LLMs) has changed how we approach problem-solving. In the past, tasks like document reformatting or sentence classification required writing task-specific programs. LLMs have transformed this process: the same tasks can now be accomplished through textual prompts, e.g., reformatting a document by simply instructing an LLM to do so. GPT-3 exemplified this shift by achieving accurate results with minimal guidance.
As LLM research progressed, more sophisticated techniques emerged beyond basic methods like zero/few-shot prompting. Instruction-following LLMs (e.g., InstructGPT, ChatGPT) prompted investigations into tackling harder problems, extending LLMs beyond simple tasks to comprehending intricate instructions and executing multi-step reasoning. Such tasks demand more advanced prompting strategies.
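One of the best-known strategies of this kind is chain-of-thought prompting: including a worked example whose answer spells out its intermediate reasoning before the model tackles the real query. The toy prompt below is our own illustration, not taken from the article:

```python
# Few-shot chain-of-thought prompt (illustrative): the solved example shows
# the model how to reason step by step before answering the final question.
prompt = """Q: A cafeteria had 23 apples. It used 20 to make lunch and bought 6 more. How many apples does it have?
A: It started with 23 apples, used 20, leaving 23 - 20 = 3. Buying 6 more gives 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 cans, each containing 3 balls. How many tennis balls does he have now?
A:"""
```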
Practical Prompt Engineering
https://cameronrwolfe.substack.com/p/practical-prompt-engineering-part
- Prompt engineering: An empirical science focused on optimizing LLM (Large Language Model) performance through various prompting strategies.
- Aims to understand prompting mechanics and employs techniques to enhance LLM capabilities.
- Zero/few-shot learning: A fundamental technique where LLMs perform tasks with minimal or no training examples, showcasing their remarkable adaptability.
- Instruction prompting: Another vital technique involving explicit instructions in prompts to guide LLM behavior.
- This overview intends to impart practical insights, actionable tricks, and takeaways that help prompt engineers and LLM practitioners work more effectively; the two techniques above are sketched right after this list.
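To make the two techniques concrete, here are hypothetical zero-shot, few-shot, and instruction prompts for a toy sentiment-classification task (the task and wording are our own, not from the article):

```python
# Zero-shot: only a task description and the input, no solved examples.
zero_shot = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: 'The battery died after two days.'\nSentiment:"
)

# Few-shot: a handful of solved examples precede the real input.
few_shot = (
    "Review: 'Great screen, fast shipping.' Sentiment: positive\n"
    "Review: 'Broke within a week.' Sentiment: negative\n"
    "Review: 'The battery died after two days.' Sentiment:"
)

# Instruction prompting: explicit instructions constrain the output format.
instruction = (
    "You are a sentiment classifier. Respond with exactly one word, "
    "'positive' or 'negative'.\n"
    "Review: 'The battery died after two days.'"
)
```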
The History of Open-Source LLMs: Early Days (Part One)
https://cameronrwolfe.substack.com/p/the-history-of-open-source-llms-early
- Language modeling research traces back to models like GPT, GPT-2, and pre-transformer methods such as ULMFit.
- GPT-3’s proposal marked the initial rise in popularity by showcasing impressive few-shot learning through self-supervised pre-training and in-context learning.
- The recognition of GPT-3 led to the creation of various large language models (LLMs), including InstructGPT and ChatGPT, sparking widespread interest in generative AI.
- Early LLMs often remained closed source, limiting researchers’ understanding and improvement of their workings.
- Open-source variants of popular language models began to emerge gradually, although they initially lagged behind proprietary models in performance.
- These early open-source models laid the groundwork for increased transparency in LLM research and inspired the development of more capable subsequent models like Falcon and LLaMA-2.
- The overview is part of a three-part series that delves into the history of open-source language models, exploring their beginnings, recent developments, and the application of imitation and alignment techniques to enhance their performance.
Large Transformer Model Inference Optimization
https://lilianweng.github.io/posts/2023-01-10-inference-optimization/#quantization
Large transformer models are mainstream nowadays, creating SoTA results for a variety of tasks. They are powerful but very expensive to train and use. The extremely high inference cost, in both time and memory, is a big bottleneck for adopting a powerful transformer for solving real-world tasks at scale.
Why is it hard to run inference for large transformer models? Besides the increasing size of SoTA models, there are two main factors contributing to the inference challenge (Pope et al. 2022):
- Large memory footprint. Both model parameters and intermediate states need to be kept in memory at inference time. For example:
- The KV cache must be stored in memory during decoding; e.g., for a batch size of 512 and a context length of 2048, the KV cache totals 3TB, that is 3x the model size (!). See the back-of-the-envelope sketch after this list.
- Inference cost from the attention mechanism scales quadratically with input sequence length.
- Low parallelizability. Generation is executed in an autoregressive fashion, making the decoding process hard to parallelize.
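To see where a number like 3TB comes from, here is a back-of-the-envelope KV cache calculation; the layer count and hidden size below are hypothetical, GPT-3-scale values chosen only to illustrate the arithmetic:

```python
def kv_cache_bytes(batch, seq_len, n_layers, d_model, bytes_per_elem=2):
    # The leading factor of 2 accounts for caching both keys and values
    # at every layer; bytes_per_elem=2 assumes fp16 activations.
    return 2 * batch * seq_len * n_layers * d_model * bytes_per_elem

# Batch size 512, context length 2048, 96 layers, hidden size 12288:
print(kv_cache_bytes(512, 2048, 96, 12288) / 1e12, "TB")  # ~4.9 TB
```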
In this post, we will look into several approaches for making transformer inference more efficient. Some are general network compression methods, while others are specific to transformer architecture.
Universal and Transferable Adversarial Attacks on Aligned Language Models
This research examines the safety of large language models (LLMs) such as ChatGPT, Bard, and Claude. It demonstrates the potential for automated creation of adversarial attacks, using character sequences added to user queries that manipulate the LLM into following harmful commands. Unlike traditional "jailbreaks," these attacks are automated and can affect both open-source and closed-source chatbots. The study raises concerns about the effectiveness of mitigation measures and suggests that the challenges posed by adversarial behavior might persist due to the nature of deep learning models. The findings highlight the need for careful consideration of the safety implications as LLMs become more integrated into various applications.
Time Series Made Easy in Python: DARTS
Darts is a Python library for user-friendly forecasting and anomaly detection on time series. It contains a variety of models, from classics such as ARIMA to deep neural networks.
Some of the key features of Darts include:
- A simple and intuitive interface for defining and fitting models (see the sketch after the model table below)
- Support for different types of time series data, including univariate, multivariate, and panel data
- A wide range of built-in models, including ARIMA, Exponential Smoothing, Prophet, LSTM, and TCN
- Tools for hyperparameter tuning and model selection, such as cross-validation and grid search
- Visualization tools for exploring and analyzing time series data and model outputs
| Model | Univariate | Multivariate | Probabilistic | Multiple series (global) | Past-observed covariates | Future-known covariates | Static covariates | Reference |
|---|---|---|---|---|---|---|---|---|
| ARIMA | ✅ | | ✅ | | | ✅ | | |
| VARIMA | ✅ | ✅ | | | | ✅ | | |
| AutoARIMA | ✅ | | | | | ✅ | | |
| StatsForecastAutoARIMA (faster AutoARIMA) | ✅ | | ✅ | | | ✅ | | Nixtla’s statsforecast |
| ExponentialSmoothing | ✅ | | ✅ | | | | | |
| StatsForecastETS | ✅ | | | | | ✅ | | Nixtla’s statsforecast |
| BATS and TBATS | ✅ | | ✅ | | | | | TBATS paper |
| Theta and FourTheta | ✅ | | | | | | | Theta & 4 Theta |
| Prophet (see install notes) | ✅ | | ✅ | | | ✅ | | Prophet repo |
| FFT (Fast Fourier Transform) | ✅ | | | | | | | |
| KalmanForecaster using the Kalman filter and N4SID for system identification | ✅ | ✅ | ✅ | | | ✅ | | N4SID paper |
| Croston method | ✅ | | | | | | | |
| RegressionModel; generic wrapper around any sklearn regression model | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | |
| RandomForest | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | |
| LinearRegressionModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| LightGBMModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| CatBoostModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| XGBModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| RNNModel (incl. LSTM and GRU); equivalent to DeepAR in its probabilistic version | ✅ | ✅ | ✅ | ✅ | | ✅ | | DeepAR paper |
| BlockRNNModel (incl. LSTM and GRU) | ✅ | ✅ | ✅ | ✅ | ✅ | | | |
| NBEATSModel | ✅ | ✅ | ✅ | ✅ | ✅ | | | N-BEATS paper |
| NHiTSModel | ✅ | ✅ | ✅ | ✅ | ✅ | | | N-HiTS paper |
| TCNModel | ✅ | ✅ | ✅ | ✅ | ✅ | | | TCN paper, DeepTCN paper, blog post |
| TransformerModel | ✅ | ✅ | ✅ | ✅ | ✅ | | | |
| TFTModel (Temporal Fusion Transformer) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | TFT paper, PyTorch Forecasting |
| DLinearModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | DLinear paper |
| NLinearModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | NLinear paper |
| Naive Baselines | ✅ | ✅ | | | | | | |
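To give a feel for the interface, here is a minimal forecasting sketch using the AirPassengers dataset bundled with Darts (a standard quickstart pattern; the model choice is illustrative):

```python
from darts.datasets import AirPassengersDataset
from darts.models import ExponentialSmoothing

# Load a built-in univariate monthly series and hold out the last 36 months.
series = AirPassengersDataset().load()
train, val = series[:-36], series[-36:]

# Fit a classic model and forecast over the validation horizon.
model = ExponentialSmoothing()
model.fit(train)
forecast = model.predict(n=len(val))
print(forecast)
```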
Category Encoders
A set of scikit-learn-style transformers for encoding categorical variables into numeric with different techniques.

Category Encoders is a Python library for encoding categorical variables for machine learning tasks. It is available on contrib.scikit-learn.org and extends the capabilities of scikit-learn’s preprocessing module.
The library provides several powerful encoding techniques for dealing with categorical data, including:
- Ordinal encoding: maps categorical variables to integer values based on their order of appearance
- One-hot encoding: creates a binary feature for each category in a variable
- Binary encoding: maps each category to a binary code
- Target encoding: encodes each category with the mean target value for that category
- Hashing encoding: maps each category into a fixed-size feature space using a hash function
Category Encoders also supports a range of advanced features, such as handling missing values, combining multiple encoders, and applying encoders to specific subsets of features.
Overall, Category Encoders is a useful tool for preprocessing categorical data and improving the accuracy and performance of machine learning models.
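As a minimal sketch of the scikit-learn-style API (the toy data and the choice of target encoding are illustrative):

```python
import pandas as pd
import category_encoders as ce

df = pd.DataFrame({"color": ["red", "blue", "red", "green"],
                   "target": [1, 0, 1, 0]})

# TargetEncoder replaces each category with a smoothed mean of the target;
# like all encoders in the library, it follows the fit/transform convention.
encoder = ce.TargetEncoder(cols=["color"])
X_encoded = encoder.fit_transform(df[["color"]], df["target"])
print(X_encoded)
```

The full set of encoders provided includes: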
- Backward Difference Coding
- BaseN
- Binary
- CatBoost Encoder
- Count Encoder
- Generalized Linear Mixed Model Encoder
- Gray
- Hashing
- Helmert Coding
- James-Stein Encoder
- Leave One Out
- M-estimate
- One Hot
- Ordinal
- Polynomial Coding
- Quantile Encoder
- Sum Coding
- Summary Encoder
- Target Encoder
- Weight of Evidence
- Wrappers
Cleaning labels: Cleanlab
cleanlab automatically detects problems in an ML dataset. This data-centric AI package facilitates machine learning with messy, real-world data by providing clean labels for robust training and flagging errors in your data.

Paper: https://arxiv.org/pdf/1911.00068.pdf
Code: https://github.com/cleanlab/cleanlab
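A minimal sketch of the label-issue detection workflow, assuming cleanlab 2.x and out-of-sample predicted probabilities from any classifier (the toy data below is our own):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab.filter import find_label_issues

# Toy data with one deliberately flipped label to show the mechanics.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
labels = (X[:, 0] > 0).astype(int)
labels[0] = 1 - labels[0]  # inject a label error at index 0

# cleanlab expects out-of-sample predicted probabilities.
pred_probs = cross_val_predict(LogisticRegression(), X, labels,
                               cv=5, method="predict_proba")
issues = find_label_issues(labels, pred_probs,
                           return_indices_ranked_by="self_confidence")
print(issues)  # indices of likely mislabeled examples, most suspicious first
```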