Revolutionizing AI Efficiency: How Microsoft’s LLMLingua-2 is Changing the Game with 8x Less Memory

  • LLMLingua-2 is a novel compression technology developed by Microsoft Research, achieving state-of-the-art results with 8 times less GPU memory on tasks typically handled by models like GPT-4.
  • It introduces innovative approaches such as "Data Distillation," "Bidirectional Token Classification," and optimized compression objectives to efficiently compress prompts without losing key information.
  • The technology has shown superior performance across various language tasks and demonstrated remarkable generalization across different LLMs and languages, from GPT-3.5 to Mistral-7B and from English to Chinese.
  • Compared to existing prompt compression methods, LLMLingua-2 is 3 to 6 times faster, accelerates end-to-end inference by 1.6 to 2.9 times, and significantly reduces GPU memory usage by a factor of 8.
  • This advancement represents a significant step forward in making language AI more practical and scalable for real-world applications, demonstrating Microsoft Research’s leadership in the field.

https://arxiv.org/pdf/2403.12968.pdf

Sample model:

https://huggingface.co/microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank
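
A minimal usage sketch with the llmlingua package (pip install llmlingua), following its README; exact keyword arguments may vary across versions:

```python
# Hedged sketch based on the LLMLingua README; argument names may differ by version.
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True,  # select the LLMLingua-2 token-classification compressor
)

long_prompt = "You are a helpful assistant. " * 200  # stand-in for a long context
result = compressor.compress_prompt(long_prompt, rate=0.33)  # keep roughly 1/3 of the tokens
print(result["compressed_prompt"])
```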

Chronos: Learning the Language of Time Series

  • Chronos is a framework designed for pretrained probabilistic time series models.
  • It utilizes scaling and quantization to tokenize time series values into a fixed vocabulary (sketched after this list).
  • Chronos trains transformer-based language model architectures (specifically, models from the T5 family with parameters ranging from 20M to 710M) using cross-entropy loss.
  • The models are pretrained on a mix of publicly available datasets and a synthetic dataset generated via Gaussian processes, enhancing generalization.
  • In a comprehensive benchmark involving 42 datasets, including both classical local models and deep learning approaches, Chronos models:
  • (a) significantly outperform other methods on datasets included in the training corpus;
  • (b) show comparable or occasionally superior zero-shot performance on new datasets compared to methods trained specifically on those datasets.
  • These results demonstrate the potential of pretrained models to leverage time series data across various domains for improving zero-shot accuracy on unseen forecasting tasks, suggesting a simplified approach to forecasting pipelines.
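
To make the tokenization step concrete, here is a minimal sketch of mean scaling plus uniform quantization; the vocabulary size and clipping range are illustrative assumptions, not the paper's exact settings:

```python
# Chronos-style tokenization sketch: scale by the mean absolute value, then bin.
import numpy as np

def tokenize(series: np.ndarray, n_bins: int = 4096, limit: float = 15.0):
    """Map real values to integer tokens from a fixed vocabulary."""
    scale = np.abs(series).mean() + 1e-8          # mean scaling
    edges = np.linspace(-limit, limit, n_bins - 1)
    tokens = np.digitize(series / scale, edges)   # token ids in [0, n_bins - 1]
    return tokens, scale

def detokenize(tokens: np.ndarray, scale: float, n_bins: int = 4096, limit: float = 15.0):
    centers = np.linspace(-limit, limit, n_bins)  # one representative value per bin
    return centers[tokens] * scale

t = np.sin(np.linspace(0, 6 * np.pi, 100)) * 10 + 50
tokens, scale = tokenize(t)
print(np.abs(t - detokenize(tokens, scale)).max())  # small quantization error
```

Once values are tokens, forecasting becomes next-token prediction, which is why an off-the-shelf T5 architecture trained with cross-entropy loss suffices.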

https://arxiv.org/pdf/2403.07815.pdf

https://github.com/amazon-science/chronos-forecasting/

Unified Time Series Model

UniTS is a unified time series model that handles a variety of tasks across multiple domains with shared parameters and no task-specific modules.

Foundation models, especially LLMs, are profoundly transforming deep learning. Instead of training many task-specific models, we can adapt a single pretrained model to many tasks via few-shot prompting or fine-tuning. However, current foundation models apply to sequence data but not to time series, which pose unique challenges: inherently diverse, multi-domain datasets, diverging task specifications across forecasting, classification, and other task types, and the apparent need for task-specialized models.

We developed UniTS, a unified time series model that supports a universal task specification, accommodating classification, forecasting, imputation, and anomaly detection tasks. This is achieved through a novel unified network backbone, which incorporates sequence and variable attention along with a dynamic linear operator and is trained as a unified model. 
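
As a rough illustration of sequence and variable attention (a conceptual sketch, not the authors' code; shapes and sizes are assumptions):

```python
# Alternate attention over the time axis and the variable axis of a
# (batch, time, variables, dim) tensor, as the UniTS backbone description suggests.
import torch
import torch.nn as nn

class SequenceAndVariableAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.var_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, v, d = x.shape
        # Attention along time: fold variables into the batch dimension.
        h = x.permute(0, 2, 1, 3).reshape(b * v, t, d)
        h = h + self.time_attn(h, h, h, need_weights=False)[0]
        h = h.reshape(b, v, t, d).permute(0, 2, 1, 3)
        # Attention along variables: fold time into the batch dimension.
        g = h.reshape(b * t, v, d)
        g = g + self.var_attn(g, g, g, need_weights=False)[0]
        return g.reshape(b, t, v, d)

block = SequenceAndVariableAttention()
print(block(torch.randn(2, 96, 7, 64)).shape)  # (batch, time, variables, dim)
```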

Across 38 multi-domain datasets, UniTS demonstrates superior performance compared to task-specific models and repurposed natural language-based LLMs. UniTS exhibits remarkable zero-shot, few-shot, and prompt learning capabilities when evaluated on new data domains and tasks. We will release the source code and datasets.

https://arxiv.org/pdf/2403.00131v1.pdf

https://zitniklab.hms.harvard.edu/projects/UniTS/

https://github.com/mims-harvard/UniTS

Unified Training of Universal Time Series Forecasting Transformers

  • Deep learning for time series forecasting traditionally uses a one-model-per-dataset approach, limiting potential advancements.
  • Universal forecasting introduces the idea of pre-training a single Large Time Series Model on a vast collection of datasets for diverse tasks.
  • Challenges in creating such a model include: cross-frequency learning, handling multivariate series with arbitrary variates (see the sketch after this list), and varying distributional properties of large-scale data.
  • To overcome these challenges, novel enhancements to the time series Transformer architecture are introduced, creating the Masked EncOder-based UnIveRsAl TIme Series Forecasting Transformer (MOIRAI).
  • MOIRAI is trained on the Large-scale Open Time Series Archive (LOTSA), which contains over 27 billion observations across nine domains.
  • MOIRAI demonstrates competitive or superior performance as a zero-shot forecaster compared to full-shot models.
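
One way to picture the arbitrary-variate challenge: flatten every variate into a single token sequence and tag each token with a variate embedding, so one encoder can accept any number of series. A conceptual sketch under those assumptions (not the MOIRAI code):

```python
import torch
import torch.nn as nn

class AnyVariateEmbedding(nn.Module):
    def __init__(self, patch_len: int = 32, dim: int = 64, max_variates: int = 128):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, dim)            # patch -> token embedding
        self.variate_emb = nn.Embedding(max_variates, dim)

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (variates, time); time is assumed to be a multiple of patch_len here
        v, _ = series.shape
        patches = series.unfold(1, self.patch_len, self.patch_len)  # (v, n_patches, patch_len)
        tokens = self.proj(patches)                                 # (v, n_patches, dim)
        ids = torch.arange(v).unsqueeze(1).expand(-1, tokens.shape[1])
        tokens = tokens + self.variate_emb(ids)   # tag each token with its variate index
        return tokens.reshape(1, -1, tokens.shape[-1])  # one flat sequence for the encoder

emb = AnyVariateEmbedding()
print(emb(torch.randn(5, 256)).shape)  # 5 variates, 256 steps -> torch.Size([1, 40, 64])
```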

https://arxiv.org/pdf/2402.02592.pdf

GPT in 60 Lines of NumPy

In this post, the author implements a GPT from scratch in just 60 lines of NumPy, then loads the trained GPT-2 model weights released by OpenAI into the implementation and generates some text.

Note:

  • This post assumes familiarity with Python, NumPy, and some basic experience training neural networks.
  • This implementation intentionally omits many features to keep it as simple as possible while remaining complete. The goal is to provide a simple yet complete technical introduction to GPT as an educational tool.
  • The GPT architecture is just one small part of what makes LLMs what they are today [1].
  • All the code for this blog post can be found at github.com/jaymody/picoGPT.
  • Hacker News thread
  • Chinese translation
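
In the same spirit as the post (but not its exact code), causal self-attention in plain NumPy shows how little machinery is involved:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_qkv, w_out):
    # x: (seq_len, d_model); w_qkv: (d_model, 3*d_model); w_out: (d_model, d_model)
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    mask = (1 - np.tri(x.shape[0])) * -1e10             # hide future positions
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]) + mask)
    return (scores @ v) @ w_out

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(8, d))
print(causal_self_attention(x, rng.normal(size=(d, 3 * d)), rng.normal(size=(d, d))).shape)
```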

Text splitting

Large language models (LLMs) can be used for many tasks, but often have a limited context size that can be smaller than documents you might want to use. To use documents of larger length, you often have to split your text into chunks to fit within this context size.

This crate provides methods for splitting longer pieces of text into smaller chunks, aiming to maximize a desired chunk size, but still splitting at semantically sensible boundaries whenever possible.
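
The underlying idea, independent of the crate's API, is a greedy packer that fills each chunk up to a cap while preferring paragraph and sentence boundaries. A minimal Python sketch with an assumed character-based cap:

```python
import re

def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Pack paragraphs (falling back to sentences) into chunks of at most max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Oversize paragraphs fall back to sentence boundaries; a single sentence
        # longer than max_chars still becomes its own oversize chunk (kept simple here).
        pieces = [para] if len(para) <= max_chars else re.split(r"(?<=[.!?])\s+", para)
        for piece in pieces:
            if len(current) + len(piece) + 1 <= max_chars:
                current = f"{current} {piece}".strip()
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks

doc = "First paragraph. " * 20 + "\n\n" + "Second paragraph, much shorter."
for chunk in split_text(doc, max_chars=200):
    print(len(chunk), chunk[:40])
```

The crate applies the same idea across more granularity levels (characters, sentences, document structure) and, per its README, supports token-based sizing as well.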

Levels Of Text Splitting

Semantic text splitting library

https://github.com/benbrandt/text-splitter

Chunks Visualizer

https://chunkviz.up.railway.app/

Renumics Spotlight

Spotlight helps you understand unstructured datasets quickly. You can create interactive visualizations from your dataframe with just a few lines of code, and leverage data enrichments (e.g., embeddings, predictions, uncertainties) to identify critical clusters in your data.
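
Basic usage per the Spotlight README (pip install renumics-spotlight); the dataframe here is made up:

```python
import pandas as pd
from renumics import spotlight

df = pd.DataFrame({
    "text": ["a cat", "a dog", "a car"],
    "prediction": [0.9, 0.4, 0.7],
})
spotlight.show(df)  # opens an interactive browser view of the dataframe
```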

https://spotlight.renumics.com/

Revolutionizing AI Reading Comprehension: ReadAgent’s Breakthrough in Handling Documents with 20 Million Tokens

  • Introduction to ReadAgent by Google DeepMind
    • Development of ReadAgent, an AI capable of understanding long texts beyond the limits of its language model.
    • Utilizes a human-like reading strategy to comprehend complex documents.
  • Challenges Faced by Language Models
    • Context length limitation: fixed token processing capacity leads to performance decline.
    • Ineffective context usage: comprehension decreases as text length grows.
  • Features of ReadAgent
    • Mimics human reading by forming and using "gist memories" of texts.
    • Breaks texts down into smaller "episodes" and generates a gist memory for each.
    • Looks up relevant episodes when needed to answer questions.
  • Performance Enhancements
    • Capable of understanding documents 20 times longer than its base language model's context window allows.
    • Improved performance on long-document question-answering datasets:
      • QuALITY: accuracy improved from 85.8% to 86.9%.
      • NarrativeQA: rating increased by 13-32% over baselines.
      • QMSum: rating improved from 44.96% to 49.58%.
  • Potential Applications
    • Legal contract review, scientific literature analysis, customer support, financial report summarization, automated online course creation.
    • Indicates the future potential of AI in mastering lengthy real-world documents through human-like reading strategies.
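
A minimal sketch of the gist-memory loop described above; call_llm is a hypothetical stand-in for any chat-completion client, and the prompts are paraphrased rather than taken from the paper:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # hypothetical stub

def paginate(text: str, episode_chars: int = 4000) -> list[str]:
    # ReadAgent picks pause points with the LLM itself; fixed-size episodes keep this simple.
    return [text[i:i + episode_chars] for i in range(0, len(text), episode_chars)]

def read_and_answer(document: str, question: str) -> str:
    episodes = paginate(document)
    gists = [call_llm(f"Summarize this passage in a few sentences:\n{e}") for e in episodes]
    memory = "\n".join(f"[{i}] {g}" for i, g in enumerate(gists))
    picks = call_llm(f"Question: {question}\nGist memories:\n{memory}\n"
                     "Which passages should be re-read in full? Answer like: 0, 3")
    chosen = [int(i) for i in picks.split(",") if i.strip().isdigit()]
    context = "\n".join(episodes[i] for i in chosen if i < len(episodes))
    return call_llm(f"Answer using the passages below.\nQuestion: {question}\n{context}")
```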

https://read-agent.github.io/

DoRA: Weight-Decomposed Low-Rank Adaptation

  • Objective Exploration: Investigates the disparities between full fine-tuning (FT) and LoRA through a novel weight decomposition analysis.
  • Innovative Method: Introduces Weight-Decomposed Low-Rank Adaptation (DoRA), which splits pre-trained weights into magnitude and direction components for fine-tuning (a minimal sketch follows the links below).
  • Strategic Approach: Employs LoRA for directional updates, significantly reducing the number of trainable parameters.
  • Enhanced Performance: By adopting DoRA, it improves learning capacity and training stability of LoRA, without extra inference costs.
  • Proven Superiority: Demonstrates that DoRA outperforms LoRA in fine-tuning LLaMA, LLaVA, and VL-BART on tasks like commonsense reasoning, visual instruction tuning, and image/video-text understanding.

https://arxiv.org/abs/2402.09353

https://github.com/catid/dora
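
A minimal sketch of the decomposition (an illustration, not the paper's implementation; norms are taken per output row here, whereas the paper normalizes per weight column):

```python
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    """Reparameterize a pretrained weight as magnitude * unit-direction,
    with LoRA factors updating only the direction."""

    def __init__(self, pretrained_weight: torch.Tensor, rank: int = 8):
        super().__init__()
        out_f, in_f = pretrained_weight.shape
        self.register_buffer("direction", pretrained_weight.clone())  # frozen V
        self.magnitude = nn.Parameter(pretrained_weight.norm(dim=1))  # trainable m
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)    # LoRA down-projection
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))          # LoRA up-projection, zero init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.direction + self.lora_b @ self.lora_a   # directional update via LoRA
        v = v / v.norm(dim=1, keepdim=True)              # renormalize to unit direction
        w = self.magnitude.unsqueeze(1) * v              # rescale by learned magnitude
        return x @ w.T

layer = DoRALinear(torch.randn(32, 64))
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 32])
```

Only the magnitude and the two LoRA factors are trainable, which is how DoRA keeps the parameter count close to LoRA's while decoupling magnitude from direction.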

Bunkatopics

Bunkatopics is a package designed for data cleaning, topic modeling, visualization, and frame analysis. Its primary goal is to assist developers in gaining insights from unstructured data, potentially facilitating data cleaning and optimizing LLMs through fine-tuning. Bunkatopics is built on well-known libraries such as langchain, chroma, and transformers, enabling seamless integration into various environments.
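
Bunkatopics' own API is not reproduced here; the sketch below shows the generic embed-then-cluster pipeline that such topic-modeling tools build on (sentence-transformers and scikit-learn assumed installed):

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

docs = [
    "the cat sat on the mat", "dogs are loyal pets",
    "stocks fell sharply today", "the market rallied after the news",
]
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)  # one vector per document
labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(embeddings)
for doc, label in zip(docs, labels):
    print(label, doc)
```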

https://github.com/charlesdedampierre/BunkaTopics