21 | mars | 2024 | Deeplearning.fr

LLMLingua-2 is a novel compression technology developed by Microsoft Research, achieving state-of-the-art results with 8 times less GPU memory on tasks typically handled by models like GPT-4.
It introduces innovative approaches such as « Data Distillation, » « Bidirectional Token Classification, » and optimized compression objectives to efficiently compress prompts without losing key information.
The technology has shown superior performance across various language tasks and demonstrated remarkable generalization across different LLMs and languages, from GPT-3.5 to Mistral-7B and from English to Chinese.
Compared to existing prompt compression methods, LLMLingua-2 is 3 to 6 times faster, accelerates end-to-end inference by 1.6 to 2.9 times, and significantly reduces GPU memory usage by a factor of 8.
This advancement represents a significant step forward in making language AI more practical and scalable for real-world applications, demonstrating Microsoft Research’s leadership in the field.

sample

Chronos is a framework designed for pretrained probabilistic time series models.
It utilizes scaling and quantization to tokenize time series values into a fixed vocabulary.
Chronos trains transformer-based language model architectures (specifically, models from the T5 family with parameters ranging from 20M to 710M) using cross-entropy loss.
The models are pretrained on a mix of publicly available datasets and a synthetic dataset generated via Gaussian processes, enhancing generalization.
In a comprehensive benchmark involving 42 datasets, including both classical local models and deep learning approaches, Chronos models:
(a) significantly outperform other methods on datasets included in the training corpus;
(b) show comparable or occasionally superior zero-shot performance on new datasets compared to methods trained specifically on those datasets.
These results demonstrate the potential of pretrained models to leverage time series data across various domains for improving zero-shot accuracy on unseen forecasting tasks, suggesting a simplified approach to forecasting pipelines.

Deeplearning.fr