Revolutionizing AI Efficiency: How Microsoft’s LLMLingua-2 is Changing the Game with 8x Less Memory

  • LLMLingua-2 is a novel compression technology developed by Microsoft Research, achieving state-of-the-art results with 8 times less GPU memory on tasks typically handled by models like GPT-4.
  • It introduces innovative approaches such as « Data Distillation, » « Bidirectional Token Classification, » and optimized compression objectives to efficiently compress prompts without losing key information.
  • The technology has shown superior performance across various language tasks and demonstrated remarkable generalization across different LLMs and languages, from GPT-3.5 to Mistral-7B and from English to Chinese.
  • Compared to existing prompt compression methods, LLMLingua-2 is 3 to 6 times faster, accelerates end-to-end inference by 1.6 to 2.9 times, and significantly reduces GPU memory usage by a factor of 8.
  • This advancement represents a significant step forward in making language AI more practical and scalable for real-world applications, demonstrating Microsoft Research’s leadership in the field.

https://arxiv.org/pdf/2403.12968.pdf

sample

https://huggingface.co/microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank

Chronos: Learning the Language of Time Series

  • Chronos is a framework designed for pretrained probabilistic time series models.
  • It utilizes scaling and quantization to tokenize time series values into a fixed vocabulary.
  • Chronos trains transformer-based language model architectures (specifically, models from the T5 family with parameters ranging from 20M to 710M) using cross-entropy loss.
  • The models are pretrained on a mix of publicly available datasets and a synthetic dataset generated via Gaussian processes, enhancing generalization.
  • In a comprehensive benchmark involving 42 datasets, including both classical local models and deep learning approaches, Chronos models:
  • (a) significantly outperform other methods on datasets included in the training corpus;
  • (b) show comparable or occasionally superior zero-shot performance on new datasets compared to methods trained specifically on those datasets.
  • These results demonstrate the potential of pretrained models to leverage time series data across various domains for improving zero-shot accuracy on unseen forecasting tasks, suggesting a simplified approach to forecasting pipelines.

https://arxiv.org/pdf/2403.07815.pdf

https://github.com/amazon-science/chronos-forecasting/