TIME-MOE: Time Series Forecasting

BILLION-SCALE TIME SERIES FOUNDATION MODELS WITH MIXTURE OF EXPERTS (from Princeton)

TIME-MOE is a scalable and unified architecture for pre-training large, capable forecasting foundation models while keeping inference costs low. It addresses the shortcomings of existing pre-trained time series models, which are typically limited in scale and expensive to run.

Key Features

  • Sparse Mixture-of-Experts (MoE) Design: Enhances computational efficiency by activating only a subset of expert networks for each prediction (see the routing sketch after this list).
  • Scalability: Allows for effective scaling without a corresponding increase in inference costs.
  • Flexibility: Supports flexible forecasting horizons with varying input context lengths.

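The sparse-MoE idea can be illustrated with a short, generic top-k routing layer. The sketch below is not taken from the Time-MoE codebase; the class name, expert sizes, and `top_k` value are placeholders, assuming a standard token-level router that keeps only the top-k experts per token.

```python
# Illustrative top-k sparse MoE feed-forward layer (generic sketch, not the
# Time-MoE source): only `top_k` of `num_experts` expert MLPs run per token.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, d_model] -> flatten to a list of tokens
        tokens = x.reshape(-1, x.size(-1))
        scores = F.softmax(self.gate(tokens), dim=-1)           # routing probabilities
        weights, indices = scores.topk(self.top_k, dim=-1)      # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize kept weights

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # this expert received no tokens
            contribution = weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
            out.index_add_(0, token_idx, contribution)
        return out.reshape_as(x)
```

Because only the selected experts run for each token, the number of activated parameters per prediction stays small even as the total parameter count scales, which is the efficiency argument behind the design.
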
Architecture

  • Decoder-only transformer models
  • Operates in an autoregressive manner: predictions are appended to the input and fed back in to extend the forecast (see the rollout sketch after this list)
  • Family of models scaling up to 2.4 billion parameters

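A decoder-only forecaster of this kind can serve arbitrary horizons by rolling its own predictions forward. The loop below is a generic illustration under that assumption; `model` and `steps_per_call` are hypothetical placeholders, not the Time-MoE API.

```python
# Generic autoregressive rollout for a decoder-only forecaster (illustrative;
# `model` and `steps_per_call` are placeholders, not the Time-MoE API).
import torch


@torch.no_grad()
def autoregressive_forecast(model, context: torch.Tensor, horizon: int,
                            steps_per_call: int = 1) -> torch.Tensor:
    """context: [batch, context_length]; returns [batch, horizon]."""
    series = context
    while series.size(-1) < context.size(-1) + horizon:
        # Predict the next `steps_per_call` values from everything seen so far,
        # then append them so they become input for the next call.
        next_vals = model(series)[:, -steps_per_call:]
        series = torch.cat([series, next_vals], dim=-1)
    return series[:, context.size(-1):context.size(-1) + horizon]
```

Since context and forecast are just concatenated values of the same series, one model handles varying input context lengths and flexible forecasting horizons, which is the flexibility the paper emphasizes.
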
Training Data

  • Pre-trained on Time-300B dataset
  • Spans over 9 domains
  • Encompasses over 300 billion time points

Performance

  • Achieves significantly improved forecasting precision
  • Outperforms dense models with equivalent computation budgets or activated parameters

Applications

Positioned as a state-of-the-art solution for real-world time series forecasting challenges, offering superior capability, efficiency, and flexibility.

Paper:

https://arxiv.org/pdf/2409.16040

Code:

https://github.com/Time-MoE/Time-MoE
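
The repository publishes checkpoints on Hugging Face and documents inference through `transformers` with `trust_remote_code=True`. The sketch below follows that pattern; the checkpoint name `Maple728/TimeMoE-50M`, the context/horizon values, and the exact call sequence are assumptions that should be checked against the repo's README.

```python
# Inference sketch following the usage pattern documented in the Time-MoE repo
# (checkpoint name and details are assumptions; verify against the README).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Maple728/TimeMoE-50M",   # assumed name of a small released checkpoint
    device_map="cpu",         # or "cuda" for GPU inference
    trust_remote_code=True,   # the forecasting model class lives in the repo's custom code
)

context_length, prediction_length = 64, 12
seqs = torch.randn(2, context_length)   # [batch_size, context_length] raw series

# Normalize each series, forecast autoregressively, then undo the normalization.
mean = seqs.mean(dim=-1, keepdim=True)
std = seqs.std(dim=-1, keepdim=True)
normed = (seqs - mean) / std

output = model.generate(normed, max_new_tokens=prediction_length)  # [batch, context + horizon]
forecast = output[:, -prediction_length:] * std + mean             # de-normalized predictions
```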