TIME-MOE: BILLION-SCALE TIME SERIES FOUNDATION MODELS WITH MIXTURE OF EXPERTS (from Princeton)
Time-MoE is a scalable, unified architecture for pre-training large, capable forecasting foundation models while keeping inference costs low. It addresses a key shortcoming of existing pre-trained time series models, which are typically small in scale and expensive to run.
Key Features
- Sparse Mixture-of-Experts (MoE) Design: Activates only a subset of expert networks for each prediction, improving computational efficiency (see the routing sketch after this list).
- Scalability: Model capacity can grow without a proportional increase in inference cost.
- Flexibility: Supports flexible forecasting horizons with varying input context lengths.
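The core idea behind the sparse MoE design is that a lightweight gate routes each token to only a few expert feed-forward networks, so most parameters stay idle on any given prediction. Below is a minimal, generic sketch of top-k expert routing in PyTorch; the layer sizes, expert count, and class names are illustrative placeholders, not the actual Time-MoE implementation.

```python
# Minimal sketch of sparse top-k expert routing (generic MoE feed-forward layer).
# All hyperparameters and names here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEFeedForward(nn.Module):
    def __init__(self, d_model: int = 256, d_ff: int = 512,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for per-token routing
        tokens = x.reshape(-1, x.size(-1))
        scores = self.gate(tokens)                          # (tokens, n_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (indices == e)                           # which tokens selected expert e
            token_idx, slot = mask.nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue                                    # unselected expert: no compute spent
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SparseMoEFeedForward()
    y = layer(torch.randn(4, 32, 256))
    print(y.shape)  # torch.Size([4, 32, 256])
```

The efficiency property is visible in the loop: an expert that no token selects contributes no computation for that batch, so total parameters can grow while per-prediction cost stays bounded by the number of activated experts.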
Architecture
- Decoder-only transformer models
- Operates in an autoregressive manner, generating forecasts point by point (a rollout sketch follows this list)
- Family of models scaling up to 2.4 billion parameters
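Because the model is decoder-only and autoregressive, an arbitrary horizon can be produced by repeatedly predicting the next point and appending it to the context. The sketch below shows that generic rollout loop; the one-step model interface and the NaiveLast stand-in are hypothetical and not the Time-MoE API.

```python
# Minimal sketch of autoregressive rollout for a decoder-only forecaster.
# The model interface assumed here (per-position next-value outputs) is hypothetical.
import torch


@torch.no_grad()
def autoregressive_forecast(model: torch.nn.Module,
                            context: torch.Tensor,
                            horizon: int) -> torch.Tensor:
    """Roll the model forward `horizon` steps, feeding each prediction back in.

    context: (batch, context_length) past values of a univariate series.
    Returns: (batch, horizon) forecast, produced point by point.
    """
    history = context
    steps = []
    for _ in range(horizon):
        next_point = model(history)[:, -1:]                 # predict one step from the full history
        steps.append(next_point)
        history = torch.cat([history, next_point], dim=1)   # extend the context with the prediction
    return torch.cat(steps, dim=1)


if __name__ == "__main__":
    class NaiveLast(torch.nn.Module):
        # Toy stand-in model: its per-position "prediction" is the observed value itself,
        # so the rollout simply repeats the last known point.
        def forward(self, x):
            return x

    past = torch.randn(2, 64)
    print(autoregressive_forecast(NaiveLast(), past, horizon=16).shape)  # torch.Size([2, 16])
```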
Training Data
- Pre-trained on Time-300B dataset
- Spans 9 domains
- Encompasses over 300 billion time points
Performance
- Achieves significantly improved forecasting precision
- Outperforms dense models with equivalent computation budgets or activated parameters
Applications
Positioned as a state-of-the-art solution for real-world time series forecasting challenges, offering superior capability, efficiency, and flexibility.
https://arxiv.org/pdf/2409.16040
Code: