TIME-MOE: BILLION-SCALE TIME SERIES FOUNDATION MODELS WITH MIXTURE OF EXPERTS (from Princeton)
Time-MoE is a scalable, unified architecture for pre-training large, capable forecasting foundation models while keeping inference costs low. It addresses a key shortcoming of existing pre-trained time series models, which are typically small in scale and expensive to run.
Key Features
- Sparse Mixture-of-Experts (MoE) Design: Activates only a subset of expert networks for each prediction, improving computational efficiency (see the routing sketch after this list).
- Scalability: Model capacity can grow without a proportional increase in inference cost.
- Flexibility: Supports flexible forecasting horizons with varying input context lengths.
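The core idea behind the sparse MoE design is that a lightweight gate routes each token to only a few expert feed-forward networks, so most parameters stay idle on any given prediction. Below is a minimal, generic sketch of top-k expert routing in PyTorch; the layer sizes, expert count, and class names are illustrative placeholders, not the actual Time-MoE implementation.

```python
# Minimal sketch of sparse top-k expert routing (generic MoE feed-forward layer).
# All hyperparameters and names here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEFeedForward(nn.Module):
    def __init__(self, d_model: int = 256, d_ff: int = 512,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for per-token routing
        tokens = x.reshape(-1, x.size(-1))
        scores = self.gate(tokens)                          # (tokens, n_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (indices == e)                           # which tokens selected expert e
            token_idx, slot = mask.nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue                                    # unselected expert: no compute spent
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SparseMoEFeedForward()
    y = layer(torch.randn(4, 32, 256))
    print(y.shape)  # torch.Size([4, 32, 256])
```

The efficiency property is visible in the loop: an expert that no token selects contributes no computation for that batch, so total parameters can grow while per-prediction cost stays bounded by the number of activated experts.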
Architecture
- Decoder-only transformer models
- Operates in an autoregressive manner, generating forecasts point by point (a rollout sketch follows this list)
- Family of models scaling up to 2.4 billion parameters
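Because the model is decoder-only and autoregressive, an arbitrary horizon can be produced by repeatedly predicting the next point and appending it to the context. The sketch below shows that generic rollout loop; the one-step model interface and the NaiveLast stand-in are hypothetical and not the Time-MoE API.

```python
# Minimal sketch of autoregressive rollout for a decoder-only forecaster.
# The model interface assumed here (per-position next-value outputs) is hypothetical.
import torch


@torch.no_grad()
def autoregressive_forecast(model: torch.nn.Module,
                            context: torch.Tensor,
                            horizon: int) -> torch.Tensor:
    """Roll the model forward `horizon` steps, feeding each prediction back in.

    context: (batch, context_length) past values of a univariate series.
    Returns: (batch, horizon) forecast, produced point by point.
    """
    history = context
    steps = []
    for _ in range(horizon):
        next_point = model(history)[:, -1:]                 # predict one step from the full history
        steps.append(next_point)
        history = torch.cat([history, next_point], dim=1)   # extend the context with the prediction
    return torch.cat(steps, dim=1)


if __name__ == "__main__":
    class NaiveLast(torch.nn.Module):
        # Toy stand-in model: its per-position "prediction" is the observed value itself,
        # so the rollout simply repeats the last known point.
        def forward(self, x):
            return x

    past = torch.randn(2, 64)
    print(autoregressive_forecast(NaiveLast(), past, horizon=16).shape)  # torch.Size([2, 16])
```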
Training Data
- Pre-trained on Time-300B dataset
- Spans 9 domains
- Encompasses over 300 billion time points
Performance
- Achieves significantly improved forecasting precision
- Outperforms dense models with equivalent computation budgets or activated parameters
Applications
Positioned as a state-of-the-art solution for real-world time series forecasting challenges, offering superior capability, efficiency, and flexibility.
https://arxiv.org/pdf/2409.16040
Code: