TSMixer: Revolutionizing Time Series Forecasting with MLP Architecture


TSMixer represents a significant advancement in deep learning forecasting models, offering a unique combination of lightweight design and high accuracy.

Here’s a comprehensive analysis of this innovative model:


Core Architecture
TSMixer employs a dual-mixing mechanism that processes data in two distinct ways:
• Time Mixing: Processes sequences across the temporal dimension using MLPs
• Feature Mixing: Handles data across the feature dimension
The architecture stacks multiple time- and feature-mixing blocks for enhanced performance, followed by a final temporal projection layer that maps the sequence from the context length to the prediction length.
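A minimal PyTorch sketch of this block structure is shown below. It is an illustrative simplification rather than the reference implementation: the hidden sizes, dropout, and the use of plain BatchNorm1d in place of the paper’s 2D normalization are assumptions made for brevity.

```python
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    """One TSMixer-style block: an MLP applied across time, then an MLP applied
    across features, each with a residual connection. Simplified sketch only."""

    def __init__(self, seq_len: int, n_features: int, hidden: int = 64, dropout: float = 0.1):
        super().__init__()
        # BatchNorm1d stands in for the paper's 2D normalization (assumption).
        self.time_norm = nn.BatchNorm1d(n_features)
        self.time_mlp = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.ReLU(), nn.Dropout(dropout), nn.Linear(hidden, seq_len)
        )
        self.feat_norm = nn.BatchNorm1d(n_features)
        self.feat_mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout), nn.Linear(hidden, n_features)
        )

    def forward(self, x):                                   # x: (batch, seq_len, n_features)
        # Time mixing: MLP along the temporal dimension.
        y = self.time_norm(x.transpose(1, 2))               # (batch, n_features, seq_len)
        x = x + self.time_mlp(y).transpose(1, 2)            # residual connection
        # Feature mixing: MLP along the feature dimension.
        z = self.feat_norm(x.transpose(1, 2)).transpose(1, 2)
        return x + self.feat_mlp(z)                         # residual connection


class TSMixerSketch(nn.Module):
    """Stacked mixer blocks followed by a temporal projection that maps the
    context length to the prediction length."""

    def __init__(self, seq_len: int, n_features: int, horizon: int, n_blocks: int = 2):
        super().__init__()
        self.blocks = nn.Sequential(*[MixerBlock(seq_len, n_features) for _ in range(n_blocks)])
        self.projection = nn.Linear(seq_len, horizon)       # context length -> prediction length

    def forward(self, x):                                   # x: (batch, seq_len, n_features)
        x = self.blocks(x)
        return self.projection(x.transpose(1, 2)).transpose(1, 2)   # (batch, horizon, n_features)


model = TSMixerSketch(seq_len=512, n_features=7, horizon=96)
forecast = model(torch.randn(32, 512, 7))                   # -> (32, 96, 7)
```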


Key Innovations
Normalization Techniques
TSMixer implements three sophisticated normalization approaches:
• Batch Normalization: Normalizes across batch and time dimensions
• Layer Normalization: Works across features and time dimensions
• Reversible Instance Normalization (RevIN): Normalizes each input instance and reverses the transformation on the output, mitigating distribution shift while preserving the sequence’s original scale (a minimal sketch follows this list)
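Below is a minimal sketch of the RevIN idea, assuming a simple per-instance mean/std normalization with an optional learnable affine transform; the implementation used inside TSMixer may differ in details.

```python
import torch
import torch.nn as nn


class RevIN(nn.Module):
    """Minimal RevIN sketch: normalize each input instance, then reverse the
    transformation on the model's output so forecasts return to the original scale."""

    def __init__(self, n_features: int, eps: float = 1e-5, affine: bool = True):
        super().__init__()
        self.eps = eps
        self.affine = affine
        if affine:
            self.gamma = nn.Parameter(torch.ones(n_features))
            self.beta = nn.Parameter(torch.zeros(n_features))

    def normalize(self, x):                      # x: (batch, seq_len, n_features)
        # Per-instance statistics over the time dimension, stored for the reverse step.
        self.mean = x.mean(dim=1, keepdim=True)
        self.std = x.std(dim=1, keepdim=True) + self.eps
        x = (x - self.mean) / self.std
        return x * self.gamma + self.beta if self.affine else x

    def denormalize(self, y):                    # y: (batch, horizon, n_features)
        if self.affine:
            y = (y - self.beta) / (self.gamma + self.eps)
        return y * self.std + self.mean


# Usage: normalize the context, run any forecaster, then de-normalize its output.
revin = RevIN(n_features=7)
context = torch.randn(32, 512, 7)
normalized = revin.normalize(context)
# forecast = revin.denormalize(model(normalized))   # hypothetical forecaster
```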


Model Variants
Three distinct versions exist, each serving different purposes:
1. TMix-Only: A simplified version without feature-mixing
2. Standard TSMixer: Includes cross-variate MLPs
3. TSMixer-Ext: The most comprehensive variant, incorporating auxiliary information

Performance Advantages
The model demonstrates several notable strengths:
• Superior Long-Term Forecasting: Effectively handles prediction horizons up to 720 data points
• Scalability: Shows consistent improvement with larger lookback windows
• Versatility: Particularly effective in retail forecasting and complex datasets with interdependencies


Practical Applications
TSMixer has proven particularly effective in:
• Retail forecasting
• Demand planning
• Financial markets
• Complex multivariate time series analysis
The model’s success in benchmarks, particularly on the M5 Walmart dataset, demonstrates its practical utility in real-world applications.

VisionTS: Revolutionizing Time Series Forecasting with Image-Based Models

Challenges in Time Series Forecasting Models

The biggest challenge in building a pre-trained model for time series is finding high-quality, diverse data; this difficulty sits at the core of developing effective forecasting models.

Main Approaches

Two primary approaches are used to build a foundation forecasting model:

  1. Adapting an LLM: This method involves repurposing a pre-trained language model like GPT-4 or Llama by adapting it to time series tasks.
  2. Building from Scratch: This approach involves creating a vast time series dataset to pre-train a model, hoping it will generalize to new data.

Results and Limitations

The second approach has proven more effective, as evidenced by models such as MOIRAI, TimesFM, and TTM. However, these models follow scaling laws, and their performance heavily depends on the availability of extensive time series data, which brings us back to the initial challenge.

Innovation: Using Images

Faced with these limitations, researchers explored an innovative approach: using a different modality, namely images. Although counterintuitive, this method has produced groundbreaking results, opening new perspectives in the field of time series forecasting.

VisionTS: A New Paradigm

VisionTS represents a novel approach that leverages the power of image-based models for time series forecasting. This method transforms time series data into images, allowing the use of advanced computer vision techniques to predict future values.
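As a rough illustration of the idea (not VisionTS’s actual pipeline), the sketch below segments a univariate context by its seasonal period, stacks the segments into a 2D matrix, and renders it as a grayscale image that a pre-trained vision backbone such as a masked autoencoder could consume. The period, image size, and min-max scaling are all assumptions.

```python
import numpy as np
from PIL import Image


def series_to_image(context: np.ndarray, period: int, image_size: int = 224) -> Image.Image:
    """Segment a 1D context by its seasonal period, stack the segments into a
    2D matrix, and render it as a grayscale image (illustrative sketch only)."""
    n_segments = len(context) // period
    matrix = context[: n_segments * period].reshape(n_segments, period).T   # (period, n_segments)
    # Min-max scale to [0, 255] so the matrix can be stored as 8-bit pixels.
    lo, hi = matrix.min(), matrix.max()
    pixels = np.uint8(255 * (matrix - lo) / (hi - lo + 1e-8))
    return Image.fromarray(pixels, mode="L").resize((image_size, image_size))


# Example: an hourly series with daily seasonality (period = 24) rendered as an image.
context = np.sin(np.linspace(0, 60 * np.pi, 30 * 24)) + 0.1 * np.random.randn(30 * 24)
img = series_to_image(context, period=24)
```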

Advantages of Image-Based Forecasting

Using images for time series forecasting offers several advantages:

  • Access to a vast pool of pre-trained image models
  • Ability to capture complex patterns and relationships in data
  • Potential for transfer learning from diverse image datasets

Future Implications

The success of VisionTS suggests a promising direction for future research in time series forecasting. It demonstrates the potential of cross-modal learning and opens up new possibilities for improving prediction accuracy and generalization in various domains.

Paper:

https://arxiv.org/pdf/2408.17253

Code:

https://github.com/Keytoyze/VisionTS

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts (from Princeton)

Time-MoE is a scalable and unified architecture designed for pre-training large, capable forecasting foundation models while reducing inference costs. It addresses a key shortcoming of current pre-trained time series models, which tend to be limited in scale and expensive to run.

Key Features

  • Sparse Mixture-of-Experts (MoE) Design: Enhances computational efficiency by activating only a subset of expert networks for each prediction (see the sketch after this list).
  • Scalability: Allows for effective scaling without a corresponding increase in inference costs.
  • Flexibility: Supports flexible forecasting horizons with varying input context lengths.
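The sketch below illustrates a sparse MoE layer with top-k routing in PyTorch. It conveys the general mechanism rather than Time-MoE’s actual implementation; the number of experts, the top-k value, and the expert MLP shape are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Illustrative sparse mixture-of-experts layer: a router selects the top-k
    experts per token, so only a fraction of the parameters are active per prediction."""

    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: (batch, seq_len, d_model)
        scores = self.router(x)                              # (batch, seq_len, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)       # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


layer = SparseMoELayer(d_model=256)
hidden = layer(torch.randn(4, 128, 256))                     # -> (4, 128, 256)
```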

Architecture

  • Decoder-only transformer models
  • Operates in an autoregressive manner
  • Family of models scaling up to 2.4 billion parameters

Training Data

  • Pre-trained on Time-300B dataset
  • Spans over 9 domains
  • Encompasses over 300 billion time points

Performance

  • Achieves significantly improved forecasting precision
  • Outperforms dense models with equivalent computation budgets or activated parameters

Applications

Time-MoE is positioned as a state-of-the-art solution for real-world time series forecasting challenges, offering superior capability, efficiency, and flexibility.

https://arxiv.org/pdf/2409.16040

Code:

https://github.com/Time-MoE/Time-MoE

AutoGluon: Time Series Forecasting

AutoGluon can forecast the future values of multiple time series given the historical data and other related covariates. A single call to AutoGluon TimeSeriesPredictor’s fit() method trains multiple models to generate accurate probabilistic forecasts, and does not require you to manually deal with cumbersome issues like model selection and hyperparameter tuning.
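A minimal usage sketch follows, assuming a hypothetical long-format CSV (sales.csv) with item_id, timestamp, and target columns; the prediction length, metric, preset, and time limit are illustrative choices.

```python
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Hypothetical long-format CSV with columns: item_id, timestamp, target.
df = pd.read_csv("sales.csv")
train_data = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp"
)

# One fit() call trains and ensembles multiple models; no manual model selection
# or hyperparameter tuning is required.
predictor = TimeSeriesPredictor(prediction_length=48, target="target", eval_metric="MASE")
predictor.fit(train_data, presets="medium_quality", time_limit=600)

# Probabilistic forecasts: a mean column plus quantile columns for each item.
predictions = predictor.predict(train_data)
print(predictions.head())
```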

https://auto.gluon.ai/stable/tutorials/timeseries/index.html

https://raw.githubusercontent.com/Innixma/autogluon-doc-utils/main/docs/cheatsheets/stable/autogluon-cheat-sheet.jpeg

TinyTimeMixers

TinyTimeMixers (TTMs) are compact pre-trained models for multivariate time series forecasting, open-sourced by IBM Research. With fewer than 1 million parameters, TTM introduces the notion of the first-ever “tiny” pre-trained models for time series forecasting.

https://huggingface.co/ibm/TTM

TTM outperforms several popular benchmark models that demand billions of parameters in zero-shot and few-shot forecasting. TTMs are lightweight forecasters, pre-trained on publicly available time series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be fine-tuned for multivariate forecasting, remaining competitive with as little as 5% of the training data.

Chronos: Learning the Language of Time Series

  • Chronos is a framework designed for pretrained probabilistic time series models.
  • It utilizes scaling and quantization to tokenize time series values into a fixed vocabulary (a minimal sketch of this step follows the list).
  • Chronos trains transformer-based language model architectures (specifically, models from the T5 family with parameters ranging from 20M to 710M) using cross-entropy loss.
  • The models are pretrained on a mix of publicly available datasets and a synthetic dataset generated via Gaussian processes, enhancing generalization.
  • In a comprehensive benchmark of 42 datasets, comparing against both classical local models and deep learning approaches, Chronos models:
    • (a) significantly outperform other methods on datasets included in the training corpus;
    • (b) show comparable or occasionally superior zero-shot performance on new datasets relative to methods trained specifically on those datasets.
  • These results demonstrate the potential of pretrained models to leverage time series data across various domains for improving zero-shot accuracy on unseen forecasting tasks, suggesting a simplified approach to forecasting pipelines.
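As a rough illustration of the tokenization step (not the reference tokenizer, which also handles special tokens), the sketch below applies mean scaling followed by uniform binning; the bin count and value range are assumptions.

```python
import numpy as np


def tokenize(context: np.ndarray, n_bins: int = 4094, limit: float = 15.0):
    """Mean-scale the context, then map each value to one of a fixed number of
    uniform bins that act as tokens (simplified sketch, not the reference tokenizer)."""
    scale = np.abs(context).mean() + 1e-8           # mean scaling
    scaled = context / scale
    edges = np.linspace(-limit, limit, n_bins - 1)  # uniform bin edges
    tokens = np.digitize(scaled, edges)             # token ids in [0, n_bins - 1]
    return tokens, scale


def detokenize(tokens: np.ndarray, scale: float, n_bins: int = 4094, limit: float = 15.0):
    """Map token ids back to approximate real values using the bin centers."""
    centers = np.linspace(-limit, limit, n_bins)
    return centers[np.clip(tokens, 0, n_bins - 1)] * scale


tokens, scale = tokenize(np.array([1.0, 2.0, 3.0, 2.5, 1.5]))
approx = detokenize(tokens, scale)                  # close to the original values
```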

https://arxiv.org/pdf/2403.07815.pdf

https://github.com/amazon-science/chronos-forecasting/

Unified Time Series Model

UniTS is a unified time series model that can process various tasks across multiple domains with shared parameters and does not have any task-specific modules.

Foundation models, especially LLMs, are profoundly transforming deep learning. Instead of training many task-specific models, we can adapt a single pretrained model to many tasks via few-shot prompting or fine-tuning. However, current foundation models handle sequence data such as text but not time series, which pose unique challenges: inherently diverse, multi-domain datasets; diverging task specifications across forecasting, classification, and other task types; and an apparent need for task-specialized models.

We developed UniTS, a unified time series model that supports a universal task specification, accommodating classification, forecasting, imputation, and anomaly detection tasks. This is achieved through a novel unified network backbone, which incorporates sequence and variable attention along with a dynamic linear operator and is trained as a unified model. 
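As an illustration of combining attention along the time axis with attention along the variable axis, here is a simplified PyTorch sketch; it is not the UniTS backbone (which also includes a dynamic linear operator), and the shapes and head counts are assumptions.

```python
import torch
import torch.nn as nn


class DualAttentionBlock(nn.Module):
    """Simplified block applying self-attention along the time axis (sequence
    attention) and along the variable axis (variable attention)."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.var_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(d_model)
        self.norm_v = nn.LayerNorm(d_model)

    def forward(self, x):                            # x: (batch, n_vars, seq_len, d_model)
        b, v, t, d = x.shape
        # Sequence attention: tokens attend across time within each variable.
        xt = self.norm_t(x).reshape(b * v, t, d)
        x = x + self.time_attn(xt, xt, xt)[0].reshape(b, v, t, d)
        # Variable attention: tokens attend across variables at each time step.
        xv = self.norm_v(x).permute(0, 2, 1, 3).reshape(b * t, v, d)
        x = x + self.var_attn(xv, xv, xv)[0].reshape(b, t, v, d).permute(0, 2, 1, 3)
        return x


block = DualAttentionBlock(d_model=64)
tokens = block(torch.randn(8, 5, 96, 64))            # 5 variables, 96 time steps
```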

Across 38 multi-domain datasets, UniTS demonstrates superior performance compared to task-specific models and repurposed natural language-based LLMs. UniTS exhibits remarkable zero-shot, few-shot, and prompt learning capabilities when evaluated on new data domains and tasks. We will release the source code and datasets.

https://arxiv.org/pdf/2403.00131v1.pdf

https://zitniklab.hms.harvard.edu/projects/UniTS/

https://github.com/mims-harvard/UniTS

Unified Training of Universal Time Series Forecasting Transformers

  • Deep learning for time series forecasting traditionally uses a one-model-per-dataset approach, limiting potential advancements.
  • Universal forecasting introduces the idea of pre-training a single Large Time Series Model on a vast collection of datasets for diverse tasks.
  • Challenges in creating such a model include cross-frequency learning, handling multivariate series with an arbitrary number of variates, and the varying distributional properties of large-scale data (a toy sketch of the arbitrary-variate point follows this list).
  • To overcome these challenges, novel enhancements to the time series Transformer architecture are introduced, creating the Masked EncOder-based UnIveRsAl TIme Series Forecasting Transformer (MOIRAI).
  • MOIRAI is trained on the Large-scale Open Time Series Archive (LOTSA), which contains over 27 billion observations across nine domains.
  • MOIRAI demonstrates competitive or superior performance as a zero-shot forecaster compared to full-shot models.
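To make the arbitrary-variate challenge concrete, the toy sketch below flattens all variates of a multivariate series into a single token sequence while keeping a variate-id tensor; this only illustrates the general idea and is not MOIRAI’s actual any-variate mechanism.

```python
import torch


def flatten_variates(series: torch.Tensor):
    """Toy illustration: flatten all variates of a multivariate series into one
    token sequence, keeping a variate-id tensor so a model can tell variates apart."""
    n_vars, seq_len = series.shape
    tokens = series.reshape(-1)                                     # (n_vars * seq_len,)
    variate_ids = torch.arange(n_vars).repeat_interleave(seq_len)   # id of each token's variate
    return tokens, variate_ids


tokens, variate_ids = flatten_variates(torch.randn(3, 96))          # 3 variates, 96 steps each
```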

https://arxiv.org/pdf/2402.02592.pdf