- Chronos is a framework designed for pretrained probabilistic time series models.
- It uses scaling and quantization to tokenize time series values into a fixed vocabulary (see the sketch after this list).
- Chronos trains transformer-based language model architectures (specifically, models from the T5 family with parameters ranging from 20M to 710M) using cross-entropy loss.
- The models are pretrained on a mix of publicly available datasets and a synthetic dataset generated via Gaussian processes, enhancing generalization.
- In a comprehensive benchmark on 42 datasets, evaluated against both classical local models and deep learning approaches, Chronos models:
- (a) significantly outperform other methods on datasets included in the training corpus;
- (b) show comparable or occasionally superior zero-shot performance on new datasets compared to methods trained specifically on those datasets.
- These results demonstrate the potential of pretrained models to leverage time series data across various domains for improving zero-shot accuracy on unseen forecasting tasks, suggesting a simplified approach to forecasting pipelines.
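As a rough illustration of the tokenization step, here is a minimal NumPy sketch. Mean scaling and uniform binning follow the general recipe described above, but the bin count, quantization range, and the `tokenize` helper are illustrative assumptions, not Chronos's exact configuration:

```python
import numpy as np

def tokenize(series, n_bins=4096, low=-15.0, high=15.0):
    """Sketch of Chronos-style tokenization: mean scaling followed by
    uniform quantization into a fixed vocabulary of bin indices.
    (Bin count and range are illustrative, not the paper's exact values.)"""
    # Mean scaling: divide by the mean absolute value of the context window.
    scale = np.mean(np.abs(series)) or 1.0
    scaled = series / scale
    # Uniform quantization: clip to [low, high] and map each value to a bin
    # index, which serves as a token id for the language model.
    edges = np.linspace(low, high, n_bins - 1)
    tokens = np.digitize(np.clip(scaled, low, high), edges)
    return tokens, scale

tokens, scale = tokenize(np.array([10.0, 12.0, 9.0, 11.0, 50.0]))
```

Forecasting then reduces to sampling token sequences from the language model and inverting the quantization and scaling to recover values.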
Category Encoders
A set of scikit-learn-style transformers for encoding categorical variables into numeric features using a variety of techniques.
Category Encoders is a Python library for encoding categorical variables for machine learning tasks. It is available on contrib.scikit-learn.org and extends the capabilities of scikit-learn’s preprocessing module.
The library provides several powerful encoding techniques for dealing with categorical data, including:
- Ordinal encoding: maps categorical variables to integer values based on their order of appearance
- One-hot encoding: creates a binary feature for each category in a variable
- Binary encoding: maps each category to a binary code
- Target encoding: encodes each category with the mean target value for that category
- Hashing encoding: maps each category to an index via a deterministic hash function, so no category dictionary needs to be stored
Category Encoders also supports a range of advanced features, such as handling missing values, combining multiple encoders, and applying encoders to specific subsets of features.
Overall, Category Encoders is a useful tool for preprocessing categorical data and can improve the accuracy and performance of machine learning models. A minimal usage sketch is shown below, followed by the full list of implemented encoders.
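For illustration, here is a minimal sketch of the scikit-learn-style API using TargetEncoder and OneHotEncoder. The toy DataFrame is made up; every encoder listed below exposes the same fit/transform interface:

```python
import pandas as pd
import category_encoders as ce

# Toy data: one categorical column and a binary target
df = pd.DataFrame({
    "color": ["red", "blue", "blue", "green", "red", "green"],
    "y":     [1, 0, 1, 0, 1, 0],
})

# Target encoding: each category is replaced by a smoothed mean of the target
encoder = ce.TargetEncoder(cols=["color"])
X_target = encoder.fit_transform(df[["color"]], df["y"])

# One-hot encoding uses the exact same fit/transform interface (no target needed)
X_onehot = ce.OneHotEncoder(cols=["color"]).fit_transform(df[["color"]])
```

Because each encoder is a scikit-learn transformer, it can also be dropped into a `Pipeline` or `ColumnTransformer` alongside other preprocessing steps.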
- Backward Difference Coding
- BaseN
- Binary
- CatBoost Encoder
- Count Encoder
- Generalized Linear Mixed Model Encoder
- Gray
- Hashing
- Helmert Coding
- James-Stein Encoder
- Leave One Out
- M-estimate
- One Hot
- Ordinal
- Polynomial Coding
- Quantile Encoder
- Sum Coding
- Summary Encoder
- Target Encoder
- Weight of Evidence
- Wrappers
Denoising Autoencoders for Tabular Data
Explaining anomalies in financial data
- Initial paper: https://arxiv.org/pdf/2209.10658.pdf
- Code: https://github.com/topics/denoising-autoencoders
- Kaggle example: Kaggle notebook
- Bundesbank (2023) use case: Bundesbank (2023) paper
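As a rough sketch of the general technique (not the paper's exact architecture), a denoising autoencoder for tabular data can be written in PyTorch as below. The layer sizes, the `swap_noise` corruption helper, and its probability `p` are illustrative assumptions; swap noise is a common corruption choice for tabular data:

```python
import torch
import torch.nn as nn

class TabularDAE(nn.Module):
    """Minimal denoising autoencoder for standardized numeric features."""
    def __init__(self, n_features, hidden=64, bottleneck=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def swap_noise(x, p=0.15):
    # Corrupt each cell with probability p by replacing it with the same
    # column's value from a randomly chosen row.
    mask = torch.rand_like(x) < p
    shuffled = x[torch.randperm(x.size(0))]
    return torch.where(mask, shuffled, x)

model = TabularDAE(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(256, 10)  # placeholder batch of standardized features
for _ in range(100):
    noisy = swap_noise(x)
    loss = loss_fn(model(noisy), x)  # reconstruct the clean row
    opt.zero_grad()
    loss.backward()
    opt.step()
```

For anomaly detection, the per-row reconstruction error of the trained model is a natural anomaly score: rows the autoencoder cannot reconstruct well are flagged as unusual.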
Revisiting Deep Learning Models for Tabular Data
- Paper: https://arxiv.org/pdf/2106.11959v2.pdf
- PyTorch code: https://github.com/lucidrains/tab-transformer-pytorch
- Alternative library: implementation of TabTransformer in TensorFlow and Keras
- Kaggle example: Kaggle TabTransformer notebook
- Notebook: notebook in Keras
- Keras implementation: Keras Implementation
- Keras code: keras-team code
TabTransformer: Tabular Data Modeling Using Contextual Embeddings
The main idea of the paper is that the performance of a regular multi-layer perceptron (MLP) can be significantly improved by using Transformers to transform regular categorical embeddings into contextual ones.
TabTransformer is built upon self-attention-based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy.
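To make the idea concrete, here is a minimal PyTorch sketch: categorical columns are embedded, contextualized by self-attention, then flattened and concatenated with numeric features for an MLP head. The class name, layer sizes, and the plain concatenation of numeric features are assumptions for illustration; the actual paper additionally layer-normalizes continuous features and uses learned column embeddings:

```python
import torch
import torch.nn as nn

class MiniTabTransformer(nn.Module):
    """Sketch of the TabTransformer idea: embed categorical columns,
    contextualize the embeddings with self-attention, then feed the
    flattened result (plus numeric features) to an MLP head."""
    def __init__(self, cardinalities, n_numeric, d=32, n_heads=4, n_layers=2):
        super().__init__()
        # One embedding table per categorical column
        self.embeds = nn.ModuleList([nn.Embedding(c, d) for c in cardinalities])
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlp = nn.Sequential(
            nn.Linear(d * len(cardinalities) + n_numeric, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_numeric)
        tokens = torch.stack([emb(x_cat[:, i])
                              for i, emb in enumerate(self.embeds)], dim=1)
        ctx = self.transformer(tokens)   # contextual embeddings (batch, n_cat, d)
        flat = ctx.flatten(1)            # (batch, n_cat * d)
        return self.mlp(torch.cat([flat, x_num], dim=1))

model = MiniTabTransformer(cardinalities=[5, 12, 3], n_numeric=4)
# Category codes must be smaller than each column's cardinality
out = model(torch.randint(0, 3, (8, 3)), torch.randn(8, 4))
```

The contrast with a plain MLP is that each categorical embedding can attend to the other columns in the same row, so the representation of a category depends on its context rather than being fixed.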
Missing Data Imputation
Are deep learning models superior?