Visually Understanding UMAP

Published on October 25, 2023 by loic

In this article, the authors explore dimensionality reduction, a valuable tool for machine learning practitioners who need to analyze large, high-dimensional datasets. While t-SNE is a commonly used technique for visualization, it becomes less effective on large datasets and can be difficult to apply well.

UMAP, introduced by McInnes et al., presents several advantages over t-SNE, including enhanced speed and better preservation of a dataset’s global structure. This article delves into the theory behind UMAP, providing insights into its functionality, effective usage, and a performance comparison with t-SNE.

https://pair-code.github.io/understanding-umap/

Denoising Autoencoders for Tabular Data

Published on February 26, 2023 by loic

Explaining anomalies in financial data

Initial paper: https://arxiv.org/pdf/2209.10658.pdf
Code: https://github.com/topics/denoising-autoencoders
Kaggle example: Kaggle notebook
Bundesbank (2023) use case: Bundesbank (2023) paper
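
To make the idea concrete, here is a minimal NumPy sketch of a denoising autoencoder used for anomaly scoring. It is not the architecture from the paper or the linked repositories: the synthetic data, the single tanh hidden layer, the 20% masking noise, and all hyperparameters are assumptions chosen for illustration. The network is trained to reconstruct the clean row from a corrupted copy, and the per-feature reconstruction error of a new row can then be read as a rough hint at which columns look unusual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a tabular dataset (assumption): 4 numeric
# features driven by 2 hidden factors, then standardized.
n, d, h = 500, 4, 2
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, d))
X += 0.05 * rng.normal(size=(n, d))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# One-hidden-layer autoencoder with a 2-unit bottleneck.
W1 = 0.1 * rng.normal(size=(d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.normal(size=(h, d)); b2 = np.zeros(d)

def reconstruct(x):
    hidden = np.tanh(x @ W1 + b1)
    return hidden, hidden @ W2 + b2

lr = 0.02
for _ in range(3000):
    # Denoising step: zero out about 20% of the cells at random...
    keep = rng.random(X.shape) > 0.2
    X_noisy = X * keep
    hidden, out = reconstruct(X_noisy)
    # ...but compute the squared-error gradient against the CLEAN rows.
    g_out = 2.0 * (out - X) / n
    g_hidden = (g_out @ W2.T) * (1.0 - hidden ** 2)
    W2 -= lr * hidden.T @ g_out
    b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * X_noisy.T @ g_hidden
    b1 -= lr * g_hidden.sum(axis=0)

def feature_errors(x):
    """Squared reconstruction error for each feature of each row."""
    _, out = reconstruct(x)
    return (out - x) ** 2

normal_scores = feature_errors(X).sum(axis=1)

# A hypothetical anomalous row: a normal row with one feature pushed far out.
anomaly = X[:1].copy()
anomaly[0, 2] = 8.0
anomaly_errors = feature_errors(anomaly)[0]

print("median score on training rows:", np.median(normal_scores))
print("per-feature errors of the anomaly:", anomaly_errors)
```

Summing the per-feature errors gives a single anomaly score per row, while the individual values indicate where the reconstruction failed. Error can spread to correlated columns, so the largest value is a hint rather than a guaranteed explanation.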

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

Published on February 26, 2023 by loic

The main idea of the paper is that the performance of a regular multi-layer perceptron (MLP) can be significantly improved by using Transformers to transform regular categorical embeddings into contextual ones.

The TabTransformer is built upon self-attention-based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy.
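
The difference between a regular and a contextual embedding can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: it uses a single attention head with random, untrained weights, omits layer normalization and the feed-forward sublayer, and the column cardinalities, embedding size, and sample rows are all hypothetical. Concatenating the contextual embeddings with the continuous features before the MLP follows the architecture as described in the paper, which is an assumption relative to the summary above.

```python
import numpy as np

rng = np.random.default_rng(0)

cardinalities = [3, 5, 4]  # hypothetical: categories per categorical column
d = 8                      # embedding dimension (assumption)

# One embedding table per categorical column.
tables = [rng.normal(size=(c, d)) for c in cardinalities]

# Single-head self-attention weights (random, untrained).
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

def embed(cats):
    """(batch, n_cols) integer codes -> (batch, n_cols, d) regular embeddings."""
    return np.stack([tables[j][cats[:, j]] for j in range(len(tables))], axis=1)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def contextualize(E):
    """One self-attention layer across columns, with a residual connection."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)  # (batch, n_cols, n_cols)
    return E + softmax(scores) @ V

# Two rows that differ only in the second categorical column.
cats = np.array([[0, 1, 2],
                 [0, 4, 2]])
conts = np.array([[0.5, -1.2],
                  [0.5, -1.2]])  # continuous features bypass the Transformer

E = embed(cats)
C = contextualize(E)

# Flattened contextual embeddings plus continuous features feed the MLP.
mlp_input = np.concatenate([C.reshape(len(cats), -1), conts], axis=1)

print("regular embedding of column 0 identical:", np.allclose(E[0, 0], E[1, 0]))
print("contextual embedding of column 0 identical:", np.allclose(C[0, 0], C[1, 0]))
print("MLP input shape:", mlp_input.shape)
```

Both rows share the same category in the first column, so its regular embedding is identical in both. After attention, that column's vector also reflects the other columns in the row, so the two rows get different contextual embeddings for the same category.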