This research examines the safety of large language models (LLMs) such as ChatGPT, Bard, and Claude. It demonstrates the potential for automated creation of adversarial attacks, using character sequences added to user queries that manipulate the LLM into following harmful commands. Unlike traditional "jailbreaks," these attacks are automated and can affect both open-source and closed-source chatbots. The study raises concerns about the effectiveness of mitigation measures and suggests that the challenges posed by adversarial behavior might persist due to the nature of deep learning models. The findings highlight the need for careful consideration of the safety implications as LLMs become more integrated into various applications.
Time Series Made Easy in Python: DARTS
Darts is a Python library for user-friendly forecasting and anomaly detection on time series. It contains a variety of models, from classics such as ARIMA to deep neural networks.
Some of the key features of Darts include:
- A simple and intuitive interface for defining and fitting models
- Support for different types of time series data, including univariate, multivariate, and panel data
- A wide range of built-in models, including ARIMA, Exponential Smoothing, Prophet, LSTM, and TCN
- Tools for hyperparameter tuning and model selection, such as cross-validation and grid search
- Visualization tools for exploring and analyzing time series data and model outputs
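To make the idea of a classical forecasting model more concrete, here is a minimal, dependency-free sketch of simple exponential smoothing, the most basic member of the Exponential Smoothing family listed above. It does not use the Darts API: the function names `smoothed_level` and `forecast` and the parameter `alpha` are assumptions chosen for illustration, and a library model would also estimate `alpha` and handle trend and seasonality.

```python
def smoothed_level(series, alpha):
    """Return the final smoothed level of a series (simple exponential smoothing).

    Hypothetical helper for illustration only, not part of Darts.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        # New level is a weighted average of the new observation and the old level.
        level = alpha * y + (1 - alpha) * level
    return level


def forecast(series, alpha, horizon):
    """Forecast `horizon` steps ahead; this simple model predicts a flat line."""
    return [smoothed_level(series, alpha)] * horizon


if __name__ == "__main__":
    history = [10.0, 12.0, 14.0]
    print(forecast(history, alpha=0.5, horizon=3))
```

A larger `alpha` makes the forecast track recent observations more closely; with `alpha=1` the sketch reduces to a naive "repeat the last value" baseline.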
| Model | Univariate | Multivariate | Probabilistic | Multiple series (global) | Past-observed covariates | Future-known covariates | Static covariates | Reference |
|---|---|---|---|---|---|---|---|---|
ARIMA | ✅ | ✅ | ✅ | |||||
VARIMA | ✅ | ✅ | ✅ | |||||
AutoARIMA | ✅ | ✅ | ||||||
StatsForecastAutoARIMA (faster AutoARIMA) | ✅ | ✅ | ✅ | Nixtla’s statsforecast | ||||
ExponentialSmoothing | ✅ | ✅ | ||||||
StatsForecastETS | ✅ | ✅ | Nixtla’s statsforecast | |||||
BATS and TBATS | ✅ | ✅ | TBATS paper | |||||
Theta and FourTheta | ✅ | Theta & 4 Theta | ||||||
Prophet (see install notes) | ✅ | ✅ | ✅ | Prophet repo | ||||
FFT (Fast Fourier Transform) | ✅ | |||||||
KalmanForecaster using the Kalman filter and N4SID for system identification | ✅ | ✅ | ✅ | ✅ | N4SID paper | |||
Croston method | ✅ | |||||||
RegressionModel; generic wrapper around any sklearn regression model | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||
RandomForest | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||
LinearRegressionModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
LightGBMModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
CatBoostModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
XGBModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
RNNModel (incl. LSTM and GRU); equivalent to DeepAR in its probabilistic version | ✅ | ✅ | ✅ | ✅ | ✅ | DeepAR paper | ||
BlockRNNModel (incl. LSTM and GRU) | ✅ | ✅ | ✅ | ✅ | ✅ | |||
NBEATSModel | ✅ | ✅ | ✅ | ✅ | ✅ | N-BEATS paper | ||
NHiTSModel | ✅ | ✅ | ✅ | ✅ | ✅ | N-HiTS paper | ||
TCNModel | ✅ | ✅ | ✅ | ✅ | ✅ | TCN paper, DeepTCN paper, blog post | ||
TransformerModel | ✅ | ✅ | ✅ | ✅ | ✅ | |||
TFTModel (Temporal Fusion Transformer) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | TFT paper, PyTorch Forecasting |
DLinearModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | DLinear paper |
NLinearModel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | NLinear paper |
| Naive Baselines | ✅ | ✅ |
Category Encoders
A set of scikit-learn-style transformers for encoding categorical variables as numeric values using different techniques.

Category Encoders is a Python library for encoding categorical variables for machine learning tasks. It is available on contrib.scikit-learn.org and extends the capabilities of scikit-learn’s preprocessing module.
The library provides several powerful encoding techniques for dealing with categorical data, including:
- Ordinal encoding: maps categorical variables to integer values based on their order of appearance
- One-hot encoding: creates a binary feature for each category in a variable
- Binary encoding: maps each category to a binary code
- Target encoding: encodes each category with the mean target value for that category
- Hashing encoding: maps each category to an index in a fixed-size feature space using a hash function, so different categories may collide
Category Encoders also supports a range of advanced features, such as handling missing values, combining multiple encoders, and applying encoders to specific subsets of features.
Overall, Category Encoders is a useful tool for preprocessing categorical data and improving the accuracy and performance of machine learning models.
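The encoding techniques described above can be sketched in plain Python. This is an illustration of the ideas only, not the Category Encoders API: all function names here are hypothetical, the target encoder uses the raw per-category mean (real implementations typically add smoothing to limit target leakage), and the hashing encoder assumes MD5 and a default of 8 output columns.

```python
import hashlib
from collections import defaultdict


def ordinal_encode(values):
    """Map each category to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return [mapping[v] for v in values]


def one_hot_encode(values):
    """Create one binary column per distinct category."""
    categories = list(dict.fromkeys(values))  # distinct values, order preserved
    return [[1 if v == c else 0 for c in categories] for v in values]


def target_encode(values, targets):
    """Replace each category with the mean target observed for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


def hashing_encode(values, n_components=8):
    """Hash each category into one of `n_components` columns (collisions possible)."""
    rows = []
    for v in values:
        digest = hashlib.md5(str(v).encode("utf-8")).hexdigest()
        index = int(digest, 16) % n_components  # deterministic, not random
        row = [0] * n_components
        row[index] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    colors = ["red", "green", "red", "blue"]
    print(ordinal_encode(colors))
    print(one_hot_encode(colors))
    print(target_encode(colors, [1, 0, 0, 1]))
    print(hashing_encode(colors, n_components=4))
```

The hashing variant needs no fitted mapping, which is why it suits high-cardinality or streaming data, at the cost of occasional collisions between categories.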
- Backward Difference Coding
- BaseN
- Binary
- CatBoost Encoder
- Count Encoder
- Generalized Linear Mixed Model Encoder
- Gray
- Hashing
- Helmert Coding
- James-Stein Encoder
- Leave One Out
- M-estimate
- One Hot
- Ordinal
- Polynomial Coding
- Quantile Encoder
- Sum Coding
- Summary Encoder
- Target Encoder
- Weight of Evidence
- Wrappers
Cleaning labels: Cleanlab
cleanlab automatically detects problems in an ML dataset. This data-centric AI package facilitates machine learning with messy, real-world data by providing clean labels for robust training and flagging errors in your data.

Paper: https://arxiv.org/pdf/1911.00068.pdf
Code : Code
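As a rough intuition for how label errors can be flagged from model outputs, the sketch below follows the thresholding idea of confident learning from the linked paper: a class is considered "confidently predicted" for an example when its predicted probability reaches the average self-confidence of that class. This is a simplified illustration, not cleanlab's API or its full algorithm; the function name `find_label_issues_sketch` and the toy probabilities are assumptions, and in practice the probabilities should be out-of-sample predictions.

```python
def find_label_issues_sketch(labels, pred_probs):
    """Return indices whose given label disagrees with a confidently predicted class.

    labels: list of integer class ids (the possibly noisy given labels)
    pred_probs: list of per-class probability lists, one per example
    Hypothetical helper for illustration only.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples that are labelled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # A class with no labelled examples can never be confidently predicted.
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            likely = max(confident, key=lambda k: p[k])
            if likely != y:
                issues.append(i)  # given label conflicts with the confident class
    return issues


if __name__ == "__main__":
    given = [0, 0, 0, 1, 1, 1]
    probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
             [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
    print(find_label_issues_sketch(given, probs))
```

Using per-class thresholds rather than a single global cutoff keeps the check usable when the model is systematically more confident on some classes than on others.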
Yellowbrick: Machine Learning Visualization

Feature Visualization
- Rank Features: pairwise ranking of features to detect relationships
- Parallel Coordinates: horizontal visualization of instances
- Radial Visualization: separation of instances around a circular plot
- PCA Projection: projection of instances based on principal components
- Manifold Visualization: high dimensional visualization with manifold learning
- Joint Plots: direct data visualization with feature selection
Classification Visualization
- Class Prediction Error: shows error and support in classification
- Classification Report: visual representation of precision, recall, and F1
- ROC/AUC Curves: receiver operating characteristic and area under the curve
- Precision-Recall Curves: precision vs recall for different probability thresholds
- Confusion Matrices: visual description of class decision making
- Discrimination Threshold: find a threshold that best separates binary classes
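The quantities behind the Precision-Recall Curve and Discrimination Threshold visualizers can be computed by hand. The sketch below is plain Python rather than Yellowbrick code: the function names are hypothetical, a score at or above the threshold counts as a positive prediction, and precision is set to 1.0 when nothing is predicted positive (a convention chosen here, not something the source specifies).

```python
def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall of the positive class (1) at one decision threshold."""
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, y_score) if s < threshold and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # convention for no positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_f1_threshold(y_true, y_score, thresholds):
    """Pick the candidate threshold with the highest F1 score."""
    def f1(threshold):
        p, r = precision_recall_at(y_true, y_score, threshold)
        return 2 * p * r / (p + r) if (p + r) else 0.0
    return max(thresholds, key=f1)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    for th in (0.3, 0.5):
        print(th, precision_recall_at(truth, scores, th))
    print(best_f1_threshold(truth, scores, [0.3, 0.5]))
```

Sweeping the threshold over a fine grid and plotting the resulting pairs gives the precision-recall curve; plotting each metric against the threshold gives the discrimination-threshold view.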
Regression Visualization
- Prediction Error Plot: find model breakdowns along the domain of the target
- Residuals Plot: show the difference in residuals of training and test data
- Alpha Selection: show how the choice of alpha influences regularization
- Cook’s Distance: show the influence of instances on linear regression
Clustering Visualization
- K-Elbow Plot: select k using the elbow method and various metrics
- Silhouette Plot: select k by visualizing silhouette coefficient values
- Intercluster Distance Maps: show relative distance and size/importance of clusters
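For readers unfamiliar with the silhouette coefficient that the Silhouette Plot visualizes, here is a small stdlib-only sketch. It is not Yellowbrick or scikit-learn code: the function name is hypothetical, Euclidean distance is assumed, and points in singleton clusters are given a score of 0.0.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b).

    a: mean distance to the other points in the same cluster
    b: smallest mean distance to the points of any other cluster
    """
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        dists_by_cluster = {}
        for j, (q, other_label) in enumerate(zip(points, labels)):
            if i != j:
                dists_by_cluster.setdefault(other_label, []).append(math.dist(p, q))
        own = dists_by_cluster.pop(own_label, [])
        if not own or not dists_by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists_by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_scores(pts, [0, 0, 1, 1]))  # well-separated clustering
    print(silhouette_scores(pts, [0, 1, 0, 1]))  # poor clustering
```

Scores near 1 indicate points that sit well inside their cluster, while negative scores suggest a point is closer to another cluster than to its own.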
Model Selection Visualization
- Validation Curve: tune a model with respect to a single hyperparameter
- Learning Curve: show if a model might benefit from more data or less complexity
- Feature Importances: rank features by importance or linear coefficients for a specific model
- Recursive Feature Elimination: find the best subset of features based on importance
Target Visualization
- Balanced Binning Reference: generate a histogram with vertical lines showing the recommended value point to bin the data into evenly distributed bins
- Class Balance: see how the distribution of classes affects the model
- Feature Correlation: display the correlation between features and dependent variables
Text Visualization
- Term Frequency: visualize the frequency distribution of terms in the corpus
- t-SNE Corpus Visualization: use stochastic neighbor embedding to project documents
- Dispersion Plot: visualize how key terms are dispersed throughout a corpus
- UMAP Corpus Visualization: plot similar documents closer together to discover clusters
- PosTag Visualization: plot the counts of different parts-of-speech throughout a tagged corpus
AI Factory
Text from ChatGPT, image from DALL-E, text-to-speech from D-ID
Denoising Autoencoders for Tabular Data
Financial Explaining Anomalies

- Initial paper: https://arxiv.org/pdf/2209.10658.pdf
- Code: https://github.com/topics/denoising-autoencoders
- Kaggle example: Kaggle notebook
- Bundesbank (2023) use case: Bundesbank (2023) paper
Revisiting Deep Learning Models for Tabular Data

- Paper: https://arxiv.org/pdf/2106.11959v2.pdf
- PyTorch code: https://github.com/lucidrains/tab-transformer-pytorch
- Alternative library: Implementation of TabTransformer in TensorFlow and Keras
- Kaggle example: Kaggle TabTransformer
- Notebook: Notebook in Keras
- Keras implementation code: Keras Implementation
- Keras code: keras-team code
TabTransformer: Tabular Data Modeling Using Contextual Embeddings
The main idea in the paper is that the performance of a regular multi-layer perceptron (MLP) can be significantly improved by using Transformers to transform regular categorical embeddings into contextual ones.
The TabTransformer is built on self-attention-based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy.
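A minimal sketch of what "contextual" means here: in self-attention, each categorical feature's embedding is replaced by a weighted mix of all feature embeddings in the same row, with weights derived from their similarity. The code below is a deliberately stripped-down, stdlib-only illustration, not the TabTransformer implementation: it uses a single head, no learned query/key/value projections, no residual connections or layer normalization, and the function names and toy embeddings are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity projections (Q = K = V).

    embeddings: one vector per categorical feature of a single row.
    Returns one contextual vector per feature, same shape as the input.
    """
    d = len(embeddings[0])
    scale = math.sqrt(d)
    contextual = []
    for query in embeddings:
        # Similarity of this feature's embedding to every feature's embedding.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in embeddings]
        weights = softmax(scores)
        # Contextual embedding: attention-weighted average of all embeddings.
        contextual.append(
            [sum(w * value[j] for w, value in zip(weights, embeddings)) for j in range(d)]
        )
    return contextual


if __name__ == "__main__":
    # Toy embeddings for three categorical features of one row.
    row = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for vec in self_attention(row):
        print(vec)
```

In the paper's setup, the contextual embeddings produced by the stacked Transformer layers are then passed, together with the continuous features, to the MLP that makes the final prediction.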

