Time Series Made Easy in Python: DARTS

Published on March 12, 2023 by loic

Darts is a Python library for user-friendly forecasting and anomaly detection on time series. It contains a variety of models, from classics such as ARIMA to deep neural networks.

Some of the key features of Darts include:

A simple and intuitive interface for defining and fitting models
Support for different types of time series data, including univariate, multivariate, and panel data
A wide range of built-in models, including ARIMA, Exponential Smoothing, Prophet, LSTM, and TCN
Tools for hyperparameter tuning and model selection, such as cross-validation and grid search
Visualization tools for exploring and analyzing time series data and model outputs
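
To make the fit/predict workflow and the grid-search idea concrete, here is a minimal, dependency-free sketch. It does not use Darts itself: `SimpleExpSmoothing`, `mae`, and `grid_search` are hypothetical names written for this illustration, and the tiny series is made up. It only mirrors the general pattern of fitting a model on a training window, forecasting `n` steps ahead, and keeping the hyperparameter with the lowest validation error.

```python
class SimpleExpSmoothing:
    """Hypothetical stand-in for a forecasting model with fit/predict methods."""

    def __init__(self, alpha):
        self.alpha = alpha      # smoothing factor, between 0 and 1
        self.level = None

    def fit(self, series):
        # Update the smoothed level one observation at a time.
        level = series[0]
        for value in series[1:]:
            level = self.alpha * value + (1 - self.alpha) * level
        self.level = level
        return self

    def predict(self, n):
        # Simple exponential smoothing gives a flat forecast at the last level.
        return [self.level] * n


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def grid_search(series, alphas, horizon):
    """Hold out the last `horizon` points and score each candidate alpha on them."""
    train, valid = series[:-horizon], series[-horizon:]
    scores = {}
    for alpha in alphas:
        forecast = SimpleExpSmoothing(alpha).fit(train).predict(horizon)
        scores[alpha] = mae(valid, forecast)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data for illustration only.
series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
best_alpha, scores = grid_search(series, alphas=[0.1, 0.5, 0.9], horizon=3)
print(best_alpha, scores)
```

A single hold-out window is the simplest possible validation scheme; a rolling or expanding window would give a more reliable estimate at the cost of more fits.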

Library

Model	Univariate	Multivariate	Probabilistic	Multiple series (global)	Past-observed covariates	Future-known covariates	Static covariates	Reference
`ARIMA`	✅		✅			✅
`VARIMA`	✅	✅				✅
`AutoARIMA`	✅					✅
`StatsForecastAutoARIMA` (faster AutoARIMA)	✅		✅			✅		Nixtla’s statsforecast
`ExponentialSmoothing`	✅		✅
`StatsForecastETS`	✅					✅		Nixtla’s statsforecast
`BATS` and `TBATS`	✅		✅					TBATS paper
`Theta` and `FourTheta`	✅							Theta & 4 Theta
`Prophet` (see install notes)	✅		✅			✅		Prophet repo
`FFT` (Fast Fourier Transform)	✅
`KalmanForecaster` using the Kalman filter and N4SID for system identification	✅	✅	✅			✅		N4SID paper
`Croston` method	✅
`RegressionModel`; generic wrapper around any sklearn regression model	✅	✅		✅	✅	✅	✅
`RandomForest`	✅	✅		✅	✅	✅	✅
`LinearRegressionModel`	✅	✅	✅	✅	✅	✅	✅
`LightGBMModel`	✅	✅	✅	✅	✅	✅	✅
`CatBoostModel`	✅	✅	✅	✅	✅	✅	✅
`XGBModel`	✅	✅	✅	✅	✅	✅	✅
`RNNModel` (incl. LSTM and GRU); equivalent to DeepAR in its probabilistic version	✅	✅	✅	✅		✅		DeepAR paper
`BlockRNNModel` (incl. LSTM and GRU)	✅	✅	✅	✅	✅
`NBEATSModel`	✅	✅	✅	✅	✅			N-BEATS paper
`NHiTSModel`	✅	✅	✅	✅	✅			N-HiTS paper
`TCNModel`	✅	✅	✅	✅	✅			TCN paper, DeepTCN paper, blog post
`TransformerModel`	✅	✅	✅	✅	✅
`TFTModel` (Temporal Fusion Transformer)	✅	✅	✅	✅	✅	✅	✅	TFT paper, PyTorch Forecasting
`DLinearModel`	✅	✅	✅	✅	✅	✅	✅	DLinear paper
`NLinearModel`	✅	✅	✅	✅	✅	✅	✅	NLinear paper
Naive Baselines	✅	✅
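
The last row of the table mentions naive baselines. As a rough sketch of what such a baseline does (plain Python, not Darts code; the function names and the numbers are assumptions made for this example), a seasonal naive forecast simply repeats the last observed season, and its error gives a floor that any more elaborate model should beat.

```python
def naive_seasonal_forecast(history, season_length, horizon):
    """Repeat the last `season_length` observations to cover `horizon` steps."""
    if season_length < 1 or season_length > len(history):
        raise ValueError("season_length must be between 1 and len(history)")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up data: two seasons of length 4, followed by the season to predict.
history = [100, 120, 90, 110, 102, 123, 91, 112]
future = [104, 125, 93, 113]

forecast = naive_seasonal_forecast(history, season_length=4, horizon=4)
error = mape(future, forecast)
print(forecast, round(error, 2))
```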

Category Encoders

Published on March 11, 2023 by loic

A set of scikit-learn-style transformers for encoding categorical variables into numeric with different techniques.

Category Encoders is a Python library for encoding categorical variables for machine learning tasks. It is available on contrib.scikit-learn.org and extends the capabilities of scikit-learn’s preprocessing module.

The library provides several powerful encoding techniques for dealing with categorical data, including:

Ordinal encoding: maps categorical variables to integer values based on their order of appearance
One-hot encoding: creates a binary feature for each category in a variable
Binary encoding: maps each category to a binary code
Target encoding: encodes each category with the mean target value for that category
Hashing encoding: maps each category to one of a fixed number of columns using a deterministic hash function, so distinct categories may collide
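
The five techniques above can be sketched in a few lines of plain Python. This is an illustration of the ideas only, not the Category Encoders API: every function name below is hypothetical, the target encoder applies no smoothing, and the choice of MD5 for hashing is an assumption made for the example.

```python
import hashlib
from collections import defaultdict


def ordinal_encode(values):
    """Assign integers 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping) + 1)
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """One binary column per distinct category (columns sorted by name)."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values], categories


def binary_encode(values):
    """Write each category's ordinal code in base 2, one bit per column."""
    codes, mapping = ordinal_encode(values)
    width = max(1, len(mapping).bit_length())
    return [[(code >> bit) & 1 for bit in reversed(range(width))] for code in codes]


def target_encode(values, targets):
    """Replace each category with the mean target observed for it (no smoothing)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


def hashing_encode(values, n_components=8):
    """Set one of `n_components` columns per row, chosen by a deterministic hash."""
    rows = []
    for v in values:
        digest = hashlib.md5(str(v).encode("utf-8")).hexdigest()
        row = [0] * n_components
        row[int(digest, 16) % n_components] = 1
        rows.append(row)
    return rows


# Made-up data for illustration only.
colors = ["red", "green", "red", "blue"]
clicked = [1, 0, 0, 1]

ordinal, mapping = ordinal_encode(colors)
one_hot, columns = one_hot_encode(colors)
binary = binary_encode(colors)
target = target_encode(colors, clicked)
hashed = hashing_encode(colors)
print(ordinal, columns, binary, target)
```

Target encoding computed on the same rows it is applied to leaks the label into the features, which is why practical implementations usually add smoothing or compute the means out of fold.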

Category Encoders also supports a range of advanced features, such as handling missing values, combining multiple encoders, and applying encoders to specific subsets of features.

Overall, Category Encoders is a useful tool for preprocessing categorical data and improving the accuracy and performance of machine learning models.

Cleaning labels: Cleanlab

Published on March 11, 2023 by loic

cleanlab automatically detects problems in an ML dataset. This data-centric AI package facilitates machine learning with messy, real-world data by providing clean labels for robust training and flagging errors in your data.

Paper: https://arxiv.org/pdf/1911.00068.pdf
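
The paper's core idea, confident learning, can be approximated in a few lines. The sketch below is a simplified, dependency-free reading of it and not cleanlab's implementation: the function names are hypothetical, the probabilities are made up, and it assumes that `pred_probs` come from a model evaluated out of sample and that every class has at least one labeled example. Each class gets a threshold equal to the mean predicted probability of that class among the examples carrying that label; an example is flagged when the class it confidently belongs to differs from its given label.

```python
def class_thresholds(labels, pred_probs, n_classes):
    """Per-class threshold: mean probability of class j over examples labeled j."""
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for y, p in zip(labels, pred_probs) if y == j]
        thresholds.append(sum(probs_j) / len(probs_j))
    return thresholds


def find_likely_label_errors(labels, pred_probs):
    """Indices whose confident class disagrees with the given label."""
    n_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, n_classes)
    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes for which the model is at least as confident as the class average.
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # the model is not confident about any class for this example
        guess = max(confident, key=lambda j: p[j])
        if guess != y:
            flagged.append(i)
    return flagged


# Made-up data: the example at index 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
flagged = find_likely_label_errors(labels, pred_probs)
print(flagged)
```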


Yellowbrick: Machine Learning Visualization

Published on March 11, 2023 by loic

Feature Visualization

Rank Features: pairwise ranking of features to detect relationships
Parallel Coordinates: horizontal visualization of instances
Radial Visualization: separation of instances around a circular plot
PCA Projection: projection of instances based on principal components
Manifold Visualization: high dimensional visualization with manifold learning
Joint Plots: direct data visualization with feature selection

Classification Visualization

Class Prediction Error: shows error and support in classification
Classification Report: visual representation of precision, recall, and F1
ROC/AUC Curves: receiver operating characteristic and area under the curve
Precision-Recall Curves: precision vs recall for different probability thresholds
Confusion Matrices: visual description of class decision making
Discrimination Threshold: find a threshold that best separates binary classes
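
For reference, the numbers behind a classification report can be computed by hand. The sketch below is plain Python rather than Yellowbrick code; `classification_report` is a hypothetical helper written for this example and the labels are made up. It derives per-class precision, recall, F1, and support from the true and predicted labels.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return {class: (precision, recall, f1, support)} from two label lists."""
    pairs = Counter(zip(y_true, y_pred))          # confusion-matrix cell counts
    classes = sorted(set(y_true) | set(y_pred))
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        predicted = sum(n for (t, p), n in pairs.items() if p == c)
        support = sum(n for (t, p), n in pairs.items() if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = (precision, recall, f1, support)
    return report


# Made-up labels for illustration only.
y_true = ["cat", "cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]
report = classification_report(y_true, y_pred)
for name, (precision, recall, f1, support) in report.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} support={support}")
```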

Regression Visualization

Prediction Error Plot: find model breakdowns along the domain of the target
Residuals Plot: show the difference in residuals of training and test data
Alpha Selection: show how the choice of alpha influences regularization
Cook’s Distance: show the influence of instances on linear regression
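
As an illustration of what a Cook's distance plot displays, the sketch below computes the statistic for a one-feature least-squares line using the standard formula D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of fitted parameters, and s^2 the residual mean square. It is plain Python, not Yellowbrick code; the function name and the data are assumptions made for the example.

```python
def cooks_distance(x, y):
    """Cook's distance of each point for the least-squares line y = a + b * x."""
    n, p = len(x), 2                       # p = number of fitted parameters
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)   # residual mean square
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_mean) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Made-up data: the last point sits well above the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 20.0]
distances = cooks_distance(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
print(most_influential, [round(d, 3) for d in distances])
```

Points with a large distance combine a large residual with high leverage, so removing them would change the fitted line the most.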

Clustering Visualization

K-Elbow Plot: select k using the elbow method and various metrics
Silhouette Plot: select k by visualizing silhouette coefficient values
Intercluster Distance Maps: show relative distance and size/importance of clusters
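
The silhouette coefficient behind the silhouette plot is easy to compute directly. The following sketch works on one-dimensional points in plain Python and is not Yellowbrick code; the helper names and the data are hypothetical, and it assumes at least two clusters and points that are not all identical. For each point, `a` is the mean distance to the other members of its cluster, `b` is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b).

```python
def mean_distance(point, others):
    """Mean absolute distance from `point` to each value in `others`."""
    return sum(abs(point - o) for o in others) / len(others)


def silhouette_scores(points, labels):
    """Silhouette coefficient of each 1-D point given its cluster label."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = list(clusters[label])
        own.remove(p)                      # drop one copy of the point itself
        if not own:
            scores.append(0.0)             # singleton clusters score 0 by convention
            continue
        a = mean_distance(p, own)
        b = min(
            mean_distance(p, members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


# Made-up data: two well-separated groups, labeled sensibly and then badly.
points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
good = silhouette_scores(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(round(mean_good, 3), round(mean_bad, 3))
```

Comparing the mean score across several candidate values of k is one way to choose the number of clusters.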

Model Selection Visualization

Validation Curve: tune a model with respect to a single hyperparameter
Learning Curve: show if a model might benefit from more data or less complexity
Feature Importances: rank features by importance or linear coefficients for a specific model
Recursive Feature Elimination: find the best subset of features based on importance
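
A validation curve is just a table of validation error against one hyperparameter. The sketch below builds such a table in plain Python for a toy k-nearest-neighbours regressor evaluated with leave-one-out validation. Nothing here comes from Yellowbrick: the model, the function names, and the data are all assumptions chosen to keep the example self-contained.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_curve(data, ks):
    """Leave-one-out mean squared error for each candidate k."""
    curve = {}
    for k in ks:
        errors = []
        for i, (x, y) in enumerate(data):
            train = data[:i] + data[i + 1:]     # hold out the i-th point
            errors.append((y - knn_predict(train, x, k)) ** 2)
        curve[k] = sum(errors) / len(errors)
    return curve


# Made-up data lying exactly on the line y = 2x.
data = [(x, 2 * x) for x in range(10)]
curve = validation_curve(data, ks=[1, 3, 9])
best_k = min(curve, key=curve.get)
print(best_k, curve)
```

Plotting `curve` with k on the horizontal axis gives the validation half of the picture; a full validation curve would also show the training error for each k.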

Target Visualization

Balanced Binning Reference: generate a histogram with vertical lines showing the recommended value point to bin the data into evenly distributed bins
Class Balance: see how the distribution of classes affects the model
Feature Correlation: display the correlation between features and dependent variables
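
Two of these target checks reduce to very little code. The sketch below uses only the Python standard library and is not Yellowbrick's implementation: it assumes that evenly populated bins are obtained by cutting at quantiles, which may differ from how Yellowbrick computes its reference lines, and both function names are hypothetical.

```python
import statistics
from collections import Counter


def balanced_bin_edges(values, n_bins=4):
    """Interior cut points that split `values` into `n_bins` roughly equal-sized bins."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def class_balance(labels):
    """Share of each class among the labels."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in counts}


# Made-up data for illustration only.
values = list(range(1, 101))
edges = balanced_bin_edges(values, n_bins=4)

# Count how many values fall into each of the four bins.
bounds = [float("-inf")] + edges + [float("inf")]
bin_sizes = [
    sum(1 for v in values if low < v <= high)
    for low, high in zip(bounds, bounds[1:])
]
balance = class_balance(["spam", "ham", "ham", "ham"])
print(edges, bin_sizes, balance)
```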

Text Visualization

Term Frequency: visualize the frequency distribution of terms in the corpus
t-SNE Corpus Visualization: use stochastic neighbor embedding to project documents
Dispersion Plot: visualize how key terms are dispersed throughout a corpus
UMAP Corpus Visualization: plot similar documents closer together to discover clusters
PosTag Visualization: plot the counts of different parts-of-speech throughout a tagged corpus
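
The data behind the term frequency and dispersion plots can be computed with the standard library alone. This is a rough sketch, not Yellowbrick code: the tokenizer is a deliberately naive regular expression, and the function names and the two-sentence corpus are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(corpus, top_n=3):
    """Most common tokens across all documents, with their counts."""
    counts = Counter(token for doc in corpus for token in tokenize(doc))
    return counts.most_common(top_n)


def dispersion(corpus, terms):
    """Offsets at which each term occurs when the corpus is read as one token stream."""
    stream = [token for doc in corpus for token in tokenize(doc)]
    return {term: [i for i, tok in enumerate(stream) if tok == term] for term in terms}


# Made-up corpus for illustration only.
corpus = ["The cat sat on the mat", "The dog chased the cat"]
top_terms = term_frequencies(corpus, top_n=2)
offsets = dispersion(corpus, ["cat", "dog"])
print(top_terms, offsets)
```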

AI Factory

Published on March 5, 2023 by loic

Text generated with ChatGPT, image from DALL-E, text-to-speech from D-ID.


Deeplearning.fr

You have to learn the rules of the game. And then you have to play better than anyone else

Monthly archives: March 2023
