AzureML Studio : Automated ML

In this notebook, we use AzureML Studio's Automated ML (AutoML) to automatically select the best model within given time and compute constraints.

The AutoML process trains many candidate pipelines (featurization + model) and ranks them by the chosen primary metric, within the configured time and compute budget.

The experiment is visible in the AzureML Studio : oc-p7-automated-ml

We will compare the resulting pre-trained models, evaluated locally, to the baseline model from 1_baseline.ipynb.

AutoML model : max 1h training on CPU

In this run, we did not include DNN models in the AutoML process because they require GPU resources.
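
For reference, here is a minimal sketch of how such a run can be configured with the AzureML Python SDK (v1). The compute target, dataset and label column names are assumptions for illustration, not the exact values used in this project:

```python
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()                       # uses the local config.json
experiment = Experiment(ws, "oc-p7-automated-ml")  # experiment shown in the Studio

# Hypothetical registered dataset and label column names.
train_dataset = Dataset.get_by_name(ws, "tweets_train")

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="AUC_weighted",
    training_data=train_dataset,
    label_column_name="label",
    compute_target="cpu-cluster",      # hypothetical CPU compute target
    experiment_timeout_hours=1,        # max 1h of training
    enable_dnn=False,                  # no DNN models on CPU
    enable_early_stopping=True,
)

run = experiment.submit(automl_config, show_output=True)
```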

This AutoML run is available in the AzureML Studio : automl_1h-cpu

Here are the models that were trained in the AutoML process :

[Figure: AzureML Automated ML, 1h on CPU, list of trained models]

Best model

The best model is an XGBoostClassifier (a wrapper around XGBClassifier) with MaxAbsScaler.

[Figure: best model details]
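
The best pipeline can be retrieved and inspected locally with the SDK; a sketch, assuming `run` is the completed AutoML run from above:

```python
# Retrieve the best child run and its fitted scikit-learn pipeline.
best_run, fitted_model = run.get_output()

print(best_run.get_metrics())   # metrics of the best run (AUC, AP, ...)
print(fitted_model.steps)       # e.g. MaxAbsScaler followed by XGBoostClassifier

# Optionally persist the pipeline for local comparison with the baseline model.
import joblib
joblib.dump(fitted_model, "automl_1h_cpu_best_model.pkl")
```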

Results

[Figures: Confusion Matrix, Precision-Recall Curve (AP = 0.79), ROC Curve (AUC = 0.80)]

The performance on the dataset is markedly better than that of our baseline model:

Unlike our baseline model, this model is fairly balanced, only slightly biased towards the POSITIVE class: it predicted 9% more POSITIVE (78403) messages than NEGATIVE (65597), compared to 35% for the baseline, a 74% reduction in bias.
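
These figures can be reproduced locally from the fitted pipeline; a sketch, assuming `X_test` / `y_test` are the same held-out split as in 1_baseline.ipynb and that the POSITIVE class is encoded as 1:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, confusion_matrix

y_proba = fitted_model.predict_proba(X_test)[:, 1]
y_pred = fitted_model.predict(X_test)

print("AP :", average_precision_score(y_test, y_proba))
print("AUC:", roc_auc_score(y_test, y_proba))
print(confusion_matrix(y_test, y_pred))

# Class-balance check: difference between POSITIVE and NEGATIVE predictions,
# expressed as a share of all predictions (the bias quoted above).
n_pos = int(np.sum(y_pred == 1))
n_neg = int(np.sum(y_pred == 0))
print(f"POSITIVE: {n_pos}  NEGATIVE: {n_neg}  bias: {(n_pos - n_neg) / len(y_pred):.1%}")
```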

AutoML model : max 10h training on GPU

This AutoML run is available in the AzureML Studio : automl_10h-gpu

Here are the models that were trained in the AutoML process :

[Figure: AzureML Automated ML, 10h on GPU, list of trained models]

Best model

The best model is a LightGBM classifier with MaxAbsScaler.

This pipeline adds a pre-processing step, PretrainedTextDNNTransformer, which fine-tunes a pre-trained BERT model and uses it to featurize the text before the actual classification model is trained.
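
A sketch of the corresponding configuration, again with hypothetical compute and dataset names; setting `enable_dnn=True` is what allows AutoML to include the BERT-based text featurization:

```python
from azureml.train.automl import AutoMLConfig

automl_config_dnn = AutoMLConfig(
    task="classification",
    primary_metric="AUC_weighted",
    training_data=train_dataset,        # same registered dataset as before
    label_column_name="label",
    compute_target="gpu-cluster",       # hypothetical GPU compute target
    experiment_timeout_hours=10,        # max 10h of training
    enable_dnn=True,                    # allow DNN-based text featurization (BERT)
    enable_early_stopping=True,
)

run_dnn = experiment.submit(automl_config_dnn, show_output=True)
```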

[Figure: best model details]

Results

[Figures: Confusion Matrix, Precision-Recall Curve (AP = 0.942), ROC Curve (AUC = 0.942)]

The performance on the dataset is markedly better than that of our previous model:

Like our previous model, this model is well balanced, this time very slightly biased towards the NEGATIVE class: it predicted only 1.4% more NEGATIVE (128909) messages than POSITIVE (127091), compared to 35% for the baseline, a 96% reduction in bias.