Avis Restau : improve the AI product of your start-up

Context

"Avis Restau" is a start-up whose goal is to connect restaurants and customers. Customers will be able to post photos and reviews of the restaurants they have visited.

The goal here is twofold: identify the topics of bad customer reviews, and label photos as indoor or outdoor, food or drink, and so on.

Load project modules

The helper functions and project-specific code are placed in ../src/.
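One common way to make that directory importable from the notebook is to add it to the interpreter's module search path:

```python
import sys

# Make the project-specific helpers in ../src/ importable from this notebook.
if "../src" not in sys.path:
    sys.path.insert(0, "../src")
```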

We will use the Python programming language, and present the code and results in this JupyterLab notebook.

We will use the usual libraries for data exploration, modeling and visualization:

We will also use libraries specific to the goals of this project:

Academic dataset

We will also use the academic dataset provided by Yelp (https://www.yelp.com/dataset), composed of 8,635,403 reviews, 160,585 businesses and 200,000 pictures from 8 metropolitan areas.

We are only going to use the reviews and photos data. Since the dataset is huge, we will sample a small subset of it.
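The Yelp files are in JSON Lines format (one JSON object per line). A minimal sketch of the loading and sampling step, with a tiny inline sample standing in for the real review file:

```python
import io

import pandas as pd

# A tiny inline JSON Lines sample stands in for the real Yelp review file.
raw = io.StringIO(
    '{"review_id": "a", "stars": 1, "text": "Cold food."}\n'
    '{"review_id": "b", "stars": 5, "text": "Great place!"}\n'
    '{"review_id": "c", "stars": 2, "text": "Slow service."}\n'
)
reviews = pd.read_json(raw, lines=True)

# For the full 8.6M-row file, pass chunksize=... to read_json and sample
# each chunk (pd.concat(c.sample(frac=0.01) for c in reader)) to keep
# memory usage low.
subset = reviews.sample(n=2, random_state=0)
```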

Load the dataset from JSON

Exploratory Data Analysis

We will just display here a few statistics about the DataFrame.

Labels are uniformly distributed among the 5 classes: "drink", "food", "indoor", "outdoor" and "menu".

Computer vision

In this section, we are going to try to predict the label of each photo.

To achieve this, we need to extract visual features from the photos, group them into a Bag-of-Visual-Words representation, and train a classifier on that representation.

Visual features extraction

We are going to extract three kinds of features: color features, HOG descriptors and ORB keypoint descriptors.

We have now gathered the three kinds of visual features we need to describe the photos as precisely as possible.

Many of these features are very close to one another, so we want to group similar ones together.

Color features

Here, we are going to train a clustering model to group the different color features together. First, we need to create a dataset of color features.
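As a sketch of this step, one common color feature is a normalized 3-D RGB histogram per image (the exact binning used in the notebook is an assumption here):

```python
import numpy as np


def color_histogram(image, bins=8):
    """3-D RGB histogram of an image, flattened into a feature vector.

    `image` is an (H, W, 3) uint8 array; the binning is an assumption
    of this sketch, not the notebook's exact settings.
    """
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    hist = hist.flatten()
    return hist / hist.sum()  # normalize so image size does not matter


# A random image stands in for a real photo.
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
features = color_histogram(photo)
```

Stacking one such vector per photo gives the dataset of color features.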

Now we need to prepare the color features for clustering, using standard scaling.

Before training our clustering model, we need to set its main hyperparameter: the number of clusters we want to find. For that, we will apply the elbow method to the cluster inertia.
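The elbow method can be sketched as follows, with synthetic blobs standing in for the scaled color features and an arbitrary grid of candidate cluster counts:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data stands in for the scaled color features.
X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

# Fit k-means for several k and record the inertia (within-cluster sum of
# squares); the "elbow" is where the marginal drop in inertia fades.
inertias = {}
for k in (2, 3, 5, 8, 12):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
```

Plotting `inertias` against `k` then reveals the elbow visually.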

We can see that even with a large number of clusters, the inertia stays high: the data is sparse and the clusters are not very dense. Still, a good trade-off is around 100 clusters.

We can now train our clustering model and count the number of occurrences of each color cluster for each photo. This is the "Bag of Visual Words" (BoVW) representation of the photo.
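The counting step can be sketched like this, with synthetic vectors standing in for the scaled color features and an arbitrary cluster count:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic vectors stand in for the scaled color features of all photos;
# 4 clusters is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)
all_features = rng.normal(size=(200, 3))
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(all_features)

# For one photo: assign each of its local features to a "visual word"
# (cluster), then count word occurrences to build the BoVW vector.
photo_features = rng.normal(size=(30, 3))
words = kmeans.predict(photo_features)
bovw = np.bincount(words, minlength=kmeans.n_clusters)
```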

HOG Features

Here, we are going to train a clustering model to group the different Histogram of Oriented Gradients (HOG) features together. First, we need to create a dataset of HOG features.
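As a sketch of the extraction step, assuming scikit-image's `hog` is the extractor (the cell and block sizes below are assumptions, not the notebook's exact settings):

```python
import numpy as np
from skimage.feature import hog

# A random grayscale image stands in for a real photo.
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# Gradient-orientation histograms over 8x8-pixel cells, normalized over
# 2x2-cell blocks, concatenated into one descriptor vector.
descriptor = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    feature_vector=True,
)
```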

Now we need to prepare the HOG features for clustering, using standard scaling.

Before training our clustering model, we need to set its main hyperparameter: the number of clusters we want to find. For that, we will apply the elbow method to the cluster inertia.

We can see that even with a large number of clusters, the inertia stays high: the data is sparse and the clusters are not very dense. Still, a good trade-off is around 25 clusters.

We can now train our clustering model and count the number of occurrences of each HOG cluster for each photo. This is the BoVW representation of the photo.

ORB features

Here, we are going to train a clustering model to group the keypoint descriptors detected by the ORB algorithm. First, we need to create a dataset of ORB features.

Now we need to prepare the ORB features for clustering, using standard scaling.

Before training our clustering model, we need to set its main hyperparameter: the number of clusters we want to find. For that, we will apply the elbow method to the cluster inertia.

We can see that even with a large number of clusters, the inertia stays high: the data is sparse and the clusters are not very dense. Still, a good trade-off is around 50 clusters.

We can now train our clustering model and count the number of occurrences of each ORB cluster for each photo. This is the BoVW representation of the photo.

Our BoVW representation is now ready to be used for the classification task.

Image classification

We are going to train a classifier to predict the label of the photos.

Data preparation

First, we need to prepare the data for the classification task:

Training and test sets

We split the dataset into training and test sets, preserving the label distribution (a stratified split).
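A minimal sketch of the stratified split, with synthetic BoVW vectors and labels standing in for the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic BoVW vectors and balanced labels stand in for the real data.
rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = np.repeat(["drink", "food", "indoor", "outdoor", "menu"], 20)

# stratify=y keeps the 5-label distribution identical in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
```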

Scaling

Now, we scale the data to zero mean and unit variance, so that every cluster count carries the same weight.
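A sketch of the scaling step; note that the scaler is fitted on the training set only, so no information from the test set leaks into the preprocessing:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the BoVW features.
rng = np.random.default_rng(0)
X_train = rng.random((80, 10)) * 50
X_test = rng.random((20, 10)) * 50

# Fit on the training set only, then apply the same transform to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```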

Dimensionality reduction (PCA)

We want to reduce the dimensionality of the data, so that we can train a classifier on a smaller number of features while keeping as much information as possible. For that, we use the elbow method to find the optimal number of PCA components.
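This step can be sketched by fitting a full PCA and inspecting the cumulative explained-variance ratio; the 50% threshold below mirrors the "half of the variance" observation, and the synthetic data stands in for the scaled BoVW features:

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated synthetic data stands in for the scaled BoVW features.
rng = np.random.default_rng(0)
X = rng.random((200, 50)) @ rng.random((50, 50))

# Fit a full PCA and look at how variance accumulates across components;
# the elbow of this curve suggests how many components to keep.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components capturing at least 50% of the variance.
n_components = int(np.searchsorted(cumulative, 0.5)) + 1
```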

With only 25 components, we greatly reduce the dimensionality of the data while keeping about half of the variance.

Classification

We can now train our classifier and evaluate its performance.

Training

We train the classifier on the training set and make predictions on both the train and test sets.
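A sketch of this step; an SVM is used here as an illustrative choice (the notebook does not name its classifier), with synthetic data standing in for the PCA-reduced features:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic 25-dimensional features stand in for the PCA-reduced data;
# the SVM is an illustrative classifier choice, not necessarily the
# notebook's actual model.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((80, 25)), rng.integers(0, 5, 80)
X_test = rng.random((20, 25))

clf = SVC(kernel="rbf", random_state=0).fit(X_train, y_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
```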

Evaluation

We evaluate the classifier's performance on the train and test sets.
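The macro-averaged F1 score treats the five classes equally, and a per-class report shows which labels the model struggles with; toy predictions are used in this sketch:

```python
import numpy as np
from sklearn.metrics import classification_report, f1_score

# Toy ground truth and predictions stand in for the real model output.
labels = ["drink", "food", "indoor", "menu", "outdoor"]
y_true = np.array(labels * 4)
y_pred = np.array(labels * 3 + ["food", "food", "indoor", "menu", "outdoor"])

# Macro F1 averages the per-class F1 scores with equal class weight.
macro_f1 = f1_score(y_true, y_pred, average="macro")
report = classification_report(y_true, y_pred)
```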

We can see that the classifier is decent on the train set (F1 score ~ 0.73), but performs somewhat worse on the test set (F1 score ~ 0.63).

We can see that our model performs very well on "menu" photos (especially on the test set), as well as "food" photos (especially on the train set). But it performs poorly on "drink", "indoor" and "outdoor" photos.

Visualization

We can now visualize the prediction of the classifier on a random photo, with its nearest neighbor.

Image classification with a CNN

We are going to use a pre-trained CNN to label the photos.

Basic photo labelling

First, we are going to test the photo labelling feature of the pre-trained CNN.

We are going to use VGG16 from Keras to extract features from the photos and predict the most likely labels among the 1,000 ImageNet classes the model has learned.

We can see that even without any work on our part, the model is very good at describing what each photo represents.

Now we want to use this model to predict our own classes ("drink", "food", "indoor", "outdoor" and "menu").

Transfer learning

We want to reuse the pre-trained CNN to learn and predict the labels of our own photos. For that, we are going to implement the "feature extraction" flavor of transfer learning.

Model definition

First, we need to define the model. We will use VGG16 as a base model, and we will add our own fully connected layers for the prediction of our labels.
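A sketch of such a model; the dense-head sizes are assumptions of this sketch, and `weights=None` is used here only to skip the ImageNet download (the notebook would use `weights="imagenet"`):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Feature extraction: freeze the VGG16 convolutional base and train only a
# small dense head for our 5 labels. weights=None skips the ImageNet
# download in this sketch; in practice weights="imagenet" is used.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained convolutional filters fixed

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # head size is an assumption
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),  # drink/food/indoor/outdoor/menu
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```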

Model evaluation

We can now fit and evaluate our model on the dataset.

This model is much more powerful than the previous one: both accuracy and F1 score are now around 0.9.

We can see that the classification is excellent! The model correctly predicts almost all the classes of our dataset.

Visualization

At last, we can visualize the prediction of the model on a random photo.