My Content: Book recommendation mobile app

Context

"My Content" is a start-up whose goal is to encourage people to read by recommending relevant content to them.

In this project, we want to create a mobile app that recommends relevant articles to users based on their implicit preferences, their profiles and the content of the articles. This kind of system is known as a Recommender System, and building one is a common challenge for any content-based website (blog, news, audio/video, ...) or service (social network, marketplace, streaming platform, dating platform, ...).

We will compare different models (Content-Based Filtering, Collaborative Filtering, Matrix Factorization, ...) on the Globo.com dataset. Then, we will integrate one model into a mobile app that recommends relevant articles to users. Finally, we will use Azure Machine Learning and Azure Functions to store the recommendations in Azure Cosmos DB and make them available to users.
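The storage step can be illustrated with a minimal sketch, assuming recommendations are precomputed in batch (for example in an Azure Machine Learning job) and stored as one document per user, which an Azure Function then looks up by user id. The function `build_recommendation_documents` and the `{"id": ..., "recommendations": [...]}` document layout are hypothetical, and the actual Cosmos DB and Azure Functions wiring is not shown.

```python
import json

import numpy as np


def build_recommendation_documents(scores, user_ids, article_ids, k=5):
    """Turn an (n_users, n_articles) score matrix into one document per user.

    The {"id": ..., "recommendations": [...]} layout is a hypothetical schema.
    """
    scores = np.asarray(scores, dtype=float)
    k = min(k, scores.shape[1])
    # Column indices of each row's k highest scores, best first.
    top = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return [
        {
            "id": str(user_id),  # Cosmos DB item ids are strings
            "recommendations": [int(article_ids[j]) for j in top[row]],
        }
        for row, user_id in enumerate(user_ids)
    ]


if __name__ == "__main__":
    example = build_recommendation_documents(
        [[0.1, 0.9, 0.5], [0.7, 0.2, 0.3]],
        user_ids=[10, 11],
        article_ids=[100, 101, 102],
        k=2,
    )
    # Each document could then be upserted into a Cosmos DB container.
    print(json.dumps(example))
    # [{"id": "10", "recommendations": [101, 102]}, {"id": "11", "recommendations": [100, 102]}]
```

Precomputing the top-k list keeps the serving path to a single key lookup, at the cost of recommendations only being as fresh as the last batch run.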

State of the art

The goal of Recommender Systems is to suggest relevant content to users, given:

There are three main categories of recommender systems:

Project modules

We will use the Python programming language and present the code and results in this JupyterLab notebook.

We will use the usual libraries for data exploration, modeling and visualisation:

We will also use libraries specific to the goals of this project:

Data

Let's download the data from the Globo.com dataset and look at what it contains.
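A minimal loading sketch, assuming the clicks come as hourly CSV files named `clicks_hour_*.csv` with `user_id`, `click_article_id` and `click_timestamp` columns (these names are assumptions to check against the downloaded files). The helper `load_clicks` is hypothetical. Sorting each user's clicks by time is what makes "the last click" of a user well defined.

```python
import glob
import os

import pandas as pd


def load_clicks(clicks_dir):
    """Concatenate the hourly click files into one DataFrame sorted by user and time.

    Assumes files named clicks_hour_*.csv with user_id, click_article_id
    and click_timestamp columns (assumed layout).
    """
    paths = sorted(glob.glob(os.path.join(clicks_dir, "clicks_hour_*.csv")))
    if not paths:
        raise FileNotFoundError(f"no clicks_hour_*.csv files in {clicks_dir!r}")
    clicks = pd.concat((pd.read_csv(path) for path in paths), ignore_index=True)
    # Chronological order within each user.
    return clicks.sort_values(["user_id", "click_timestamp"]).reset_index(drop=True)
```

For example, `clicks = load_clicks("data/clicks")`, where the directory path is hypothetical.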

The dataset is composed of the following files:

Data profile reports:

Protocol and Findings

To evaluate our models, we leave out the last click of each user and train the models on the remaining clicks. Each model then predicts a ranking of all articles for the next article the user will click. A model's score is the average position, in this predicted ranking, of the article the user actually clicked last (lower is better).
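The protocol can be sketched as follows, assuming a clicks table with `user_id`, `click_article_id` and `click_timestamp` columns and a model that returns one score per article. Ranks are 1-based and ties are resolved in favour of the held-out article; both choices are assumptions, since the protocol does not specify them. The helper names are hypothetical. Users with a single click end up with no training clicks and would need separate handling.

```python
import numpy as np
import pandas as pd


def leave_last_click_out(clicks):
    """Split clicks into a training set and each user's most recent click."""
    ordered = clicks.sort_values(["user_id", "click_timestamp"])
    # keep="last" marks every click except the last one per user as a duplicate.
    is_last = ~ordered.duplicated("user_id", keep="last")
    return ordered[~is_last], ordered[is_last]


def rank_of_target(scores, target_index):
    """1-based rank of target_index when articles are sorted by descending score."""
    scores = np.asarray(scores, dtype=float)
    # Count articles scored strictly higher than the target (ties favour the target).
    return int((scores > scores[target_index]).sum()) + 1


def mean_rank(score_rows, target_indices):
    """Average rank of the held-out article over all users (lower is better)."""
    ranks = [rank_of_target(s, t) for s, t in zip(score_rows, target_indices)]
    return float(np.mean(ranks))
```

Counting strictly higher scores avoids a full sort per user, which matters when every one of the users has to be ranked against the whole article catalogue.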

| Type | Library | Model / algorithm | Mean rank (lower is better) | Predict time (s) |
|---|---|---|---|---|
| Content-based | scikit-learn | Cosine similarity with mean of last click | 264 | 17 |
| Content-based | scikit-learn | Cosine similarity with mean of last session | 252 | 17 |
| Content-based | scikit-learn | Cosine similarity with mean of all clicks | 216 | 17 |
| Collaborative | Surprise | BaselineOnly | 5972 | 1.68 |
| Collaborative | Surprise | SVD | 51874 | 1.85 |
| Collaborative | Implicit | AlternatingLeastSquares | 3.83 | 0.05 |
| Hybrid | LightFM | LightFM | 228 | 0.17 |
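The content-based rows can be sketched as follows, assuming the article embeddings are a NumPy array whose row `i` is the embedding of article `i`. A user profile is the mean embedding of a set of clicked articles, and all articles are ranked by cosine similarity to that profile. Passing only the last click, the clicks of the last session, or all clicks gives the three variants in the table. The helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def user_profile(embeddings, clicked_article_ids):
    """Mean embedding of the given clicked articles."""
    return embeddings[np.asarray(clicked_article_ids)].mean(axis=0)


def rank_articles(embeddings, profile):
    """Article indices sorted from most to least similar to the profile."""
    similarities = cosine_similarity(profile.reshape(1, -1), embeddings)[0]
    return np.argsort(-similarities, kind="stable")
```

The position of the held-out article in the returned order, plus one, is its rank for the evaluation protocol.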
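To show what the Implicit `AlternatingLeastSquares` row computes, here is a small dense NumPy sketch of implicit-feedback ALS in the style of Hu, Koren and Volinsky: click counts r_ui become binary preferences p_ui with confidence c_ui = 1 + alpha * r_ui, and user and item factors are solved alternately by weighted ridge regression. This is not the Implicit library's code, which works on sparse matrices with faster solvers; the function name and hyperparameter values are illustrative assumptions.

```python
import numpy as np


def implicit_als(counts, factors=8, reg=0.1, alpha=40.0, iterations=10, seed=0):
    """Dense implicit-feedback ALS for small (n_users, n_items) click-count matrices.

    Returns user factors X and item factors Y; predicted scores are X @ Y.T.
    """
    counts = np.asarray(counts, dtype=float)
    n_users, n_items = counts.shape
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.01, size=(n_users, factors))
    Y = rng.normal(scale=0.01, size=(n_items, factors))
    P = (counts > 0).astype(float)  # binary preference p_ui
    C = 1.0 + alpha * counts        # confidence c_ui = 1 + alpha * r_ui
    ridge = reg * np.eye(factors)
    for _ in range(iterations):
        # Fix Y and solve (Y^T C_u Y + reg*I) x_u = Y^T C_u p_u for each user.
        for u in range(n_users):
            weighted = Y.T * C[u]
            X[u] = np.linalg.solve(weighted @ Y + ridge, weighted @ P[u])
        # Fix X and solve the symmetric system for each item.
        for i in range(n_items):
            weighted = X.T * C[:, i]
            Y[i] = np.linalg.solve(weighted @ X + ridge, weighted @ P[:, i])
    return X, Y
```

Recommendations for user `u` are then the unclicked articles with the highest values in `(X @ Y.T)[u]`.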

System architecture

Current MVP

Current MVP architecture

Target in production

Target production architecture

Articles metadata

Article clicks

Articles embeddings