My Content

Books Recommendation
Mobile App

Clément Fleury Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Challenge

My Content is a start-up company that wants to help people read more by recommending books that they will love 😍.

To overcome this challenge, we need to :

  • represent books and users by meaningful features
  • understand users' book preferences
  • recommend the right books to the right users

Goals

In this project, we are going to :

  • define the meaningful features of books and users
  • compare different recommendation systems
  • integrate the best recommendation system with a mobile application

Recommender Systems assumptions

  • similar users are likely to have similar content interests
  • similar contents are likely to interest users similarly
  • user's content preferences may be :
    • explicit : based on actual feedback from users (reaction, rating, comment, ...)
    • implicit : based on the user's behavior (clicks, views, time spent, purchase, ...)

State of the art

There are three main categories of recommender systems :

  • Content-Based Filtering : based solely on content features and the user's preferences
    • find content similar to what the user likes
  • Collaborative Filtering : based solely on the preferences and profiles of all users
    • find content that similar users like
  • Hybrid recommender systems : combination of content and collaborative filtering
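To make "find content that similar users like" concrete, here is a minimal sketch of user-based collaborative filtering. It assumes that user similarity is measured with the Jaccard overlap of liked items; the function names, user names, and book titles are hypothetical and only serve as an illustration, not as the method used in this project.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of liked items, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_collaborative(target, likes, top_n=3):
    """Rank items unseen by `target`, weighted by the similarity of users who like them."""
    scores = defaultdict(float)
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], items)
        if sim == 0:
            continue  # users with nothing in common contribute nothing
        for item in items - likes[target]:
            scores[item] += sim
    # Sort by descending score, then by item id for a deterministic order.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical toy data: which books each user liked.
likes = {
    "alice": {"dune", "neuromancer"},
    "bob": {"dune", "neuromancer", "foundation"},
    "carol": {"emma", "persuasion"},
}
print(recommend_collaborative("alice", likes))
```

Only the books liked by users who overlap with the target are candidates, so this approach needs no content features at all, which is the defining trait of collaborative filtering.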

Exploratory data analysis (EDA)

For this MVP, we used the Globo.com dataset :

  • 364047 articles described by :
    • 3 metadata fields (category, creation time and word count)
    • a 250-dimensional embedding of the article's content
  • 2988181 clicks (from Oct. 1st to Oct. 17th, 2017)
    • on 46033 news articles
    • by 322897 users
    • described by 10 fields (session, device and location)

User preferences modeling

Without an explicit rating of the articles by the users, we must model an implicit rating :

$$Rating(user, article) = \frac{\#Clicks(user, article)}{\#Clicks(user)}$$
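As an illustration, the implicit rating can be computed from a raw click log in a few lines. This is a minimal sketch that assumes the log is available as (user, article) pairs, one per click; the function name and the sample ids are hypothetical.

```python
from collections import Counter


def implicit_ratings(clicks):
    """Map each (user, article) pair to #Clicks(user, article) / #Clicks(user)."""
    clicks = list(clicks)
    per_pair = Counter(clicks)                      # #Clicks(user, article)
    per_user = Counter(user for user, _ in clicks)  # #Clicks(user)
    return {(u, a): n / per_user[u] for (u, a), n in per_pair.items()}


# Hypothetical click log: u1 clicked a1 twice and a2 once, u2 clicked a3 once.
clicks = [("u1", "a1"), ("u1", "a1"), ("u1", "a2"), ("u2", "a3")]
ratings = implicit_ratings(clicks)
print(ratings)
```

With this definition, the ratings of a given user always sum to 1, so a rating measures the share of that user's attention an article received rather than an absolute level of interest.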

As for a user's preferences, we tested 3 models :

  • last clicked article
  • average of the user's clicked articles over the last session
  • average of all clicked articles
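The three preference models listed above can be sketched as follows. The sketch assumes that a user's preferences are represented as a vector in the same space as the article embeddings, and that "average" means the element-wise mean of those embeddings; the function names, the `mode` values, and the 2-dimensional toy embeddings are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def user_profile(sessions, embeddings, mode="all"):
    """Build a preference vector from a user's sessions.

    sessions: list of sessions, oldest first; each session is a list of
    clicked article ids in click order.
    """
    if mode == "last_click":
        clicked = [sessions[-1][-1]]
    elif mode == "last_session":
        clicked = sessions[-1]
    elif mode == "all":
        clicked = [article for session in sessions for article in session]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mean_vector([embeddings[article] for article in clicked])


# Toy 2-dimensional embeddings (the dataset described earlier uses 250 dimensions).
embeddings = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0]}
sessions = [["a1"], ["a2", "a3"]]

for mode in ("last_click", "last_session", "all"):
    print(mode, user_profile(sessions, embeddings, mode))
```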

Models

  • Content-based Filtering : Cosine Similarity between user preferences and articles
  • Collaborative Filtering
  • Hybrid model
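For the content-based model, ranking articles by cosine similarity to a user preference vector can be sketched as below. This is a pure-Python stand-in written for illustration, not the scikit-learn code used in the project; the function names, the `exclude` parameter, and the toy vectors are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_articles(profile, embeddings, exclude=()):
    """Return article ids sorted from most to least similar to the profile."""
    candidates = [a for a in embeddings if a not in exclude]
    return sorted(
        candidates,
        key=lambda a: cosine_similarity(profile, embeddings[a]),
        reverse=True,
    )


# Toy 2-dimensional embeddings and a profile that leans towards the first axis.
embeddings = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0]}
profile = [0.9, 0.1]
print(rank_articles(profile, embeddings))
```

Cosine similarity ignores vector length, so a profile built from an average of many embeddings is compared on direction only.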

Protocol

To evaluate our models, we hold out the last click of each user.

We train our models on the remaining clicks.
We then predict recommendations (a ranking of all articles) for the next article the user will click.

Our model score is the average predicted rank of the article the user actually clicked last (lower is better).
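The protocol above can be sketched as a small evaluation harness. It assumes that ranks are 1-based and that users with a single click are skipped because nothing would remain to train on; the function names and the popularity baseline used as an example recommender are hypothetical and are not one of the models compared in this project.

```python
from collections import Counter


def split_last_click(history):
    """history: dict user -> list of clicked article ids in chronological order."""
    train = {u: clicks[:-1] for u, clicks in history.items() if len(clicks) > 1}
    held_out = {u: history[u][-1] for u in train}
    return train, held_out


def mean_rank(recommend, train, held_out):
    """Average 1-based rank of each user's held-out article.

    recommend(user, past_clicks) must return a ranking of ALL articles, best first.
    """
    ranks = []
    for user, target in held_out.items():
        ranking = recommend(user, train[user])
        ranks.append(ranking.index(target) + 1)
    return sum(ranks) / len(ranks)


# Hypothetical click histories.
history = {"u1": ["a1", "a2", "a3"], "u2": ["a2", "a3"], "u3": ["a3", "a1"]}
train, held_out = split_last_click(history)

# Example recommender: rank articles by popularity in the training clicks.
popularity = Counter(a for clicks in train.values() for a in clicks)
all_articles = sorted({a for clicks in history.values() for a in clicks})


def popularity_baseline(user, past_clicks):
    return sorted(all_articles, key=lambda a: (-popularity[a], a))


score = mean_rank(popularity_baseline, train, held_out)
print(score)
```

Any model that can produce a full ranking of the articles can be plugged in as `recommend`, which keeps the comparison between model families consistent.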


Results

| Type | Library | Model/Algo | Mean Rank (lower is better) | Predict time (s) |
|---|---|---|---|---|
| Content-based | scikit-learn | cosine similarity with mean of last click | 264 | 17 |
| Content-based | scikit-learn | cosine similarity with mean of last session | 252 | 17 |
| Content-based | scikit-learn | cosine similarity with mean of all clicks | 216 | 17 |
| Collaborative | Surprise | BaselineOnly | 5972 | 1.68 |
| Collaborative | Surprise | SVD | 51874 | 1.85 |
| Collaborative | Implicit | AlternatingLeastSquares | 3.83 | 0.05 |
| Hybrid | LightFM | LightFM | 228 | 0.17 |

Current MVP System Architecture

[Figure: MVP system architecture diagram]


Demo

[Figure: demo of the mobile app]


Target Production System Architecture

[Figure: target production system architecture diagram]


Next steps

  • Test the recommender system on real data
  • Fine-tune the model's hyperparameters
  • Implement the target production system architecture
  • Improve the Mobile App (UX, security, ...)
