Twitter 2021 RecSys Challenge
Implementation of a recommender system that predicts which tweets a Twitter user will engage with. The submission reached 14th place overall and 8th place on the Like leaderboard.
Final project for the MSc in Fundamental Principles of Data Science at the Universitat de Barcelona, submitted as an entry to the RecSys Challenge 2021. The challenge was organised by Politecnico di Bari, ETH Zurich, and Jönköping University, with data provided by Twitter.
The Challenge
The task was to predict the probability that a Twitter user engages with a tweet in their home timeline, separately for each of four engagement types: Like, Retweet, Reply, and Quote. The dataset consisted of roughly 1 billion user–tweet interaction pairs.
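
One way to picture the task is as four binary targets per user–tweet pair. The sketch below assumes each record carries an optional timestamp per engagement type, and that a missing timestamp means no engagement. The field names (`like_timestamp` and so on) are hypothetical and chosen for illustration, not taken from this project's code.

```python
# Hypothetical record layout: one optional timestamp per engagement type.
ENGAGEMENTS = ("like", "retweet", "reply", "quote")


def to_labels(record: dict) -> dict:
    """Map a user-tweet record to four binary targets (1 = engaged)."""
    return {
        name: int(record.get(f"{name}_timestamp") is not None)
        for name in ENGAGEMENTS
    }


# Illustrative record: the user liked the tweet and did nothing else.
record = {
    "user_id": "u1",
    "tweet_id": "t9",
    "like_timestamp": 1614000000,
    "retweet_timestamp": None,
}
labels = to_labels(record)
print(labels)
```

Under this framing, a model outputs one probability per key of `labels` for every user–tweet pair.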
Approach
The solution is based on Gradient Boosting Trees with hand-crafted features representing user interaction history, tweet content characteristics, and user relationship signals. Three model variants were developed:
- Text-Based Model — features derived from tweet text
- Non-Text-Based Model — features from metadata and interaction history
- Mixed Model — a combination of both, which achieved the best performance
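
The three variants differ only in which feature groups they see. The following is a minimal sketch of that idea, not the project's actual feature set: the individual features, the raw-string text input, and the `(user_id, author_id, engaged)` history layout are all assumptions made for illustration.

```python
def text_features(text: str) -> dict:
    """Illustrative features computed from the tweet text alone."""
    tokens = text.split()
    return {
        "n_tokens": len(tokens),
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "has_url": int(any(t.startswith("http") for t in tokens)),
    }


def non_text_features(user_id: str, author_id: str, history: list) -> dict:
    """Illustrative features from past interactions.

    `history` is a list of (user_id, author_id, engaged) tuples (assumed layout).
    """
    seen = [h for h in history if h[0] == user_id]
    with_author = [h for h in seen if h[1] == author_id]
    rate = sum(h[2] for h in seen) / len(seen) if seen else 0.0
    return {
        "user_past_impressions": len(seen),
        "user_past_engagement_rate": rate,
        "user_author_past_engagements": sum(h[2] for h in with_author),
    }


def mixed_features(text: str, user_id: str, author_id: str, history: list) -> dict:
    """Union of both groups, as a mixed variant would consume."""
    return {**text_features(text), **non_text_features(user_id, author_id, history)}


history = [
    ("u1", "a1", True),
    ("u1", "a2", False),
    ("u2", "a1", True),
    ("u1", "a1", True),
]
feats = mixed_features(
    "Check this #recsys talk by @alice https://example.com", "u1", "a1", history
)
print(feats)
```

Keeping the groups in separate functions makes it straightforward to train and compare a text-only, a non-text-only, and a mixed model on the same rows.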
Results
| Leaderboard | Position |
|---|---|
| Overall leaderboard | 14th place |
| Like category | 7th – 9th place |
Tech Stack
| Layer | Technology |
|---|---|
| Language | Python |
| Models | Gradient Boosting Trees (XGBoost / LightGBM) |
| Notebooks | Jupyter |
| Feature engineering | Custom text + non-text features |
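
For readers unfamiliar with gradient boosting, the sketch below shows the core idea in plain Python: each round fits a small tree (here a one-split stump) to the residuals of the current predictions and adds it to the ensemble. It is a teaching sketch on toy data, not XGBoost or LightGBM and not this project's training code. The function names, the toy data, and the hyperparameters are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Find the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(xs[0])):
        values = sorted({x[j] for x in xs})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= thr]
            right = [r for x, r in zip(xs, residuals) if x[j] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left)
            sse += sum((r - right_mean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    if best is None:
        raise ValueError("no feature has two distinct values")
    return best[1:]


def fit_gbt(xs, ys, n_rounds=20, lr=0.5):
    """Boost stumps on the log-loss gradient (label minus predicted probability)."""
    p = sum(ys) / len(ys)
    base = math.log(p / (1 - p))  # start from the prior log-odds
    scores = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        j, thr, left_val, right_val = fit_stump(xs, residuals)
        stumps.append((j, thr, left_val, right_val))
        scores = [
            s + lr * (left_val if x[j] <= thr else right_val)
            for x, s in zip(xs, scores)
        ]
    return base, lr, stumps


def predict_proba(model, x) -> float:
    base, lr, stumps = model
    score = base + sum(lr * (lv if x[j] <= thr else rv) for j, thr, lv, rv in stumps)
    return sigmoid(score)


# Toy data: feature 0 separates the classes, feature 1 is noise.
xs = [[0, 1], [1, 0], [2, 1], [3, 0], [8, 1], [9, 0], [10, 1], [11, 0]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gbt(xs, ys)
p_low = predict_proba(model, [1, 1])
p_high = predict_proba(model, [10, 0])
print(p_low, p_high)
```

Production libraries add deeper trees, second-order leaf values, regularisation, and histogram-based split finding, which is what makes them practical on a dataset of this size.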
Contributors
Developed with Marcos Moreno Blanco.