Microsoft / Recommenders
Recommender Systems
This repository provides examples and best practices for building recommendation systems, offered as Jupyter notebooks. The examples detail our learnings on four key tasks: preparing and loading data for each recommender algorithm, building models with a variety of recommendation algorithms, evaluating algorithms with offline metrics, and operationalizing models in a production environment.
Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting train/test data. Implementations of several state-of-the-art algorithms are provided for self-study and customization in your own applications.
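As a minimal sketch of how these utilities fit together (the module paths and function names below, such as reco_utils.dataset.movielens and python_random_split, reflect the repository layout at the time of writing and may differ in other versions):

```python
# Load MovieLens into a pandas DataFrame and create a 75/25 train/test
# split using the helpers in reco_utils.
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split

# Download MovieLens 100k and name the columns as most algorithms expect.
data = movielens.load_pandas_df(
    size="100k",
    header=["userID", "itemID", "rating", "timestamp"],
)

# Randomly split the interactions into train and test sets.
train, test = python_random_split(data, ratio=0.75)
```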
Please see the setup guide for more details on setting up your machine locally, on Spark, or on Azure Databricks.
To set up on your local machine:
```bash
git clone https://github.com/Microsoft/Recommenders
cd Recommenders
./scripts/generate_conda_file.sh
conda env create -n reco -f conda_bare.yaml
conda activate reco
python -m ipykernel install --user --name reco --display-name "Python (reco)"
cd notebooks
jupyter notebook
```
We provide several notebooks to show how recommendation algorithms can be designed, evaluated and operationalized.
The Quick-Start Notebooks detail how you can quickly get up and running with state-of-the-art algorithms such as the Smart Adaptive Recommendation (SAR) algorithm and the ALS algorithm.
The Data Preparation Notebook shows how to prepare and split data properly for recommendation systems.
The Modeling Notebooks provide a deep dive into implementations of different recommender algorithms.
The Evaluation Notebooks show how to evaluate recommender algorithms for different ranking and rating metrics; a short sketch follows this list.
The Operationalization Notebook demonstrates how to deploy models in production systems.
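For illustration, here is a sketch of computing the ranking metrics used in this repository with the python_evaluation module of reco_utils; the tiny DataFrames below are fabricated stand-ins for a real test set and a model's top-k output, and the exact function signatures may differ across versions:

```python
import pandas as pd

from reco_utils.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k,
)

# Toy ground-truth interactions and toy top-k recommendations with scores.
test = pd.DataFrame(
    {"userID": [1, 1, 2], "itemID": [10, 20, 10], "rating": [5, 4, 3]}
)
top_k = pd.DataFrame(
    {"userID": [1, 1, 2], "itemID": [20, 30, 10], "prediction": [0.9, 0.8, 0.7]}
)

# Shared column mapping and cutoff for all ranking metrics (k = 10).
kwargs = dict(
    col_user="userID",
    col_item="itemID",
    col_rating="rating",
    col_prediction="prediction",
    k=10,
)

print("MAP:        ", map_at_k(test, top_k, **kwargs))
print("nDCG@k:     ", ndcg_at_k(test, top_k, **kwargs))
print("Precision@k:", precision_at_k(test, top_k, **kwargs))
print("Recall@k:   ", recall_at_k(test, top_k, **kwargs))
```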
In addition, we provide a comparison notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, data (MovieLens 1M) is randomly split into train/test sets at a 75/25 ratio, and a recommendation model is trained using each of the collaborative filtering algorithms below, with empirical parameter values reported in the literature. For ranking metrics we use k = 10 (top 10 results). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 K80 GPU). Spark ALS is run in local standalone mode.
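As a sketch of what "local standalone mode" means for the ALS run (the hyperparameter values and column names here are illustrative assumptions, not the exact settings used in the comparison notebook):

```python
# Train Spark ALS on a local standalone SparkSession (no cluster needed).
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = (
    SparkSession.builder.master("local[*]").appName("als-comparison").getOrCreate()
)

# `train` is assumed to be a pandas DataFrame of (userID, itemID, rating)
# rows produced by an earlier train/test split.
train_df = spark.createDataFrame(train)

als = ALS(
    userCol="userID",
    itemCol="itemID",
    ratingCol="rating",
    rank=10,          # illustrative values, not the notebook's exact settings
    maxIter=15,
    regParam=0.05,
    coldStartStrategy="drop",  # drop users/items unseen during training
    seed=42,
)
model = als.fit(train_df)

# Top-10 recommendations per user, matching the k = 10 ranking metrics below.
top_k = model.recommendForAllUsers(10)
```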
Preliminary Comparison
Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance
---|---|---|---|---|---|---|---|---
ALS | 0.002020 | 0.024313 | 0.030677 | 0.009649 | 0.860502 | 0.680608 | 0.406014 | 0.411603
SAR | 0.064013 | 0.308012 | 0.277215 | 0.109292 | N/A | N/A | N/A | N/A
SVD | 0.010915 | 0.102398 | 0.092996 | 0.025362 | 0.888991 | 0.696781 | 0.364178 | 0.364178
This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.
Build Type | Branch | Status | Branch | Status
---|---|---|---|---
Linux CPU | master | | staging |
Linux Spark | master | | staging |
NOTE: the tests are executed nightly; we use pytest to test the Python utilities in reco_utils and the notebooks.