This project walks through how you can create recommendations using Apache Spark machine learning. There are a number of Jupyter notebooks that you can run on IBM Data Science Experience, and there is a live demo of a movie recommendation web application you can interact with.
If you want to try out a live demo of the web application, visit here. There is also an overview video on YouTube.
The web application is a demo movie recommender. It has been loaded with approximately four thousand movies and one million ratings from the MovieLens 1M Dataset. It lets users search for movies, rate them, and receive recommendations based on their ratings.
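For orientation, the MovieLens 1M `ratings.dat` file stores one rating per line in the form `UserID::MovieID::Rating::Timestamp`. A minimal parsing sketch follows; the `Rating` tuple and `parse_rating_line` are hypothetical names used for illustration and are not taken from this project's notebooks.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


def parse_rating_line(line: str) -> Rating:
    """Parse one 'UserID::MovieID::Rating::Timestamp' line."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), float(rating), int(timestamp))


# A sample line in the ratings.dat layout.
sample = "1::1193::5::978300760"
parsed = parse_rating_line(sample)
print(parsed)
```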
Start with [Step 00 - Project Overview](./Step 00 - Project Overview.ipynb) to read more about this project.
You can import these notebooks into IBM Data Science Experience. I have occasionally experienced issues when trying to load from a URL. If that happens to you, try cloning or downloading this repo and importing the notebooks as files.
The technologies used in this demo are:
- Python Flask application
- IBM Bluemix for hosting the web application and services
- IBM Cloudant NoSQL for storing movies, ratings, user accounts and recommendations
- IBM Compose Redis for maintaining an atomic increment counter that generates the ID field for user accounts
- IBM Data Science Experience (DSX) and Spark as a Service for:
  - exploring data and analysing ratings
  - training and testing a recommendation model
  - retraining the recommendation model hourly
  - generating recommendations and saving them to Cloudant
- IBM Data Science Experience (DSX) GitHub integration for saving notebooks
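The atomic ID counter in the list above can be sketched as follows. Redis's `INCR` command increments a key and returns the new value in a single atomic step, so concurrent sign-ups never receive the same ID. The key name `user_id_counter`, the `next_user_id` helper, and the in-memory `FakeRedis` stand-in are assumptions for illustration, not this project's code. With the `redis` Python package you would pass a `redis.Redis(...)` client instead, since its `incr` method has the same shape.

```python
import threading


class FakeRedis:
    """In-memory stand-in exposing only incr(), so the sketch runs without a server."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, name, amount=1):
        # Redis INCR is atomic on the server; the lock mimics that here.
        with self._lock:
            self._values[name] = self._values.get(name, 0) + amount
            return self._values[name]


def next_user_id(client, key="user_id_counter"):
    """Return the next unique user ID from an atomic counter (hypothetical key name)."""
    return client.incr(key)


client = FakeRedis()
ids = []

# Simulate 20 concurrent sign-ups, each asking for a new ID.
threads = [
    threading.Thread(target=lambda: ids.append(next_user_id(client)))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(ids))
```

Every thread receives a distinct ID because the read-and-increment happens as one operation rather than as a separate read followed by a write.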
The overall architecture looks like this:
The screenshot below shows some movies being rated by a user.
The screenshot below shows movie recommendations provided by Spark machine learning.
Click on this link, then follow the instructions. Note that this step can take a long time (around 30 minutes).
An instance of Cloudant, Compose Redis and the Flask web application will be set up for you.
After deploying to Bluemix, you will need to create a new DSX project and import the notebooks. The notebook [Step 07 - Cloudant Datastore Recommender.ipynb](./Step 07 - Cloudant Datastore Recommender.ipynb) is responsible for creating recommendations and saving them to Cloudant. You will not get recommendations until you have set up this notebook with your Cloudant credentials and run it from DSX.
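The recommendation step can be sketched roughly as follows. This is not the code from the Step 07 notebook. It assumes Spark's ALS collaborative-filtering estimator, a ratings DataFrame with hypothetical columns `user_id`, `movie_id`, and `rating`, and a made-up document layout. Writing the documents to Cloudant is left out because it depends on your credentials and client library.

```python
def build_recommendation_doc(user_id, recommendations):
    """Shape one user's top-N list as a JSON-style document.

    `recommendations` is an iterable of (movie_id, predicted_rating) pairs.
    The field names are hypothetical, not the project's actual schema.
    """
    ranked = sorted(recommendations, key=lambda pair: pair[1], reverse=True)
    return {
        "_id": f"recommendation_{user_id}",  # Cloudant documents are keyed by _id
        "user_id": user_id,
        "recommendations": [
            {"movie_id": movie_id, "predicted_rating": round(float(score), 3)}
            for movie_id, score in ranked
        ],
    }


def recommend_for_all_users(ratings_df, top_n=10):
    """Train ALS on a ratings DataFrame and return one document per user."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.recommendation import ALS

    als = ALS(userCol="user_id", itemCol="movie_id", ratingCol="rating")
    model = als.fit(ratings_df)

    docs = []
    # Each row holds the user ID and an array of (movie_id, rating) structs.
    for row in model.recommendForAllUsers(top_n).collect():
        pairs = [(rec["movie_id"], rec["rating"]) for rec in row["recommendations"]]
        docs.append(build_recommendation_doc(row["user_id"], pairs))
    return docs


# The document-shaping helper can be tried without a Spark cluster.
doc = build_recommendation_doc(42, [(10, 3.5), (20, 4.75)])
print(doc)
```

Storing one document per user keeps the web application's lookup to a single read by ID, which is one plausible reason to precompute recommendations instead of scoring on request.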
See the instructions here.