Build a Recommendation Engine with PredictionIO on Heroku

  1. Introduction
  2. Prerequisites
  3. Source Code
  4. Step 1 Clone the Source code
  5. Step 2 Create a Heroku App
  6. Step 3 Deploy Event Server to Heroku
  7. Step 4 Create a new app
  8. Step 5 Populate Event Server with Events
  9. Step 6 Deploy Recommendation Engine
  10. Step 7 Configure the Heroku app
  11. Step 8 Increase Heap size for Java VM
  12. Step 9 Train the Engine
  13. Step 10 Predict

Introduction

In this workshop you will learn how to use the PredictionIO machine learning server to build a recommendation engine based on the Alternating Least Squares (ALS) algorithm. PredictionIO uses Spark MLlib’s ALS implementation and provides convenient APIs and REST endpoints to get the infrastructure up and running quickly.

Prerequisites

  • Heroku Account
  • Heroku CLI (git is part of Heroku CLI)
  • curl
  • git-bash

A Heroku account with a credit card on file is required to run two dynos simultaneously.

Source Code

The source code for this workshop resides in the two repositories listed below:

pio-eventserver-heroku: provides storage for the generated events from which we build our training model.

pio-engine-heroku: the engine that wraps the ALS algorithm implementation and provides APIs to create a model, train it, and use it to make predictions.

We will use a PostgreSQL database for this workshop.

Step 1 Clone the Source code

Clone both repositories:

$ git clone https://github.com/rajdeepd/pio-eventserver-heroku

$ git clone https://github.com/rajdeepd/pio-engine-heroku

Step 2 Create a Heroku App

Note: Heroku app names must be unique, so replace rd-pio-eventserver-1 with your own app name.

$ cd pio-eventserver-heroku
$ heroku create rd-pio-eventserver-1

Output

Creating rd-pio-eventserver-1... done, stack is cedar-14
https://rd-pio-eventserver-1.herokuapp.com/ | https://git.heroku.com/rd-pio-eventserver-1.git
Git remote heroku added

Check the git remotes:

$ git remote -v
heroku  https://git.heroku.com/rd-pio-eventserver-1.git (fetch)
heroku  https://git.heroku.com/rd-pio-eventserver-1.git (push)
origin  https://github.com/rajdeepd/pio-eventserver-heroku (fetch)
origin  https://github.com/rajdeepd/pio-eventserver-heroku (push)
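Later steps need the event server's app name in place of CHANGEME. As a convenience, the name can be recovered from the heroku git remote. The function below, app_from_remote_url, is a hypothetical helper (not part of the workshop repositories), sketched in shell under the assumption that the remote URL has the https://git.heroku.com/<app>.git form shown above.

```shell
# Hypothetical helper: print the Heroku app name contained in a Heroku git remote URL.
# Assumes the https://git.heroku.com/<app>.git form printed by `heroku create`.
app_from_remote_url() {
  local name
  name=${1##*/}                 # keep only the part after the last slash
  printf '%s\n' "${name%.git}"  # drop the trailing .git suffix
}
```

For example, from inside the pio-eventserver-heroku directory (git remote get-url needs git 2.7 or later):

$ EVENTSERVER_APP=$(app_from_remote_url "$(git remote get-url heroku)")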

Step 3 Deploy Event Server to Heroku

$ git push heroku master

Output

remote:        [info] Done updating.
remote:        [info] Compiling 1 Scala source to /tmp/scala_...
remote:        [success] Total time: 42 s, completed Aug 25, 2016 9:59:56 AM
remote:        [info] Wrote scala-2.10/pio-eventserver-heroku_2.10-0.1-SNAPSHOT.pom
remote:        [info] Packaging pio-eventserver-heroku_2.10-0.1-SNAPSHOT.jar ...
remote:        [info] Done packaging.
remote:        [success] Total time: 2 s, completed Aug 25, 2016 9:59:58 AM
remote: -----> Dropping ivy cache from the slug
remote: -----> Dropping sbt boot dir from the slug
remote: -----> Dropping compilation artifacts from the slug
remote: -----> Discovering process types
remote:        Procfile declares types -> console, web
remote: 
remote: -----> Compressing...
remote:        Done: 183.1M
remote: -----> Launching...
remote:        Released v4
remote:        https://rd-pio-eventserver-1.herokuapp.com/ deployed to Heroku
remote: 
remote: Verifying deploy.... done.
To https://git.heroku.com/rd-pio-eventserver-1.git
 * [new branch]      master -> master
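Before moving on, it can help to confirm that the deployed event server responds. This is a minimal sketch that assumes the PredictionIO Event Server answers GET / with a JSON status such as {"status":"alive"}, as a local PredictionIO event server does; eventserver_alive is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit status 0) if the event server at host $1
# reports itself alive. Assumes GET / returns a body containing "status":"alive".
eventserver_alive() {
  curl -s "http://$1/" | grep -q '"status" *: *"alive"'
}
```

Usage, with your own app name:

$ eventserver_alive rd-pio-eventserver-1.herokuapp.com && echo "event server is up"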

Check DATABASE_URL; the engine app will reuse this value in Step 6.2:

$ heroku config
=== rd-pio-eventserver-1 Config Vars
DATABASE_URL: postgres://rdatjvbvdwqvyq:nNL9b1cnjoQt8hCcQumEMahrmL@ec2-54-243-208-195.compute-1.amazonaws.com:5432/d8spomhdp00n03
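DATABASE_URL packs the PostgreSQL user, password, host, port and database name into a single URL of the form postgres://user:password@host:port/dbname. Both apps will end up pointing at the same value, so it is useful to be able to read it at a glance. The hypothetical helper below is a sketch that assumes exactly that URL form; it prints the parts and deliberately leaves out the password.

```shell
# Hypothetical helper: split a postgres://user:password@host:port/dbname URL
# into labeled parts (password omitted). Returns 1 if the URL does not match.
parse_database_url() {
  local parsed
  parsed=$(printf '%s\n' "$1" |
    sed -n 's|^postgres://\([^:]*\):[^@]*@\([^:/]*\):\([0-9]*\)/\(.*\)$|user=\1 host=\2 port=\3 db=\4|p')
  [ -n "$parsed" ] || return 1
  printf '%s\n' "$parsed"
}
```

Usage:

$ parse_database_url "$(heroku config:get DATABASE_URL)"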

Step 4 Create a new app

PredictionIO associates events and ML engines with an app ID. We will create a new app and tie to this ID both the events and the ML engine, which will be trained later.

$ heroku run console app new MyApp1
Running `console app new MyApp1` attached to terminal... up, run.5174
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$]       Name: MyApp1
[INFO] [App$]         ID: 1
[INFO] [App$] Access Key: 2Evbo5hiUiXXXCu_uB-gK1Q3EiT2N8nGd1-AGY5hjrsQ3PonJCdwP1YZ5WN5519O

Set the access key as an environment variable:

Linux, Mac OS X

$ export ACCESS_KEY=2Evbo5hiUiXXXCu_uB-gK1Q3EiT2N8nGd1-AGY5hjrsQ3PonJCdwP1YZ5WN5519O

Windows

$ set ACCESS_KEY=2Evbo5hiUiXXXCu_uB-gK1Q3EiT2N8nGd1-AGY5hjrsQ3PonJCdwP1YZ5WN5519O
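Copying the key by hand is error-prone. As an alternative on Linux and Mac OS X, the key can be read from the "Access Key:" line of the app creation output. extract_access_key is a hypothetical helper; it assumes the log format shown above and strips carriage returns in case the output of heroku run contains them.

```shell
# Hypothetical helper: read `console app new` output on stdin and print the access key.
# Assumes a line of the form "... Access Key: <key>".
extract_access_key() {
  tr -d '\r' | sed -n 's/.*Access Key: *//p'
}
```

Usage, in place of creating the app and copying the key manually:

$ export ACCESS_KEY=$(heroku run console app new MyApp1 | extract_access_key)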

Step 5 Populate Event Server with Events

In all of the commands below, replace CHANGEME in the URL with the event server app name you chose in Step 2.

Linux, Mac OS X

for i in {1..5}; do curl -i -X POST "http://CHANGEME.herokuapp.com/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"user\", \"entityId\" : \"u$i\" }"; done

for i in {1..50}; do curl -i -X POST "http://CHANGEME.herokuapp.com/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"\$set\", \"entityType\" : \"item\", \"entityId\" : \"i$i\", \"properties\" : { \"categories\" : [\"c1\", \"c2\"] } }"; done

for i in {1..5}; do curl -i -X POST "http://CHANGEME.herokuapp.com/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d "{ \"event\" : \"view\", \"entityType\" : \"user\", \"entityId\" : \"u$i\", \"targetEntityType\" : \"item\", \"targetEntityId\" : \"i$(( ( RANDOM % 50 ) + 1 ))\" }"; done

Windows

for /L %a IN (1,1,5) DO (
  curl -i -X POST http://CHANGEME.herokuapp.com/events.json?accessKey=%ACCESS_KEY% -H "Content-Type: application/json" -d "{ \"event\" : \"$set\", \"entityType\" : \"user\", \"entityId\" : \"u%a\" }"
)

for /L %a IN (1,1,50) DO (
  curl -i -X POST http://CHANGEME.herokuapp.com/events.json?accessKey=%ACCESS_KEY% -H "Content-Type: application/json" -d "{ \"event\" : \"$set\", \"entityType\" : \"item\", \"entityId\" : \"i%a\", \"properties\" : { \"categories\" : [\"c1\", \"c2\"] } }"
)

for /L %a IN (1,1,5) DO (
  curl -i -X POST http://CHANGEME.herokuapp.com/events.json?accessKey=%ACCESS_KEY% -H "Content-Type: application/json" -d "{ \"event\" : \"view\", \"entityType\" : \"user\", \"entityId\" : \"u1\", \"targetEntityType\" : \"item\", \"targetEntityId\" : \"i%a\" }"
)
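The escaped JSON in the loops above is easy to get wrong. The following sketch builds the same three kinds of event bodies with printf and posts them with curl. The helper names (user_event, item_event, view_event, post_event) are hypothetical, and post_event expects ACCESS_KEY to be exported as in Step 4.

```shell
# Hypothetical helpers that print the same event bodies as the loops above.
# Single quotes keep "$set" literal, so no backslash escaping is needed.
user_event() {
  printf '{ "event" : "$set", "entityType" : "user", "entityId" : "%s" }\n' "$1"
}

item_event() {
  printf '{ "event" : "$set", "entityType" : "item", "entityId" : "%s", "properties" : { "categories" : ["c1", "c2"] } }\n' "$1"
}

view_event() {
  # $1: user ID, $2: item ID
  printf '{ "event" : "view", "entityType" : "user", "entityId" : "%s", "targetEntityType" : "item", "targetEntityId" : "%s" }\n' "$1" "$2"
}

post_event() {
  # $1: event server hostname, $2: JSON body; reads the exported ACCESS_KEY
  curl -i -X POST "http://$1/events.json?accessKey=$ACCESS_KEY" \
    -H "Content-Type: application/json" -d "$2"
}
```

Usage, equivalent to the first Linux loop:

for i in {1..5}; do post_event CHANGEME.herokuapp.com "$(user_event "u$i")"; done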

Step 5.1 Check the Events Inserted in a Browser

Open the following URL in a browser, substituting your own event server app name and access key:

http://rd-pio-eventserver-1.herokuapp.com/events.json?accessKey=2Evbo5hiUiXXXCu_uB-gK1Q3EiT2N8nGd1-AGY5hjrsQ3PonJCdwP1YZ5WN5519O&limit=100
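The same check works from the command line. events_url is a hypothetical helper that builds the listing URL from an event server hostname and access key, with the limit defaulting to 100 as in the URL above. Quoting the result when passing it to curl keeps the shell from interpreting ? and &.

```shell
# Hypothetical helper: build the event listing URL.
# $1: event server hostname, $2: access key, $3: maximum number of events (default 100)
events_url() {
  printf 'http://%s/events.json?accessKey=%s&limit=%s\n' "$1" "$2" "${3:-100}"
}
```

Usage:

$ curl -s "$(events_url CHANGEME.herokuapp.com "$ACCESS_KEY")"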

Step 6 Deploy Recommendation Engine

Note: Heroku app names must be unique, so replace rd-pio-engine-1 with your own app name.

$ cd ../pio-engine-heroku
$ heroku create rd-pio-engine-1
$ git push heroku master

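Step 6.1 Remove the Existing Add-on

The engine must read the events stored by the event server, so remove the PostgreSQL add-on attached to the engine app: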

$ heroku addons
=== Resources for rd-pio-engine-1
Plan                         Name                     Price
---------------------------  -----------------------  -----
heroku-postgresql:hobby-dev  postgresql-pointy-19292  free

$ heroku addons:destroy postgresql-pointy-19292

Output

 !    WARNING: Destructive Action
 !    This command will affect the app: rd-pio-engine-1
 !    To proceed, type "rd-pio-engine-1" or re-run this command with --confirm rd-pio-engine-1

> rd-pio-engine-1
Destroying postgresql-pointy-19292 on rd-pio-engine-1... done, (free)
Removing vars for DATABASE from rd-pio-engine-1 and restarting... done, v5

Step 6.2 Point DATABASE_URL at the Event Server Database

Set the engine app's DATABASE_URL to the event server's value from Step 3:

$ heroku config:set DATABASE_URL=postgres://rdatjvbvdwqvyq:nNL9b1cnjoQt8hCcQumEMahrmL@ec2-54-243-208-195.compute-1.amazonaws.com:5432/d8spomhdp00n03
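Rather than pasting the URL by hand, the value can be copied directly from the event server app with heroku config:get and heroku config:set, both of which accept --app. share_database_url is a hypothetical wrapper sketched for this step; it refuses to continue if the source app has no DATABASE_URL.

```shell
# Hypothetical helper: copy DATABASE_URL from one Heroku app to another.
# $1: event server app, $2: engine app
share_database_url() {
  local url
  url=$(heroku config:get DATABASE_URL --app "$1") || return 1
  if [ -z "$url" ]; then
    echo "error: DATABASE_URL is not set on $1" >&2
    return 1
  fi
  heroku config:set "DATABASE_URL=$url" --app "$2"
}
```

Usage, with your own app names:

$ share_database_url rd-pio-eventserver-1 rd-pio-engine-1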

Step 7 Configure the Heroku app

Tell the engine app which PredictionIO app to use and where to find the event server:

heroku config:set ACCESS_KEY=<YOUR APP ACCESS KEY> APP_NAME=<APP NAME> EVENT_SERVER_IP=<YOUR EVENT SERVER HOSTNAME> EVENT_SERVER_PORT=80

Example

heroku config:set ACCESS_KEY=2Evbo5hiUiXXXCu_uB-gK1Q3EiT2N8nGd1-AGY5hjrsQ3PonJCdwP1YZ5WN5519O APP_NAME=MyApp1 \
  EVENT_SERVER_IP=rd-pio-eventserver-1.herokuapp.com \
  EVENT_SERVER_PORT=80

Output

Setting config vars and restarting rd-pio-engine-1... done, v6
ACCESS_KEY:        2Evbo5hiUiXXXCu_uB-gK1Q3EiT2N8nGd1-AGY5hjrsQ3PonJCdwP1YZ5WN5519O
APP_NAME:          MyApp1
EVENT_SERVER_IP:   rd-pio-eventserver-1.herokuapp.com
EVENT_SERVER_PORT: 80
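The four variables can also be set from values already at hand. This is a hypothetical sketch; it assumes ACCESS_KEY is still exported from Step 4 and that the event server is reached on port 80, as in the example above.

```shell
# Hypothetical helper: set the engine's event server settings in one call.
# $1: engine app, $2: PredictionIO app name, $3: event server hostname
configure_engine() {
  if [ -z "${ACCESS_KEY:-}" ]; then
    echo "error: export ACCESS_KEY first (see Step 4)" >&2
    return 1
  fi
  heroku config:set "ACCESS_KEY=$ACCESS_KEY" "APP_NAME=$2" \
    "EVENT_SERVER_IP=$3" EVENT_SERVER_PORT=80 --app "$1"
}
```

Usage:

$ configure_engine rd-pio-engine-1 MyApp1 rd-pio-eventserver-1.herokuapp.com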

Step 8 Increase Heap size for Java VM

$ heroku config:set JAVA_OPTS="-Xmx512m"

Step 9 Train the Engine

In this step we will train the recommendation engine on the events inserted above. The code below is the core of the training method that runs inside the engine; it calls Spark MLlib’s ALS.trainImplicit and keeps the resulting item feature vectors in the model.

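// Fragment of the engine's training method: mllibRatings, ap, seed,
// itemStringIntMap and items are defined earlier in that method.
val m = ALS.trainImplicit(
  ratings = mllibRatings,        // RDD[Rating] of user-item interactions
  rank = ap.rank,                // number of latent factors per user/item
  iterations = ap.numIterations, // number of alternating least squares sweeps
  lambda = ap.lambda,            // regularization weight
  blocks = -1,                   // -1 lets Spark choose the number of blocks
  alpha = 1.0,                   // confidence scaling for implicit feedback
  seed = seed)

// Only the item (product) feature vectors are kept in the model
new ALSModel(
  productFeatures = m.productFeatures.collectAsMap.toMap,
  itemStringIntMap = itemStringIntMap,
  items = items
)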
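For reference, ALS.trainImplicit follows the implicit-feedback formulation of Hu, Koren and Volinsky. Each interaction value r_ui becomes a binary preference p_ui with a confidence c_ui, and ALS alternately solves for user vectors x_u and item vectors y_i (each of length rank) that minimize:

$$
\min_{X,\,Y}\ \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda\Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr),
\qquad
c_{ui} = 1 + \alpha\,\lvert r_{ui} \rvert,
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases}
$$

Here lambda is λ, alpha = 1.0 is α, and numIterations is the number of alternating sweeps; Spark additionally scales λ by the number of interactions of each user or item. Because the model keeps only the item vectors y_i (productFeatures), a similar-items query can only compare items with each other. Assuming the engine scores candidates by cosine similarity, as PredictionIO's similar-product template does, the score of item j for a query item i would be:

$$
\operatorname{sim}(i, j) = \frac{y_i^{\top} y_j}{\lVert y_i \rVert\,\lVert y_j \rVert}
$$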

**Optional Step**

If you are running on free dynos, make sure you scale down the web dyno before training:

$ heroku ps:scale web=0 train=0

Now run the training command

$ heroku run train

Output


[INFO] [Engine$] ALSModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=efb28115-5007-4356-a45c-cab9b7b1da6f
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
[INFO] [ServerConnector] Stopped ServerConnector@18578491{HTTP/1.1}{0.0.0.0:4040}

Bring the web dyno back up (for setups using free dynos):

heroku ps:scale web=1 train=0
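The scale-down, train, scale-up sequence can be wrapped so that the web dyno is restored even if training fails. This is a minimal sketch; train_with_free_dynos is a hypothetical helper that runs the same heroku commands as above against an explicit --app and returns the exit status of the training run.

```shell
# Hypothetical helper: free the dynos, train, then restore the web dyno.
# $1: engine app name
train_with_free_dynos() {
  local app="$1" status
  heroku ps:scale web=0 train=0 --app "$app" || return 1
  heroku run train --app "$app"
  status=$?
  # Restore the web dyno whether or not training succeeded
  heroku ps:scale web=1 train=0 --app "$app"
  return "$status"
}
```

Usage:

$ train_with_free_dynos rd-pio-engine-1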

Open http://rd-pio-engine-1.herokuapp.com/ in a browser (using your own engine app name) to check that the recommendation engine is running.

Step 10 Predict

Query the engine for up to four items similar to i3.

Linux, Mac OS X

$ curl -H "Content-Type: application/json" -d '{ "items": ["i3"], "num": 4 }' \
   -k http://rd-pio-engine-1.herokuapp.com/queries.json

Windows

curl -H "Content-Type: application/json" ^
    -d "{\"items\": [\"i3\"], \"num\": 4 }" ^
    -k http://rd-pio-engine-1.herokuapp.com/queries.json
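The query body is a JSON object holding a list of item IDs and the maximum number of results (num). similar_items_query is a hypothetical helper that builds this body for one or more item IDs, which sidesteps the platform-specific quoting shown above.

```shell
# Hypothetical helper: print a queries.json body.
# $1: maximum number of results; remaining arguments: item IDs
similar_items_query() {
  local num="$1" items="" id
  shift
  for id in "$@"; do
    items="${items:+$items, }\"$id\""  # comma-separate after the first item
  done
  printf '{ "items": [%s], "num": %s }\n' "$items" "$num"
}
```

Usage:

$ curl -H "Content-Type: application/json" -d "$(similar_items_query 4 i3)" http://rd-pio-engine-1.herokuapp.com/queries.json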

The response will be similar to the listing below; the exact items and scores depend on the events you inserted.

{"itemScores":
  [
    {"item": "i44", "score": 0.2805472425881496},
    {"item": "i41", "score": 0.14458527026450552}
  ]
}
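To pull out just the recommended item IDs, in ranked order, a grep/sed sketch is enough for this flat response shape. top_items is a hypothetical helper; it assumes item IDs contain no double quotes, and a real JSON tool such as jq is the more robust choice.

```shell
# Hypothetical helper: read a queries.json response on stdin and print
# one item ID per line, in the order returned.
top_items() {
  grep -o '"item" *: *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

Usage:

$ curl -s -H "Content-Type: application/json" -d '{ "items": ["i3"], "num": 4 }' http://rd-pio-engine-1.herokuapp.com/queries.json | top_items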