Fire up PredictionIO on Docker

4 min read · Nov 24, 2020

PredictionIO is an open source machine learning server built on top of an open source stack, for developers and data scientists to create predictive engines for any machine learning task. It has a plethora of benefits, to name a few:

Dynamic queries in realtime
Ability to run multiple engine variants systematically
Quick to deploy (supposedly)
Simplified data infrastructure management

I could go on and on, but there’s a lot more we have yet to cover. A deeper dive into the benefits can be found on the PredictionIO website.

Docker…

Docker is a tool designed to benefit both developers and system administrators, which makes it a staple of many DevOps (development + operations) toolchains. Some benefits of this essential tool include:

Portability
Faster software delivery cycles
Efficient use of system resources
Docker shines for micro-services architecture
blah blah blah 🙄

And that’s why we are going to run PredictionIO on Docker.

Now, let’s cut to the chase.

First off, we need to clone the predictionio-docker repository from the Japan PredictionIO User Group (jpioug).

$ git clone https://github.com/jpioug/predictionio-docker.git predictionIO

It will probably take a second or two, depending on your internet speed.

Before going any further, make sure Docker and docker-compose are installed, then cd predictionIO. To run PredictionIO with PostgreSQL, start the stack as below:

$ docker-compose -f docker-compose.yml \
  -f pgsql/docker-compose.base.yml \
  -f pgsql/docker-compose.meta.yml \
  -f pgsql/docker-compose.event.yml \
  -f pgsql/docker-compose.model.yml \
  up -d

Let’s confirm that the containers are up and running using $ docker ps
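
If the list is long, a small script can pick out the container you need for the next step. The sketch below is only an illustration: it assumes Docker's standard `--format` template fields `.Names` and `.Image`, and the container and image names in the sample are made up, not what this compose setup necessarily produces.

```python
import subprocess
from typing import List


def running_containers() -> str:
    """Return `docker ps` output as 'name<TAB>image' lines (needs Docker installed)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Image}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def find_containers(ps_output: str, keyword: str) -> List[str]:
    """Return the names of containers whose name or image contains keyword."""
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, image = line.partition("\t")
        if keyword in name or keyword in image:
            names.append(name)
    return names


# Hypothetical sample output, for illustration only.
sample = "demo_pio_1\texample/pio:latest\ndemo_postgres_1\tpostgres:9.6\n"
pio_containers = find_containers(sample, "pio")
```

On a real machine you would call `find_containers(running_containers(), "pio")` and use the returned name wherever a container name is needed.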

Setting up the recommendation engine

We are going to add a recommendation engine, aka a recommendation template, to PredictionIO.

Let’s open a bash shell inside the container so that we can install the packages needed to run the template:

$ docker exec -it container_name /bin/bash

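Here, -i keeps standard input open and -t allocates a terminal, so together they give us an interactive bash session inside the container. Replace container_name with the name of the PredictionIO container listed by $ docker ps.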

$ git clone https://github.com/apache/predictionio-template-recommender.git recommender && cd recommender

This command clones the recommendation template into a recommender directory and then changes into it. It helps to name the directory after the template, so that you can still tell what it does months later. Duh🙄

Let’s initialise our pio application:-

$ pio app new <your_app_name>

NOTE: make sure you are inside the recommender template directory.

PredictionIO can run more than one template and more than one app. If you have several apps, run $ pio app list to list them all. (Take note of the access key that $ pio app new prints, too. We will use it in a bit.)

Finally we are about to get our hands really dirty. You’ll need a napkin, coffee and someone to comfort you.

Just kidding …

Anyway, we’ll be using the sample_movielens_data dataset, but before we get to that let’s install pip and the PredictionIO Python SDK.

pip is Python’s package installer. We will use it to install the Python packages we need.

Run $ apt update to refresh the package index, then install pip with $ apt install python3-pip

Once the installation is complete, verify the installation by checking the pip version:

$ pip3 --version

The version number may vary, but it will look something like this:

pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

Then we install the PredictionIO Python SDK using:

$ pip3 install predictionio

Time to get the dataset from sample_movielens_data using the following command:

$ curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
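
Each line of the downloaded file holds a user ID, an item ID and a rating, separated by `::`. If you want to sanity-check the file before importing it, a minimal parsing sketch could look like this (the `Rating` type and `parse_ratings` name are made up for illustration, and the three sample lines only mimic the file's format):

```python
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    user: str
    item: str
    rating: float


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per non-empty 'user::item::rating' line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        user, item, rating = line.split("::")
        yield Rating(user, item, float(rating))


# Illustrative lines in the same format as the dataset.
sample = ["0::2::3", "0::3::1", "1::2::2"]
ratings = list(parse_ratings(sample))
```

To run it against the real file, pass an open file handle instead of the sample list, e.g. `with open("data/sample_movielens_data.txt") as f: ratings = list(parse_ratings(f))`.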

Remember the access key I told you to take note of earlier? We pass it to the import script so that the events are stored under our app:

$ python3 data/import_eventserver.py --access_key <ACCESS_KEY>
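
To get a feel for what ends up in the event server, here is a rough sketch of a single "rate" event in the JSON shape the PredictionIO Event API accepts. The field names follow the Event API; the helper `rate_event` is made up for illustration, and what the template's script actually sends is best checked in data/import_eventserver.py itself.

```python
import json


def rate_event(user_id: str, item_id: str, rating: float) -> dict:
    """Build the body of a 'rate' event: a user rates an item."""
    return {
        "event": "rate",
        "entityType": "user",
        "entityId": str(user_id),
        "targetEntityType": "item",
        "targetEntityId": str(item_id),
        "properties": {"rating": float(rating)},
    }


# One line of the dataset, "0::2::3", expressed as an event payload.
payload = json.dumps(rate_event("0", "2", 3))
```

The sketch only builds the payload and sends nothing. The import script does the sending for you, using the access key to tell the event server which app the events belong to.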

Pheeeew!!!

It’s been the longest installation ever. I wish it could be shorter, but due to the lack of updated documentation we have to go through all this…

Note: We need to install Scala before building:

$ apt-get install scala

Make sure you are in the recommender folder when you run the following command:

$ pio build --verbose

If the build fails with an error about the sbt launcher (sbt-launch.jar), don’t panic!

Just run the commands below, and you’ll be good to go.

$ cd /root/.sbt/launchers/1.2.8/
$ wget https://repo.scala-sbt.org/scalasbt/maven-releases/org/scala-sbt/sbt-launch/1.2.8/sbt-launch.jar

Go back to the recommender directory, then rerun $ pio build --verbose. Everything should now work as expected.

Since we are using the recommender template, you will need to add your app name to engine.json. Run $ pio app list to get the app name.

In the recommender directory, open engine.json (using vim, for example) and replace INVALID_APP_NAME with your app name:

{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.example.recommendation.RecommendationEngine",
  "datasource": {
    "params" : {
      "appName": "INVALID_APP_NAME" <----- add your app name here
    }
  },
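
If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch that assumes engine.json is plain JSON with the datasource.params.appName layout shown above; `set_app_name` is a made-up helper, not part of PredictionIO.

```python
import json


def set_app_name(path: str, app_name: str) -> dict:
    """Set datasource.params.appName in the given engine.json and save it."""
    with open(path) as f:
        config = json.load(f)

    # Create the nested keys if they are missing, then set the app name.
    params = config.setdefault("datasource", {}).setdefault("params", {})
    params["appName"] = app_name

    with open(path, "w") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    return config
```

From inside the recommender directory you would call something like `set_app_name("engine.json", "MyApp")`, where MyApp stands in for the name reported by $ pio app list.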

And then run the command below to train:-