Fire up PredictionIO on Docker
PredictionIO is an open source machine learning server built on top of an open source stack, which developers and data scientists can use to create predictive engines for any machine learning task. It has a plethora of benefits. To name a few:
- Dynamic queries in realtime
- Ability to run multiple engine variants systematically
- Quick to deploy (supposedly)
- Simplified data infrastructure management
I could go on and on, but we still have a lot to cover. A deep dive into the benefits can be found on the PredictionIO website.
Docker
Docker is a tool designed to benefit both developers and system administrators, which makes it a staple of many DevOps (development + operations) toolchains. Some benefits of this essential tool include:
- Portability
- Faster software delivery cycles
- Efficient use of system resources
- Docker shines for micro-services architecture
- blah blah blah 🙄
And that’s why we are going to run PredictionIO on Docker.
Now, let’s cut to the chase.
First off, we need to clone the PredictionIO Docker setup maintained by the Japan PredictionIO User Group.
$ git clone https://github.com/jpioug/predictionio-docker.git predictionIO
It will probably take a second or two, depending on your internet speed.
Before we change directory, make sure you have Docker and docker-compose installed. Then run:
$ cd predictionIO
If you want to run PredictionIO with PostgreSQL, start it as below:
$ docker-compose -f docker-compose.yml \
-f pgsql/docker-compose.base.yml \
-f pgsql/docker-compose.meta.yml \
-f pgsql/docker-compose.event.yml \
-f pgsql/docker-compose.model.yml \
up -d
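Each -f flag layers another compose file on top of the previous one, so the PostgreSQL setup is really four small override files (base, meta, event, model) stacked on the main docker-compose.yml. If you would rather drive this from a script, here is a minimal Python sketch that assembles the same command. The helper name compose_command and the storage parameter are my own invention, not something the repo provides; only the pgsql file names come from the command above.

```python
# The four override files used in the command above.
PARTS = ("base", "meta", "event", "model")


def compose_command(storage="pgsql", action=("up", "-d")):
    """Build the docker-compose argument list shown above.

    `storage` is the directory holding the override files. Only "pgsql"
    is taken from this post; any other value is an assumption.
    """
    cmd = ["docker-compose", "-f", "docker-compose.yml"]
    for part in PARTS:
        cmd += ["-f", f"{storage}/docker-compose.{part}.yml"]
    return cmd + list(action)


# Print the command instead of running it, so this is safe to try anywhere.
print(" ".join(compose_command()))
```

To actually start the containers from Python, you could pass the list to subprocess.run(compose_command(), check=True) from inside the cloned predictionIO directory.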
Let’s confirm that the containers are up and running:
$ docker ps
Setting up the recommendation engine
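We are going to add a recommendation engine, aka a recommendation template, to PredictionIO.
Let’s get a bash shell inside the container so that we can install some of the packages necessary to run the template. Replace container_name with the name of the PredictionIO container listed by docker ps: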
$ docker exec -it container_name /bin/bash
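Basically, we are telling Docker to run /bin/bash inside the container. The -i flag keeps the input open and -t attaches a terminal, so we can type shell commands interactively. The remaining commands are run inside this shell.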
$ git clone https://github.com/apache/predictionio-template-recommender.git recommender && cd recommender
This command clones the recommendation template into a recommender directory and changes into that directory once the clone is complete. Normally, the directory name is the same (or almost the same) as the template name, so that it is easy to remember what the template does later on. Duh🙄
Let’s initialise our pio application:
$ pio app new <your_app_name>
NOTE: make sure you are inside the recommender directory.
PredictionIO can run more than one template and more than one app, so if you have more than one application you can list them all with:
$ pio app list
(Take note of the access key generated by pio app new too. We will use it in a bit.)
Finally we are about to get our hands really dirty. You’ll need a napkin, coffee and someone to comfort you.
Just kidding …
Anyway, we’ll be using the sample_movielens_data dataset, but before we get to that let’s install pip and the PredictionIO Python SDK.
pip is Python’s package installer. We will use it to install the Python packages we need.
First update the package index, then install pip:
$ apt update
$ apt install python3-pip
Once the installation is complete, verify the installation by checking the pip version:
$ pip3 --version
The version number may vary, but it will look something like this:
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
Then we install the PredictionIO Python SDK using:
$ pip3 install predictionio
Time to get the dataset, sample_movielens_data.txt, using the following command:
$ curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
Remember the access key I told you to take note of earlier? We pass it to the import script so that the events are stored under our app:
$ python3 data/import_eventserver.py --access_key <ACCESS_KEY>
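The import script reads the downloaded file and sends its contents to the Event Server under your access key. I am not reproducing the script here. The sketch below only shows the idea, assuming each line of the file has the form user::item::rating and that a rating is sent as a "rate" event. The function names and the exact event field names are my assumptions, so check them against the script in the data directory.

```python
import json

DELIMITER = "::"  # assumed line format: user::item::rating


def parse_rating(line):
    """Split one 'user::item::rating' line into typed fields."""
    user, item, rating = line.strip().split(DELIMITER)
    return user, item, float(rating)


def to_rate_event(user, item, rating):
    """Shape a rating as an event dict.

    The field names follow my understanding of the Event Server's JSON
    format and are an assumption, not copied from the import script.
    """
    return {
        "event": "rate",
        "entityType": "user",
        "entityId": user,
        "targetEntityType": "item",
        "targetEntityId": item,
        "properties": {"rating": rating},
    }


# Illustrative lines in the same format as the dataset.
sample = ["0::2::3", "0::3::1"]
for line in sample:
    print(json.dumps(to_rate_event(*parse_rating(line))))
```

Nothing here talks to the network. It is only meant to make the data format less mysterious before you run the real import.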
Pheeeew!!!
It’s been the longest installation ever, I wish it could be shorter but due to lack of updated documentation we have to go through all this…
Note: We also need to install Scala before building:
$ apt-get install scala
Make sure you are in the recommender folder when you run the following command:
$ pio build --verbose
If you run into the error below, don’t panic!
Just run the command below, and you’ll be good to go.
$ cd /root/.sbt/launchers/1.2.8/
$ wget https://repo.scala-sbt.org/scalasbt/maven-releases/org/scala-sbt/sbt-launch/1.2.8/sbt-launch.jar
Go back to the recommender directory, then rerun:
$ pio build --verbose
Everything should now work as expected.
Since we are using the recommender template, you will need to add your app name to engine.json. To look it up, run:
$ pio app list
Then, in the recommender directory, open engine.json (using vim, for example) and set appName:
{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.example.recommendation.RecommendationEngine",
  "datasource": {
    "params" : {
      "appName": "INVALID_APP_NAME"   <----- add your app name here
    }
  },
And then run the command below to train:
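If you don’t feel like fighting vim, the same change can be scripted. Here is a minimal Python sketch that swaps in the app name. The helper set_app_name and the app name "MyApp1" are made up for illustration, and the dictionary only contains the keys shown in the snippet above (the real file has more after them).

```python
import json


def set_app_name(engine_config, app_name):
    """Return a copy of the engine.json contents with appName replaced."""
    updated = json.loads(json.dumps(engine_config))  # cheap deep copy
    updated["datasource"]["params"]["appName"] = app_name
    return updated


# Only the keys visible in the snippet above; the real file has more.
config = {
    "id": "default",
    "description": "Default settings",
    "engineFactory": "org.example.recommendation.RecommendationEngine",
    "datasource": {"params": {"appName": "INVALID_APP_NAME"}},
}

# "MyApp1" is a hypothetical name; use the one from pio app list.
print(json.dumps(set_app_name(config, "MyApp1"), indent=2))
```

To apply it to the real file, load engine.json with json.load, pass the result through set_app_name, and write it back with json.dump.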
$ pio train
After a successful training, let’s try deploying the engine and see whether it is working correctly.
$ pio deploy --port 8000
Open the following link in your browser: http://localhost:8000
and …
BOOOOOM!!!!!!!!!
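The page in the browser only tells you the engine is alive. To ask it for actual recommendations you send it a query. Below is a minimal Python sketch using only the standard library. It assumes the engine accepts a POST to /queries.json with "user" and "num" fields, which is how I understand the recommender template’s query format, and the helper names build_query and send are my own.

```python
import json
import urllib.request


def build_query(user, num, base_url="http://localhost:8000"):
    """Prepare a POST to the deployed engine's /queries.json endpoint.

    The endpoint path and field names are assumptions based on the
    recommender template's query format.
    """
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply (needs the engine running)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the request without sending it, so this runs even with no engine up.
request = build_query(user="1", num=4)
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

With the engine deployed on port 8000, calling send(request) should return the engine’s reply for user "1".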
Now you can integrate it into your current application, or (a more fun option) read my next blog on how to integrate PredictionIO with Python …
See you next time … Yes?
Your support is worth a 1000 followers