A/B Analysis for a Recommendation Model

Estimated completion time: 14 min.


In this tutorial, you will learn how to retrospectively compare the behavior of two different models.

By the end of this tutorial you will know how to:

  • Set up an A/B application

  • Analyze production data


Set Up an A/B Application

Prepare a model for uploading


Install the dependencies in your local environment.

pip install -r requirements.txt

# train_model.py: train a LightFM model and save it to model.joblib
import sys

import joblib
from lightfm import LightFM
from lightfm.datasets import fetch_movielens

if __name__ == "__main__":
    # The number of latent components is passed as the first CLI argument
    no_components = int(sys.argv[1])
    print(f"Number of components is set to {no_components}")

    # Load the MovieLens 100k dataset. Only five
    # star ratings are treated as positive.
    data = fetch_movielens(min_rating=5.0)

    # Instantiate and train the model
    model = LightFM(no_components=no_components, loss='warp')
    model.fit(data['train'], epochs=30, num_threads=2)

    # Save the model
    joblib.dump(model, "model.joblib")

# Serving code: loads the trained model and returns the top 3 items for a user
import joblib
import numpy as np
from lightfm import LightFM

# Load model once
model: LightFM = joblib.load("/model/files/model.joblib")

# Get all item ids
item_ids = np.arange(0, 1682)


def get_top_rank_item(user_id):
    # Calculate scores per item id
    y = model.predict(user_ids=[user_id], item_ids=item_ids)

    # Pick top 3 (indices of the three highest scores, best first)
    top_3 = y.argsort()[:-4:-1]

    # Return {'top_1': ..., 'top_2': ..., 'top_3': ...}
    return {f"top_{i + 1}": item_id for i, item_id in enumerate(top_3)}
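
The slice `y.argsort()[:-4:-1]` in the function above is compact, so here is a small self-contained sketch of what it does. The `scores` array is made up, and the helper name `top_k_indices` is illustrative; neither is part of the tutorial's code.

```python
import numpy as np


def top_k_indices(scores, k=3):
    """Return the indices of the k largest scores, best first."""
    # argsort() orders indices by ascending score; the reversed slice
    # [:-(k + 1):-1] walks back from the end and takes the last k.
    return scores.argsort()[:-(k + 1):-1]


# Made-up scores for five items (illustrative only).
scores = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
top_3 = top_k_indices(scores)

# Same dictionary shape that get_top_rank_item returns.
result = {f"top_{i + 1}": int(item_id) for i, item_id in enumerate(top_3)}
print(result)
```

With these scores the three best items are those at indices 1, 3 and 4, in that order, because they hold the scores 0.9, 0.7 and 0.5.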

# serving.yaml
kind: Model
name: movie_rec
runtime: hydrosphere/serving-runtime-python-3.7:3.0.0
install-command: sudo apt install --yes gcc && pip install -r requirements.txt
payload:
  - src/
  - requirements.txt
  - model.joblib
contract:
  name: get_top_rank_item
  inputs:
    user_id:
      shape: scalar
      type: int64
      profile: numerical
  outputs:
    top_1:
      shape: scalar
      type: int64
      profile: numerical
    top_2:
      shape: scalar
      type: int64
      profile: numerical
    top_3:
      shape: scalar
      type: int64
      profile: numerical

Upload Model A

We train our model with 5 components and upload it as movie_rec:v1.

python train_model.py 5
hs apply -f serving.yaml

Upload Model B

Next, we train a new version of our original model with 20 components and upload it as movie_rec:v2.

python train_model.py 20
hs apply -f serving.yaml

We can check that we have multiple versions of our model by running:

hs model list

Create an Application

To create an A/B deployment, we need to create an Application with a single execution stage consisting of two model variants. These variants are Model A and Model B, respectively.

The following code will create such an application:

from hydrosdk import ModelVersion, Cluster
from hydrosdk.application import ApplicationBuilder, ExecutionStageBuilder

cluster = Cluster('http://localhost')

# Look up the two uploaded versions of movie_rec
model_a = ModelVersion.find(cluster, "movie_rec", 1)
model_b = ModelVersion.find(cluster, "movie_rec", 2)

# A single stage that splits traffic evenly between the two variants
stage_builder = ExecutionStageBuilder()
stage = (
    stage_builder
    .with_model_variant(model_version=model_a, weight=50)
    .with_model_variant(model_version=model_b, weight=50)
    .build()
)

app = ApplicationBuilder("movie-ab-app").with_stage(stage).build(cluster)
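
To build intuition for what the two `weight=50` arguments mean, the sketch below simulates a weighted split in plain Python. It is only an illustration, under the assumption that each request is routed independently and in proportion to the weights. `route_request` and the variant labels are hypothetical names, and this is not a description of how the Hydrosphere router is implemented.

```python
import random
from collections import Counter

# Variant labels and weights mirroring the application above (labels are illustrative).
VARIANTS = {"movie_rec:v1": 50, "movie_rec:v2": 50}


def route_request(rng):
    """Pick one variant at random, proportionally to its weight."""
    names = list(VARIANTS)
    weights = list(VARIANTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
counts = Counter(route_request(rng) for _ in range(2000))
for name, n in sorted(counts.items()):
    print(f"{name}: {n} requests ({n / 2000:.1%})")
```

Under this assumption, each variant ends up with roughly half of the 2000 simulated requests. The split is approximate, not exact, because every request is drawn independently.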

Invoking movie-ab-app

We'll simulate production data flow by repeatedly asking our model for recommendations.

import numpy as np
from hydrosdk import Cluster, Application
from tqdm.auto import tqdm

cluster = Cluster("http://localhost", grpc_address="localhost:9090")
app = Application.find(cluster, "movie-ab-app")
predictor = app.predictor()

# Sample 2000 user ids (with replacement) and request recommendations for each
user_ids = np.arange(0, 943)
for uid in tqdm(np.random.choice(user_ids, 2000, replace=True)):
    result = predictor.predict({"user_id": uid})
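
The loop above discards each `result`. If you want to inspect responses as they arrive, one option is to tally the returned `top_1` items. The sketch below shows the bookkeeping only. `fake_predict` is a hypothetical stand-in for `predictor.predict` that returns random item ids, so the example runs without a cluster. It also assumes that the response is a dictionary keyed by the output names `top_1`, `top_2` and `top_3`.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)


def fake_predict(request):
    """Hypothetical stand-in for predictor.predict; returns random item ids."""
    items = rng.choice(1682, size=3, replace=False)
    return {f"top_{i + 1}": int(item) for i, item in enumerate(items)}


user_ids = np.arange(0, 943)
top_1_counts = Counter()
for uid in rng.choice(user_ids, 2000, replace=True):
    result = fake_predict({"user_id": uid})
    top_1_counts[result["top_1"]] += 1

# Five most frequently recommended items across all simulated requests.
print(top_1_counts.most_common(5))
```

To use this against the real application, replace `fake_predict` with `predictor.predict`.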