Using MLflow with LineaPy

Published by Ming-Jer Lee on November 14, 2022

MLflow is a great tool for managing the entire ML lifecycle. We often use MLflow for the following purposes:

Tracking ML experiments to record and compare parameters and results
Managing and deploying models from many ML libraries to various model serving and inference platforms
Utilizing the centralized model store to manage the full lifecycle of an ML model

Compared to MLflow, which focuses on ML models, LineaPy treats all serializable Python objects as artifacts. Its main capabilities are:

It can slice out only the code relevant to a given artifact from messy code
It can perform parameter refactoring to modularize the workflow of artifacts
It can generate pipelines for a variety of job orchestrators
It includes a general-purpose artifact store to manage the lifecycle of artifacts

LineaPy and MLflow intersect when an artifact is also an ML model. In this case, it would be great if we could leverage the functionality from both libraries.

Currently, to get the benefits of both libraries, we need to save the artifact (model) in both MLflow and LineaPy. However, manually maintaining two copies of the same data in two locations is usually not good practice. We therefore looked for an easy way for existing MLflow users to leverage LineaPy's functionality, and vice versa.

Current Behavior

Both LineaPy and MLflow require users to save the artifact (ML model) to their artifact store in order to use the library's full feature set. In LineaPy, this is done through lineapy.save, and in MLflow, through mlflow.flavor.log_model or its equivalent.

Manually writing two save statements for the same object can create a lot of problems down the line. For instance, if one of the two statements is accidentally omitted, the versions in the two stores drift apart and become hard to reconcile. It would be great if we could write a single statement that registers the artifact in both LineaPy and MLflow, so that we no longer need to worry about keeping the two in sync. This is exactly what the rest of the post is about.

Note that MLflow supports several model flavors, such as sklearn, tensorflow, statsmodels, and more. In the rest of the post, we use `flavor` as a placeholder for any MLflow-supported flavor.

What’s the New Behavior and What is Happening Under the Hood?

LineaPy supports using MLflow as the storage backend for ML models. Instead of calling both mlflow.flavor.log_model and lineapy.save for the same artifact (model) and storing it in both locations, we can now write a single lineapy.save. The model is then stored only in MLflow but registered in both the MLflow and LineaPy stores.

Saving ML Models (as Artifacts)

When we run lineapy.save(model, 'model_name'), LineaPy detects the object type of model. If model belongs to an MLflow-supported flavor, LineaPy does not use its own storage backend to serialize/deserialize the model (artifact). Instead, it calls mlflow.flavor.log_model to log the model into the MLflow artifact store.
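
The dispatch described above can be pictured with a small, self-contained sketch. This is not LineaPy's actual implementation: `SUPPORTED_FLAVORS`, `detect_flavor`, and `choose_backend` are hypothetical names, and matching on the object's top-level module name is only one plausible way to recognize a flavor.

```python
# Hypothetical sketch: none of these names are LineaPy or MLflow APIs.
SUPPORTED_FLAVORS = ("sklearn", "xgboost", "statsmodels", "prophet")


def detect_flavor(obj):
    """Return the flavor matching the object's top-level module, else None."""
    top_level_module = type(obj).__module__.split(".")[0]
    return top_level_module if top_level_module in SUPPORTED_FLAVORS else None


def choose_backend(obj, default_ml_backend="mlflow"):
    """Decide where an object being saved as an artifact should be stored."""
    flavor = detect_flavor(obj)
    if flavor is not None and default_ml_backend == "mlflow":
        # A real implementation would call the flavor's log_model here.
        return ("mlflow", flavor)
    # Everything else goes to the general-purpose artifact store.
    return ("lineapy", None)
```

Under this sketch, a fitted sklearn estimator would be routed to MLflow, while a plain list or DataFrame would still be serialized by LineaPy itself.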

Retrieving ML Models (as Artifacts)

Once we save an artifact from LineaPy using MLflow as the storage backend, we can retrieve the model from either LineaPy or MLflow, depending on our preference.

LineaPy Way

We can retrieve the artifact value (ML model) with the same API used for all other LineaPy artifacts, even though the model is stored in MLflow:

artifact = lineapy.get('model_name', version=artifact_version)
model = artifact.get_value()

MLflow Way

Since ML models use MLflow as the storage backend, they are registered in the MLflow model store too. Thus, we can also retrieve the same model with the MLflow API.

model = mlflow.flavor.load_model('model_uri')
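
The 'model_uri' placeholder above stands for an MLflow model URI. Two common forms are `runs:/<run_id>/<artifact_path>` for a model logged under a run and `models:/<name>/<version-or-stage>` for a model in the Model Registry. As a minimal sketch, the hypothetical helpers below (not part of MLflow) only assemble these strings:

```python
def run_model_uri(run_id, artifact_path):
    """URI for a model logged under a specific MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name, version_or_stage):
    """URI for a registered model, addressed by version number or stage name."""
    return f"models:/{name}/{version_or_stage}"


# 'lineapy_clf' matches the registered name used later in this post;
# 'abc123' is a made-up run ID used only for illustration.
registry_uri = registry_model_uri("lineapy_clf", 1)
run_uri = run_model_uri("abc123", "model")
```

Either string could then be passed to mlflow.flavor.load_model in place of 'model_uri'.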

Other LineaPy Functionality

All the LineaPy features, such as code slicing with lineapy.get('model_name').get_code() and pipeline generation with lineapy.to_pipeline(['model_name']), should work as usual.

MLflow Functionality

All the MLflow features should work as usual. There is no need to change the way we interact with MLflow because of LineaPy.

How to Configure MLflow within LineaPy?

The following MLflow-related configuration items need to be set so that LineaPy can use MLflow as the storage backend for ML models.

mlflow_tracking_uri: where the MLflow model is tracked (see MLflow Tracking).
mlflow_registry_uri: (optional) depends on how our MLflow is configured (see MLflow Model Registry).
default_ml_models_storage_backend: which storage backend (lineapy or mlflow) to use by default for an MLflow-supported model (pre-configured as mlflow).

These configuration items can be set like all other LineaPy configuration items.
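
Besides lineapy.options.set, one way to supply these items is through environment variables. The sketch below assumes LineaPy reads options from variables named with a `LINEAPY_` prefix followed by the upper-cased option name, and that they must be in place before LineaPy starts. Both points are assumptions to verify against the LineaPy configuration docs, and `env_var_name` is a hypothetical helper.

```python
import os


def env_var_name(option):
    """Map an option name to its assumed environment variable name."""
    # Assumed convention: LINEAPY_ prefix plus the upper-cased option name.
    return "LINEAPY_" + option.upper()


# Values mirror the ones used in the example in this post.
mlflow_options = {
    "mlflow_tracking_uri": "file:///tmp/mlruns",
    "mlflow_registry_uri": "sqlite://",
    "default_ml_models_storage_backend": "mlflow",
}

for option, value in mlflow_options.items():
    os.environ[env_var_name(option)] = value
```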

Example

Here is a basic example that configures MLflow within LineaPy, uses lineapy.save to register the ML model in both the LineaPy and MLflow stores, and retrieves the model with both the LineaPy API and the MLflow API.

import lineapy
import mlflow

# Configure MLflow within LineaPy
lineapy.options.set('mlflow_tracking_uri','file:///tmp/mlruns')
lineapy.options.set('mlflow_registry_uri','sqlite://')

# Train a sklearn model
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=0)
X = [[ 1,  2,  3], [11, 12, 13]]  # 2 samples, 3 features
y = [0, 1]
clf.fit(X, y)

# A single save statement registers the model in both LineaPy and MLflow
lineapy.save(clf, 'clf', registered_model_name='lineapy_clf')

# Retrieve the model from LineaPy
art = lineapy.get('clf')
lineapy_model = art.get_value()

metadata = art.get_metadata()

# Retrieve Model from MLflow
client = mlflow.MlflowClient()
latest_version = client.search_model_versions("name='lineapy_clf'")[0].version
mlflow_model = mlflow.sklearn.load_model(f'models:/lineapy_clf/{latest_version}')
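
The example above treats the first search result as the latest version, which depends on the order in which the registry returns its results. A more defensive sketch picks the highest version number explicitly. `latest_version_number` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for the entries returned by client.search_model_versions; the sketch assumes each entry exposes a `version` attribute holding a number or a numeric string.

```python
from types import SimpleNamespace


def latest_version_number(model_versions):
    """Return the highest version among objects with a `version` attribute."""
    # Compare as integers so that "10" ranks above "2".
    return max(int(mv.version) for mv in model_versions)


# Stand-ins for the entries returned by client.search_model_versions(...)
fake_versions = [SimpleNamespace(version=v) for v in ("2", "10", "1")]
latest = latest_version_number(fake_versions)
model_uri = f"models:/lineapy_clf/{latest}"
```

The resulting `model_uri` could be passed to mlflow.sklearn.load_model in the same way as in the example.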

What Model Flavors are Currently Supported?

LineaPy currently supports the following model flavors: prophet, sklearn, statsmodels, xgboost, and more. We plan to support all MLflow-supported flavors soon.

Final Thought

As the above example shows, using LineaPy with MLflow takes very little effort. With a minimal code change (using lineapy.save instead of mlflow.flavor.log_model), we can enjoy the benefits of both libraries.

However, we believe what we have achieved so far is just the tip of the iceberg regarding the integration of LineaPy with other tools. One potential direction related to this post is whether we should let LineaPy just detect mlflow.flavor.log_model and register the model as a LineaPy artifact automatically. We would love to hear your thoughts on this.