Introduction

As computer engineers, we understand the significance of code versioning and Git flow in software development. Together they provide a systematic way to manage code changes: teams can collaborate, keep a history of every version, and integrate new features smoothly.

Important!

Git flow ensures a streamlined development process, allowing teams to work efficiently while maintaining code quality.

Question 1

Is using git important for MLOps? Explain.

Question 2

Is using git enough to version ML products?

Traditional Software Development

To answer Question 2, it is important to understand the differences between traditional software development and Machine Learning product development.

In general, software development focuses on meeting pre-established functional requirements. The final quality of the product is intrinsically related to the quality of the code.

In ML, on the other hand, the objective usually involves a cycle of optimizing some metric (e.g., accuracy or MSE), and quality depends on the data as well as the code. In addition, the development environment is more diverse than in traditional software development.
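As a toy illustration of why quality depends on data as well as code, the sketch below runs the exact same "training" function on two different training sets and evaluates both results on the same test set. The mean-predictor model, the function names, and all numbers are invented for this example; they are not from a real project.

```python
def fit_mean_model(train_targets):
    """'Train' a trivial model: always predict the mean of the training targets."""
    return sum(train_targets) / len(train_targets)


def mean_squared_error(prediction, targets):
    """MSE of a constant prediction against a list of true targets."""
    return sum((target - prediction) ** 2 for target in targets) / len(targets)


# Toy data, made up for illustration only.
train_a = [1.0, 2.0, 3.0]
train_b = [2.0, 4.0, 6.0]
test_targets = [2.0, 2.0, 2.0]

# Same code, different training data.
model_a = fit_mean_model(train_a)
model_b = fit_mean_model(train_b)

mse_a = mean_squared_error(model_a, test_targets)
mse_b = mean_squared_error(model_b, test_targets)

print(f"MSE with train_a: {mse_a}")
print(f"MSE with train_b: {mse_b}")
```

The code is identical in both cases, so a Git commit hash alone would not tell the two results apart; only the training data differs.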

ML Lifecycle

Question 3

Imagine you've developed an ML product, and the deployment has already been completed. Can you identify any scenarios where it might be necessary to update the project? What types of changes could be made?

Answer!

Some possibilities, among others:

  • Retraining the same model on new data.
  • Adding new features to the model.
  • Developing new algorithms.

Important!

During these modifications, it will be desirable to compare metrics to validate whether the modifications actually improve the model performance.

Imagine you're updating a model and making various changes.

After testing two or three algorithms, adding new features, and adjusting parameters, you might find yourself confused, unsure which results correspond to which version of the model.

[Image: a representation of this data scientist, according to DALL·E 3 😄]

We can see that ML projects involve not only code but also data, models, and experiments. The ML lifecycle includes a lot of trial and error, hypothesis testing, and monitoring to ensure the quality of the delivered product.

Managing the versioning and reproducibility of ML models and datasets is crucial for ensuring the reliability and consistency of results.
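To make the problem concrete, here is a minimal sketch of what tracking runs by hand could look like, using only the Python standard library. Each run record stores a fingerprint of the dataset file together with the parameters and metrics, so results can later be matched to the data and configuration that produced them. The function names, the JSON Lines log format, and the record fields are assumptions made for this illustration, not a standard.

```python
import hashlib
import json


def fingerprint_file(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_run(log_path, data_path, params, metrics):
    """Append one run record (data fingerprint, params, metrics) to a JSON Lines file."""
    record = {
        "data_sha256": fingerprint_file(data_path),
        "params": params,
        "metrics": metrics,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_runs(log_path):
    """Read every run record back from the log file."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def best_run(runs, metric, higher_is_better=True):
    """Pick the run with the best value for one metric."""
    pick = max if higher_is_better else min
    return pick(runs, key=lambda run: run["metrics"][metric])
```

A call such as `record_run("runs.jsonl", "train.csv", {"max_depth": 3}, {"accuracy": acc})` after each training attempt (file names and parameter are hypothetical) would already answer "which result came from which version?". Maintaining this by hand quickly becomes tedious, which is the gap a dedicated tracking tool fills.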

MLflow

MLflow is an open-source platform for managing the ML lifecycle. Its API lets you integrate MLOps principles into your projects with minimal changes to existing code, providing a comprehensive framework for managing and organizing ML workflows.

With MLflow tracking, developers can easily log and monitor parameters, metrics, and artifacts generated during ML runs and analyze relevant details of ML projects.

Some key concepts of MLflow are:

  • Experiment: An experiment represents a specific machine learning task or project. It acts as a container for runs and helps organize and group related runs together.
  • Run: A run represents a single execution of your ML code, such as one training script invocation. It captures the parameters, metrics, and artifacts generated during that execution.
  • Parameters: Parameters are inputs or configurations that define an MLflow run. They can be hyperparameters, model configurations, or any other variables that affect the experiment's outcome.
  • Metrics: Metrics are measurements or evaluation criteria used to assess the performance of a model during training or evaluation. MLflow allows logging various metrics such as accuracy, loss, F1-score, or any other custom metric.
  • Artifacts: Artifacts are the output files generated during an MLflow run, such as trained models, visualizations, or data files.
  • Tags: Tags are user-defined key-value pairs that provide additional metadata for experiments and runs. They can be used to add descriptive labels, track specific attributes, or categorize experiments based on certain criteria.

Advance to the next topic, where we will see in practice how to use MLflow to track the development of an ML product.