ML Ops Course Lecture Notes

Jul 30, 2024

ML Ops Course Overview

Course Introduction

  • ML Ops skills can lead to high-paying job offers.
  • The course focuses on adding a "creamy layer" on top of projects.
  • It draws on experience with international job offers obtained through remote work.
  • ML Ops is presented as a new and valuable skill.

Instructor Background

  • Lead Data Scientist at Triplet.
  • Experience as an ML Ops engineer at Xenaml.
  • Background in NLP products and early experience with AI.

Course Objectives

  • Understanding specific ML Ops terminologies and concepts.
  • Learning about data ingestion, model deployment, and tools (e.g., MLflow, Xenaml).
  • Conducting an end-to-end project from data to deployment.

ML Ops Importance

  • Exponential data growth necessitates AI and ML solutions.
  • Machine learning involves more than model training, which makes up only about 20% of a project.
  • The remaining 80% is engineering work, which is the focus here.

The Way ML Teams Function

  • Roles in a typical ML team:
    • Data Scientists: Feature development and model training.
    • Data Engineers: Productionizing data pipelines.
    • ML Engineers: Deploying models.
    • Legal Advisors: Ensuring data usage compliance.

ML in Production Process

  • Data collection ➡️ Model training ➡️ Model deployment
  • Continuous feedback loop: retraining models based on new data and performance decay.
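
The feedback loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the "model" is a toy constant predictor, and the names `train`, `mean_abs_error`, `feedback_loop`, and `error_threshold` are hypothetical.

```python
from statistics import mean


def train(values):
    """Toy 'model': always predicts the mean of the training values."""
    return mean(values)


def mean_abs_error(model, values):
    """Average absolute error of the constant prediction on a batch."""
    return mean(abs(v - model) for v in values)


def feedback_loop(batches, error_threshold):
    """Train on the first batch, then retrain whenever the error on a
    newly collected batch exceeds error_threshold (assumed decay signal)."""
    history = list(batches[0])
    model = train(history)            # initial training
    retrain_count = 0
    for batch in batches[1:]:         # new data arriving after deployment
        error = mean_abs_error(model, batch)
        history.extend(batch)         # keep everything collected so far
        if error > error_threshold:   # performance decay detected
            model = train(history)    # retrain on all collected data
            retrain_count += 1
    return model, retrain_count


# Hypothetical data: the third batch drifts away from the first two.
batches = [[10, 12, 11], [11, 10, 12], [20, 22, 21]]
model, retrain_count = feedback_loop(batches, error_threshold=3.0)
print(model, retrain_count)
```

Here the second batch resembles the training data, so nothing happens; the third batch has drifted, so the loop retrains once.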

Deployment and Monitoring

  • Deployment makes a locally trained model accessible to users (e.g., a spam detection system).
  • Monitoring model performance is critical because models can decay over time.
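
One simple way to watch for decay is to track accuracy over a rolling window of recent predictions. The sketch below assumes true labels eventually become available for comparison; the class name `AccuracyMonitor` and its parameters are hypothetical, chosen only for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent window_size predictions."""

    def __init__(self, window_size, min_accuracy):
        self.outcomes = deque(maxlen=window_size)  # True/False per prediction
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual):
        """Store whether a prediction matched the label observed later."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_decayed(self):
        """True once the window is full and accuracy is below the minimum."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Hypothetical spam-detection outcomes: (prediction, actual label).
monitor = AccuracyMonitor(window_size=4, min_accuracy=0.75)
for prediction, actual in [("spam", "spam"), ("ham", "ham"),
                           ("spam", "ham"), ("ham", "spam")]:
    monitor.record(prediction, actual)

print(monitor.accuracy(), monitor.has_decayed())
```

Waiting for a full window before raising the flag avoids alarms based on only one or two predictions.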

Importance of ML Ops

  • ML Ops is a methodology that extends DevOps practices to ML components so that they stay reliable in production.
  • It helps teams scale operations and manage the ML lifecycle efficiently.

Deployment Challenges

  • Latency: Slow models can lead to high abandonment rates.
  • Fairness: Ethical considerations matter, as AI disasters such as Microsoft's Twitter bot (Tay) have shown.
  • Explainability: Users need to trust AI predictions.
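
Of the challenges above, latency is the easiest to measure directly. The following sketch times a prediction function and summarizes the results; `measure_latency_ms` and `latency_summary` are hypothetical helper names, and the 95th percentile uses the nearest-rank method as an assumed convention.

```python
import statistics
import time


def measure_latency_ms(predict, inputs):
    """Time each call to predict and return the latencies in milliseconds."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def latency_summary(latencies):
    """Median, 95th percentile (nearest rank), and maximum latency."""
    ordered = sorted(latencies)
    # Nearest-rank index computed with integer arithmetic: ceil(0.95 * n) - 1
    p95_index = max(0, (95 * len(ordered) + 99) // 100 - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


# Stand-in for a real model's predict function (an assumption).
timings = measure_latency_ms(lambda x: x * 2, [1, 2, 3])
print(latency_summary(timings))
```

Reporting a high percentile alongside the median matters because a few slow requests can drive users away even when the typical request is fast.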

ML Model Characteristics

  • Model-centric approaches improve the model's parameters; data-centric approaches improve data quality.
  • The advice is to focus on the data-centric approach for better results.
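
As one small example of data-centric work, the sketch below looks for inputs that appear in a dataset with contradictory labels. The function name `find_label_conflicts`, the normalization rule (trim and lowercase), and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def find_label_conflicts(examples):
    """Return normalized inputs that appear with more than one label."""
    labels_by_text = defaultdict(set)
    for text, label in examples:
        # Normalize so that trivially different copies are compared together.
        labels_by_text[text.strip().lower()].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


# Hypothetical labeled examples for a spam classifier.
examples = [
    ("Win a free prize", "spam"),
    ("win a free prize ", "ham"),   # same message, contradictory label
    ("Meeting at 3pm", "ham"),
    ("Meeting at 3pm", "ham"),      # duplicate with a consistent label
]
conflicts = find_label_conflicts(examples)
print(conflicts)
```

Resolving conflicts like these improves the training data without touching the model at all.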

ML Ops Project Structure

  1. Business Problem Identification
    • Assess the cost of wrong predictions (e.g., sales forecasting scenarios).
  2. Data Gathering
    • Analyze historical data and market trends.
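
The cost of wrong predictions in step 1 can be made concrete with a small calculation. This sketch assumes a sales-forecasting setting where over-forecasting leaves unsold stock and under-forecasting loses sales; the per-unit costs and the function names are hypothetical.

```python
def forecast_cost(forecast, actual, overstock_cost, stockout_cost):
    """Cost of one forecast: unsold units or missed sales, priced per unit."""
    if forecast >= actual:
        return (forecast - actual) * overstock_cost   # too much stock ordered
    return (actual - forecast) * stockout_cost        # demand that went unmet


def total_cost(forecasts, actuals, overstock_cost, stockout_cost):
    """Sum the cost of every forecast against what actually sold."""
    return sum(
        forecast_cost(f, a, overstock_cost, stockout_cost)
        for f, a in zip(forecasts, actuals)
    )


# Hypothetical numbers: a missed sale costs more than an unsold unit.
forecasts = [100, 120, 90]
actuals = [110, 100, 90]
total = total_cost(forecasts, actuals, overstock_cost=2, stockout_cost=5)
print(total)
```

When the two kinds of error cost different amounts, as assumed here, a plain accuracy metric can hide which mistakes are hurting the business most.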

Machine Learning Workflow Phases

  • Data Engineering: Ingesting, cleaning, and preparing the data.
  • Model Engineering: Training, validating, and deploying models.
  • Pipeline Management: Organizing workflows using pipelines (e.g., Xenaml).
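
The phases above can be pictured as a chain of small steps, each passing its output to the next. The sketch below uses plain Python functions and is not the API of any pipeline tool; the step names, the toy mean "model", and the sample rows are assumptions.

```python
def ingest():
    """Data engineering: load raw rows (hard-coded here for illustration)."""
    return [" 3", "5", None, "8 ", "bad", "4"]


def clean(raw_rows):
    """Data engineering: drop missing or unparseable rows, convert to float."""
    cleaned = []
    for row in raw_rows:
        if row is None:
            continue
        try:
            cleaned.append(float(row.strip()))
        except ValueError:
            continue  # skip rows that are not numbers
    return cleaned


def train(values):
    """Model engineering: the toy 'model' is just the mean of the data."""
    return sum(values) / len(values)


def evaluate(model, values):
    """Model engineering: mean absolute error of the constant prediction."""
    return sum(abs(v - model) for v in values) / len(values)


def run_pipeline():
    """Pipeline management: run the steps in order and report the results."""
    data = clean(ingest())
    model = train(data)
    return {
        "rows": len(data),
        "model": model,
        "mean_abs_error": evaluate(model, data),
    }


result = run_pipeline()
print(result)
```

Keeping each phase in its own function makes it possible to rerun, test, or replace one step without touching the others, which is the main thing a pipeline tool formalizes.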

Conclusion

  • Understanding the holistic approach to ML Ops is critical for success in the industry.
  • Next steps include practical implementations with tools like Xenaml and MLflow.