Approximation in Machine Learning and Probabilistic Inference

Jun 22, 2024


Introduction to Abstraction in Computer Science

  • Abstraction is a fundamental principle in computer science.
  • Breakdowns of this abstraction in modern machine learning can impact scientific inferences.
  • Optimization and Abstracted Models:
    • Statistical and machine learning models rely on underlying infrastructure (e.g., cloud architecture, CUDA).
    • Optimization and probabilistic inference (e.g., Bayes' rule) are assumed to be abstract and unbiased.

Issues in Modern Machine Learning

  • Stochastic Optimization:
    • Acts as an implicit regularizer, introducing inductive bias into neural network training.
  • Reproducibility Crisis:
    • Difficulty in reproducing results from machine learning papers.
    • GitHub repositories and Docker containers may not always help.

Probabilistic Inference

  • Approximate inference is necessary with large models.
  • Bayesian Reasoning:
    • Transforms prior belief and data into posterior belief using Bayes' rule.
  • Impact on scientific statements and models, especially in computational neuroscience.
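As a concrete instance of transforming a prior belief and data into a posterior belief, here is a minimal conjugate-Gaussian update (a sketch with illustrative toy numbers, not from the lecture):

```python
import numpy as np

def gaussian_posterior(mu0, s0, y, s):
    """Bayes' rule in closed form: prior theta ~ N(mu0, s0^2),
    observations y_i ~ N(theta, s^2) give a Gaussian posterior."""
    n = len(y)
    prec = 1.0 / s0**2 + n / s**2                    # posterior precision
    mu_n = (mu0 / s0**2 + y.sum() / s**2) / prec     # precision-weighted mean
    return mu_n, np.sqrt(1.0 / prec)

y = np.array([1.2, 0.9, 1.1])
mu_n, s_n = gaussian_posterior(0.0, 1.0, y, 0.5)
```

The posterior mean lands between the prior mean and the sample mean, and the posterior standard deviation shrinks below the prior's, as the data sharpen the belief.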

Conditional Diffusion

  • Using guidance to do conditional sampling is akin to applying Bayes' rule.
  • Sequential Monte Carlo discussed as a method for propagating densities over time points.
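A minimal bootstrap particle filter illustrates propagating a density over time points (a sketch assuming a toy random-walk state-space model; all names and parameters here are illustrative, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pf(ys, n_particles=500, q=0.1, r=0.5):
    """Sequential Monte Carlo for x_t = x_{t-1} + N(0, q^2),
    y_t = x_t + N(0, r^2): propagate, weight, resample."""
    x = rng.normal(0.0, 1.0, n_particles)          # particles for p(x_0)
    means = []
    for y in ys:
        x = x + rng.normal(0.0, q, n_particles)    # propagate through dynamics
        logw = -0.5 * ((y - x) / r) ** 2           # log-likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(float(w @ x))                 # filtering mean E[x_t | y_1:t]
        x = rng.choice(x, size=n_particles, p=w)   # resample
    return means

# Observations from a constant state near 1.0
means = bootstrap_pf(np.full(20, 1.0))
```

With constant observations the filtering mean settles near the observed value after a few steps.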

Importance of Accurate Approximate Inference

  • Gaussian Processes (GPs) as an example model:
    • Regression with a GP prior: f ~ GP(m, k), with mean function m and covariance kernel k.
    • Computational Challenges:
      • Exact inference requires solving linear systems with the n × n kernel matrix, which costs O(n^3).
      • Approximation methods are required to perform these solves efficiently at scale.

Linear Solvers and Approximate Methods

  • Conjugate Gradients:
    • An iterative method for solving linear systems; each step minimizes the error in the norm induced by the system matrix (here the kernel matrix K̂, hence the "K̂-norm").
  • Probabilistic Inference with Approximate Methods:
    • Different methods introduce biases and uncertainties that affect scientific inferences.
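A plain conjugate-gradient solver (a textbook sketch, not the lecture's implementation) shows the iteration; iterate i minimizes the error in the matrix-induced norm over a growing Krylov subspace:

```python
import numpy as np

def conjugate_gradients(A, b, tol=1e-10, maxiter=None):
    """CG for symmetric positive-definite A. Iterate i minimizes the error
    ||e||_A = sqrt(e^T A e) over span{b, Ab, ..., A^{i-1} b}."""
    x = np.zeros_like(b)
    r = b - A @ x              # residual
    p = r.copy()               # search direction
    rs = r @ r
    for _ in range(maxiter or len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 30))
A = M @ M.T + 30 * np.eye(30)   # a well-conditioned SPD test matrix
b = rng.normal(size=30)
x = conjugate_gradients(A, b)
```

Truncating the loop early yields an approximate solve; the leftover residual is exactly the kind of unaccounted computational error discussed next.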

Effective Data and Computational Uncertainty

  • Effective Data Set:
    • Instead of exact inference, consider the effective data set induced by the computation actually performed.
    • Combining mathematical (statistical) uncertainty with computational uncertainty yields the correctly updated belief.
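In the Gaussian-process case this combination can be written explicitly (a sketch in the spirit of computation-aware GP inference; the symbol C_i for the solver's current approximation to the inverse is my notation, not necessarily the lecture's):

```latex
k_{\mathrm{comb}}(x, x') =
\underbrace{k(x, x') - k(x, X)\,\hat{K}^{-1}\,k(X, x')}_{\text{mathematical uncertainty}}
\;+\;
\underbrace{k(x, X)\,\bigl(\hat{K}^{-1} - C_i\bigr)\,k(X, x')}_{\text{computational uncertainty}}
```

where K̂ = K + σ²I is the noisy kernel matrix and C_i is the solver's approximation to K̂⁻¹ after i iterations; as C_i → K̂⁻¹ the computational term vanishes and exact GP inference is recovered.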

Implications for Scientific Inquiry

  • Failing to correctly account for computational approximations can result in weak or unsupported scientific statements.
  • Need to integrate computational uncertainty into models to ensure robust inferences.

Methodology for Propagating Uncertainty

  • Gaussian Processes Examination:
    • Steps to obtain the combined uncertainty: validate the computational steps, compute the effective data, and propagate the resulting uncertainty.

Practical Applications and Observations

  • Conjugate Gradients and Cholesky Factorization:
    • Different solvers update and reduce computational uncertainty to varying degrees.
  • Ultimate goal: More accurate and theoretically robust uncertainty assessment.
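The contrast between the two solvers can be seen directly by solving the same SPD system both ways (a sketch with an assumed random test matrix): a truncated CG run leaves a residual, i.e. unresolved computational uncertainty, while a full Cholesky solve resolves it to round-off.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)   # well-conditioned SPD matrix
b = rng.normal(size=n)

# Direct: Cholesky factorization, O(n^3), solves essentially exactly.
L = np.linalg.cholesky(A)
x_chol = np.linalg.solve(L.T, np.linalg.solve(L, b))

# Iterative: only 5 CG steps -- a deliberately truncated approximate solve.
x = np.zeros(n)
r = b - A @ x
p = r.copy()
rs = r @ r
for _ in range(5):
    Ap = A @ p
    a = rs / (p @ Ap)
    x += a * p
    r -= a * Ap
    rs_new = r @ r
    p = r + (rs_new / rs) * p
    rs = rs_new

res_chol = np.linalg.norm(b - A @ x_chol)
res_cg = np.linalg.norm(b - A @ x)
```

The Cholesky residual sits at machine precision, while the truncated CG residual is visibly larger; a probabilistic treatment would report that gap as computational uncertainty rather than ignore it.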

Probabilistic Numerics and Further Applications

  • Probabilistic Numerical Methods:
    • Propagating uncertainty through numerical computation aligns naturally with Gaussian process models.
  • Extending to model selection, including decisions about computational resources and data collection.

Conclusion

  • Approximate inference needs to be handled probabilistically to provide accurate scientific statements.
  • Data's influence and computational capabilities need to be accurately represented in the models.