Qualcomm Day on TinyML with Marios Fourlaris

Jul 10, 2024

tinyML Talks Series: Qualcomm Day on TinyML

Introduction

  • Host: Gary Gossip from Home AI Research, Bay Area, California
  • Guest Speaker: Marios Fourlaris from Qualcomm AI Research Center, Amsterdam

Announcements

  • Sponsors: Arm, Deeplite, Edge Impulse, GreenWaves Technologies, Latent AI, HRTG Mob, Maxim Integrated (part of Analog Devices), Qeexo, Qualcomm, Reality AI, SensiML, Syntiant
  • TinyML Asia Event: November 2nd - 5th, starting at 8 am China Standard Time
  • TinyML Vision Challenge: Jointly with Hackster.io
    • 500 participants, 52 submissions
    • Winners announced next week
  • Next Talk: October 5th, Speaker: Professor Alessio Lomuscio from Imperial College London

Speaker Introduction

  • Marios Fourlaris: Deep learning researcher at Qualcomm AI Research
    • Focus: Power-efficient training and inference of neural networks
    • Background: MSc in Engineering from the University of Cambridge; machine learning at University College London

Talk Overview: Practical Guide to Neural Network Quantization

  • Overview: Fundamentals and practical applications of neural network quantization
  • Topics Covered:
    • Energy-efficient machine learning
    • Quantization fundamentals
    • Post-training quantization (PTQ)
    • Quantization-aware training (QAT)
    • AIMET toolkit for quantization and compression

Energy-efficient Machine Learning

  • Trends:
    • Increasing energy consumption for marginal accuracy improvements
    • AI moving from the cloud to edge devices (privacy, speed, reduced communication overheads)
  • Challenges:
    • Energy and thermal constraints on edge devices
    • Need for power-efficient neural networks

Approaches to Reducing Power Consumption

  • Compression: Pruning the model
  • Quantization: Reducing the precision of the model
  • Compilation: Efficiently compiling AI models
  • Focus: Quantization

Quantization Fundamentals

Basics

  • Start with a trained, floating-point neural network
  • Store weights and activations at lower precision (e.g., 8-bit integers)
  • Simulate fixed-point operations using floating-point numbers ("fake quantization")
  • Types of quantization: symmetric and asymmetric (see the sketch below)
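
To make the two schemes concrete, here is a minimal NumPy sketch of "fake" quantization, simulating integer arithmetic in floating point. The min-max scale and zero-point choices are common conventions assumed for illustration, not the speaker's exact recipe:

```python
import numpy as np

def fake_quantize(x, num_bits=8, symmetric=True):
    """Simulate uniform fixed-point quantization in floating point."""
    if symmetric:
        # Symmetric: zero-point fixed at 0, grid centered on zero.
        qmax = 2 ** (num_bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q * scale
    else:
        # Asymmetric: a zero-point offset shifts the grid so it
        # covers the full observed [min, max] range of the tensor.
        qmax = 2 ** num_bits - 1
        x_min, x_max = x.min(), x.max()
        scale = (x_max - x_min) / qmax
        zero_point = np.round(-x_min / scale)
        q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
        return (q - zero_point) * scale

x = np.random.randn(1000).astype(np.float32)
print("symmetric error: ", np.abs(x - fake_quantize(x, symmetric=True)).mean())
print("asymmetric error:", np.abs(x - fake_quantize(x, symmetric=False)).mean())
```

Asymmetric quantization spends the full integer grid on the observed range, which helps for one-sided distributions such as ReLU outputs.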

Algorithms

  • Post-Training Quantization (PTQ)

    • Cross-layer equalization
    • Bias correction
    • AdaRound for learning an optimal weight rounding
  • Quantization-Aware Training (QAT), sketched after this list

    • Straight-through estimator (STE)
    • Learnable quantization parameters
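
A minimal PyTorch sketch of the QAT mechanics above: the straight-through estimator lets gradients pass through the non-differentiable rounding op, and the step size is a learnable parameter. The log-scale parameterization is an assumption (in the spirit of learned step-size methods), not necessarily the talk's exact formulation:

```python
import torch
import torch.nn as nn

class FakeQuant(nn.Module):
    """QAT-style fake quantizer with a learnable scale."""

    def __init__(self, num_bits=8, init_scale=0.1):
        super().__init__()
        self.qmin = -(2 ** (num_bits - 1))
        self.qmax = 2 ** (num_bits - 1) - 1
        # Learn log(scale) so the scale stays positive during training.
        self.log_scale = nn.Parameter(torch.tensor(float(init_scale)).log())

    def forward(self, x):
        scale = self.log_scale.exp()
        x_s = x / scale
        # Straight-through estimator: forward uses round(), backward
        # treats rounding as identity, so gradients also reach the scale.
        x_q = x_s + (torch.round(x_s) - x_s).detach()
        x_q = torch.clamp(x_q, self.qmin, self.qmax)
        return x_q * scale

fq = FakeQuant(num_bits=8)
x = torch.randn(4, 8, requires_grad=True)
fq(x).sum().backward()          # gradients flow through the rounding op
print(x.grad.abs().sum() > 0)   # tensor(True)
```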

Special Layers Simulation

  • How layers such as max pool, average pool, element-wise addition, and concatenation are simulated under quantization (see the concatenation sketch below)
  • Emphasis on simulation that accurately reflects on-device integer behavior
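
Concatenation illustrates why these layers need care: its inputs must share quantization parameters, or the result cannot live on a single integer grid. A sketch, assuming a simple shared min-max policy (a common convention, not quoted from the talk):

```python
import numpy as np

def quantize(x, scale, zero_point, qmax=255):
    # Uniform asymmetric quantization onto an integer grid.
    return np.clip(np.round(x / scale) + zero_point, 0, qmax)

def quantized_concat(a, b, num_bits=8):
    """Requantize both inputs onto one shared grid before concatenating."""
    qmax = 2 ** num_bits - 1
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    scale = (hi - lo) / qmax
    zp = np.round(-lo / scale)
    q = np.concatenate([quantize(a, scale, zp, qmax),
                        quantize(b, scale, zp, qmax)])
    return q, scale, zp

a = np.random.randn(4) * 0.5   # narrow-range input
b = np.random.randn(4) * 2.0   # wide-range input dominates the shared grid
q, scale, zp = quantized_concat(a, b)
print(q, scale, zp)
```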

Setting Quantization Parameters

  • Methods: min-max range, optimization-based, BatchNorm-based
  • Recommendation: the mean-squared-error (MSE) method for setting ranges (see the sketch below)
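
A sketch of MSE-based range setting: grid-search candidate clipping thresholds and keep the one minimizing the squared error between the tensor and its quantized reconstruction. The symmetric grid and candidate count here are illustrative assumptions:

```python
import numpy as np

def mse_range(x, num_bits=8, num_candidates=100):
    """Pick a clipping threshold by MSE grid search."""
    qmax = 2 ** (num_bits - 1) - 1
    abs_max = np.abs(x).max()
    best_t, best_err = abs_max, np.inf
    for t in np.linspace(0.1 * abs_max, abs_max, num_candidates):
        scale = t / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
        err = np.mean((x - q) ** 2)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

x = np.concatenate([np.random.randn(10000), np.array([12.0])])  # one outlier
print("max|x| =", np.abs(x).max(), " MSE-optimal clip =", mse_range(x))
```

On data with outliers, the MSE-optimal threshold clips well below max|x|, which is exactly why it is preferred over plain min-max range setting.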

AIMET Toolkit

  • Open-source model efficiency toolkit
  • Supports both PTQ and QAT
  • Includes: Pre-trained quantized models, simulation tools, and quantization techniques
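
A hedged sketch of a typical AIMET PTQ flow with its PyTorch API (QuantizationSimModel plus a calibration pass). Argument names follow the public AIMET documentation as best recalled; verify the exact signatures against the docs before use:

```python
import torch
import torch.nn as nn
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# A stand-in trained model; in practice this is your FP32 network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()
dummy_input = torch.randn(1, 3, 32, 32)

# Wrap the model with fake-quantization ops (8-bit weights/activations).
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8,
                           default_output_bw=8)

# Calibration pass: AIMET observes activations to set quantization ranges.
def calibrate(sim_model, _):
    with torch.no_grad():
        sim_model(dummy_input)  # real code would loop over calibration data

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# sim.model now behaves like the quantized network: evaluate it, fine-tune
# it further (QAT), or export the encodings for on-device deployment.
sim.export(path='./quant_out', filename_prefix='model_int8',
           dummy_input=dummy_input)
```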

Conclusion

  • Importance of efficient neural network quantization for edge AI
  • Encouragement to explore the AIMET toolkit and the white paper on neural network quantization