Qualcomm Day on TinyML with Marios Fourlaris

Jul 10, 2024

tinyML Talks Series: Qualcomm Day on TinyML

Introduction

  • Host: Gary Gossip from Home AI Research, Bay Area, California
  • Guest Speaker: Marios Fourlaris from Qualcomm AI Research Center, Amsterdam

Announcements

  • Sponsors: Arm, Deeplite, Edge Impulse, GreenWaves Technologies, Latent AI, HRTG Mob, Maxim Integrated (part of Analog Devices), Qeexo, Qualcomm, Reality AI, SensiML, Syntiant
  • TinyML Asia Event: November 2nd - 5th, starting at 8 am China Standard Time
  • TinyML Vision Challenge: Jointly with Hackster.io
    • 500 participants, 52 submissions
    • Winners announced next week
  • Next Talk: October 5th, Speaker: Professor Alessio Lomuscio from Imperial College London

Speaker Introduction

  • Marios Fourlaris: Deep learning researcher at Qualcomm AI Research
    • Focus: Power-efficient training and inference of neural networks
    • Background: MSc in Engineering from the University of Cambridge; machine learning at University College London

Talk Overview: Practical Guide to Neural Network Quantization

  • Overview: Fundamentals and practical applications of neural network quantization
  • Topics Covered:
    • Energy-efficient machine learning
    • Quantization fundamentals
    • Post-training quantization (PTQ)
    • Quantization-aware training (QAT)
    • AIMET toolkit for quantization and compression

Energy-efficient Machine Learning

  • Trends:
    • Increasing energy consumption for marginal accuracy improvements
    • AI moving from the cloud to edge devices (privacy, speed, reduced communication overheads)
  • Challenges:
    • Energy and thermal constraints on edge devices
    • Need for power-efficient neural networks

Approaches to Reducing Power Consumption

  • Compression: Pruning the model
  • Quantization: Reducing the precision of the model
  • Compilation: Efficiently compiling AI models
  • Focus: Quantization

Quantization Fundamentals

Basics

  • Start with a trained, floating-point neural network
  • Store weights and activations at lower precision (e.g., 8-bit integers)
  • Simulate fixed-point operations using floating-point numbers ("fake quantization")
  • Types of quantization: symmetric and asymmetric (see the sketch below)
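
To make the two schemes concrete, here is a minimal NumPy sketch of "fake" quantization, simulating integer arithmetic in floating point. The min-max scale and zero-point choices are common conventions assumed for illustration, not the speaker's exact recipe:

```python
import numpy as np

def fake_quantize(x, num_bits=8, symmetric=True):
    """Simulate uniform fixed-point quantization in floating point."""
    if symmetric:
        # Symmetric: zero-point fixed at 0, grid centered on zero.
        qmax = 2 ** (num_bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q * scale
    else:
        # Asymmetric: a zero-point offset shifts the grid so it
        # covers the full observed [min, max] range of the tensor.
        qmax = 2 ** num_bits - 1
        x_min, x_max = x.min(), x.max()
        scale = (x_max - x_min) / qmax
        zero_point = np.round(-x_min / scale)
        q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
        return (q - zero_point) * scale

x = np.random.randn(1000).astype(np.float32)
print("symmetric error: ", np.abs(x - fake_quantize(x, symmetric=True)).mean())
print("asymmetric error:", np.abs(x - fake_quantize(x, symmetric=False)).mean())
```

Asymmetric quantization spends the full integer grid on the observed range, which helps for one-sided distributions such as ReLU outputs.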

Algorithms

  • Post-Training Quantization (PTQ)

    • Cross-layer equalization
    • Bias correction
    • AdaRound for learning an optimal weight rounding
  • Quantization-Aware Training (QAT), sketched after this list

    • Straight-through estimator (STE)
    • Learnable quantization parameters
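
A minimal PyTorch sketch of the QAT mechanics above: the straight-through estimator lets gradients pass through the non-differentiable rounding op, and the step size is a learnable parameter. The log-scale parameterization is an assumption (in the spirit of learned step-size methods), not necessarily the talk's exact formulation:

```python
import torch
import torch.nn as nn

class FakeQuant(nn.Module):
    """QAT-style fake quantizer with a learnable scale."""

    def __init__(self, num_bits=8, init_scale=0.1):
        super().__init__()
        self.qmin = -(2 ** (num_bits - 1))
        self.qmax = 2 ** (num_bits - 1) - 1
        # Learn log(scale) so the scale stays positive during training.
        self.log_scale = nn.Parameter(torch.tensor(float(init_scale)).log())

    def forward(self, x):
        scale = self.log_scale.exp()
        x_s = x / scale
        # Straight-through estimator: forward uses round(), backward
        # treats rounding as identity, so gradients also reach the scale.
        x_q = x_s + (torch.round(x_s) - x_s).detach()
        x_q = torch.clamp(x_q, self.qmin, self.qmax)
        return x_q * scale

fq = FakeQuant(num_bits=8)
x = torch.randn(4, 8, requires_grad=True)
fq(x).sum().backward()          # gradients flow through the rounding op
print(x.grad.abs().sum() > 0)   # tensor(True)
```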

Special Layers Simulation

  • How layers such as max pool, average pool, element-wise addition, and concatenation are simulated under quantization (see the concatenation sketch below)
  • Emphasis on simulation that accurately reflects on-device integer behavior
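
Concatenation illustrates why these layers need care: its inputs must share quantization parameters, or the result cannot live on a single integer grid. A sketch, assuming a simple shared min-max policy (a common convention, not quoted from the talk):

```python
import numpy as np

def quantize(x, scale, zero_point, qmax=255):
    # Uniform asymmetric quantization onto an integer grid.
    return np.clip(np.round(x / scale) + zero_point, 0, qmax)

def quantized_concat(a, b, num_bits=8):
    """Requantize both inputs onto one shared grid before concatenating."""
    qmax = 2 ** num_bits - 1
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    scale = (hi - lo) / qmax
    zp = np.round(-lo / scale)
    q = np.concatenate([quantize(a, scale, zp, qmax),
                        quantize(b, scale, zp, qmax)])
    return q, scale, zp

a = np.random.randn(4) * 0.5   # narrow-range input
b = np.random.randn(4) * 2.0   # wide-range input dominates the shared grid
q, scale, zp = quantized_concat(a, b)
print(q, scale, zp)
```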

Setting Quantization Parameters

  • Methods: min-max range, optimization-based, BatchNorm-based
  • Recommendation: the mean-squared-error (MSE) method for setting ranges (see the sketch below)
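
A sketch of MSE-based range setting: grid-search candidate clipping thresholds and keep the one minimizing the squared error between the tensor and its quantized reconstruction. The symmetric grid and candidate count here are illustrative assumptions:

```python
import numpy as np

def mse_range(x, num_bits=8, num_candidates=100):
    """Pick a clipping threshold by MSE grid search."""
    qmax = 2 ** (num_bits - 1) - 1
    abs_max = np.abs(x).max()
    best_t, best_err = abs_max, np.inf
    for t in np.linspace(0.1 * abs_max, abs_max, num_candidates):
        scale = t / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
        err = np.mean((x - q) ** 2)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

x = np.concatenate([np.random.randn(10000), np.array([12.0])])  # one outlier
print("max|x| =", np.abs(x).max(), " MSE-optimal clip =", mse_range(x))
```

On data with outliers, the MSE-optimal threshold clips well below max|x|, which is exactly why it is preferred over plain min-max range setting.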

AIMET Toolkit

  • Open-source model efficiency toolkit
  • Supports both PTQ and QAT
  • Includes: Pre-trained quantized models, simulation tools, and quantization techniques
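
A hedged sketch of a typical AIMET PTQ flow with its PyTorch API (QuantizationSimModel plus a calibration pass). Argument names follow the public AIMET documentation as best recalled; verify the exact signatures against the docs before use:

```python
import torch
import torch.nn as nn
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# A stand-in trained model; in practice this is your FP32 network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()
dummy_input = torch.randn(1, 3, 32, 32)

# Wrap the model with fake-quantization ops (8-bit weights/activations).
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8,
                           default_output_bw=8)

# Calibration pass: AIMET observes activations to set quantization ranges.
def calibrate(sim_model, _):
    with torch.no_grad():
        sim_model(dummy_input)  # real code would loop over calibration data

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# sim.model now behaves like the quantized network: evaluate it, fine-tune
# it further (QAT), or export the encodings for on-device deployment.
sim.export(path='./quant_out', filename_prefix='model_int8',
           dummy_input=dummy_input)
```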

Conclusion

  • Importance of efficient neural network quantization for edge AI
  • Encouragement to explore the AIMET toolkit and the white paper on neural network quantization