Qualcomm Day on TinyML with Marios Fourlaris
Jul 10, 2024
Dynamite Talk Series: Qualcomm Day on TinyML
Introduction
Host: Gary Gossip from Home AI Research, Bay Area, California
Guest Speaker: Marios Fourlaris from Qualcomm AI Research Center, Amsterdam
Announcements
Sponsors: ARM, DeepLight, H-Impulse, GreenWave Technologies, Latin AI, HRTG Mob, Maxim Integrated (part of Analog Devices), Kixo, Qualcomm Reality, Sentisys, Synthetic
TinyML Asia Event: November 2nd–5th, starting at 8 am China Standard Time
TinyML Vision Challenge: Jointly with Hackster.io; 500 participants, 52 submissions; winners announced next week
Next Talk: October 5th; speaker: Professor Alessio Lomuscio from Imperial College London
Speaker Introduction
Marios Fourlaris: Deep learning researcher at Qualcomm AI Research
Focus: Power-efficient training and inference of neural networks
Background: MSc in Engineering from the University of Cambridge; Machine Learning at University College London
Talk Overview: Practical Guide to Neural Network Quantization
Overview: Fundamentals and practical applications of neural network quantization
Topics Covered:
Energy-efficient machine learning
Quantization fundamentals
Post-training quantization (PTQ)
Quantization-aware training (QAT)
AIMET toolkit for quantization and compression
Energy-efficient Machine Learning
Trends:
Increasing energy consumption for marginal accuracy improvements
AI moving from the cloud to edge devices (privacy, speed, reduced communication overheads)
Challenges:
Energy and thermal constraints on edge devices
Need for power-efficient neural networks
Approaches to Reducing Power Consumption
Compression: Pruning the model
Quantization: Reducing the precision of the model
Compilation: Efficiently compiling AI models
Focus: Quantization
Quantization Fundamentals
Basics
Start with a trained neural network
Store the model at lower precision
Simulate fixed-point operations using floating-point numbers
Types of Quantization: Symmetric and asymmetric
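A minimal NumPy sketch of the quantize-dequantize simulation described above, covering both symmetric and asymmetric grids. The function name and the simple per-tensor range choice are illustrative assumptions, not the speaker's code:

```python
import numpy as np

def fake_quantize(x, num_bits=8, symmetric=True):
    """Quantize-dequantize a tensor: store values on an integer grid but keep
    the arithmetic in floating point, as done when simulating fixed-point
    hardware. Assumes x has a non-zero range."""
    if symmetric:
        # Symmetric: zero point fixed at 0, grid centred on zero
        qmax = 2 ** (num_bits - 1) - 1
        qmin = -qmax - 1
        scale = np.abs(x).max() / qmax
        zero_point = 0
    else:
        # Asymmetric: an offset (zero point) shifts the grid onto [min(x), max(x)]
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)   # integer grid
    return (q - zero_point) * scale                             # back to real values

x = np.random.randn(4, 4).astype(np.float32)
print(np.abs(x - fake_quantize(x)).max())   # quantization error is small
```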
Algorithms
Post-Training Quantization (PTQ)
Cross-layer equalization (sketched after this list)
Bias correction
AdaRound for optimal rounding
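A rough NumPy sketch of the cross-layer equalization idea for two fully connected layers joined by a ReLU; the function name, shapes, and epsilon guard are assumptions for illustration, not Qualcomm's implementation:

```python
import numpy as np

def cross_layer_equalize(W1, b1, W2, eps=1e-8):
    """Rescale two consecutive layers joined by a ReLU so their per-channel
    weight ranges are balanced, making a single per-tensor quantization grid
    fit both layers better.
    Shapes assumed: W1 (out1, in1), b1 (out1,), W2 (out2, out1).
    Because relu(s * x) = s * relu(x) for s > 0, the rescaled network computes
    the same function in floating point."""
    r1 = np.abs(W1).max(axis=1)               # range of each output channel of layer 1
    r2 = np.abs(W2).max(axis=0)               # range of each input channel of layer 2
    s = np.sqrt(r1 * r2) / np.maximum(r2, eps)
    s = np.maximum(s, eps)                    # guard against all-zero channels
    return W1 / s[:, None], b1 / s, W2 * s[None, :]
```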
Quantization-Aware Training (QAT)
Straight-through estimator (sketched below)
Learnable quantization parameters
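A small PyTorch sketch of the straight-through estimator used in quantization-aware training: rounding happens in the forward pass, while gradients pass through unchanged. The class name and the fixed (non-learned) scale are illustrative simplifications; learnable quantization parameters would additionally return a gradient for the scale:

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Forward: round to a signed fixed-point grid.
    Backward: straight-through estimator, i.e. treat rounding as identity."""

    @staticmethod
    def forward(ctx, x, scale, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient flows unchanged to x; scale and num_bits get no gradient
        # in this simplified sketch
        return grad_output, None, None

# Usage inside a training step: fake-quantize the weights, backprop as usual
w = torch.randn(16, 16, requires_grad=True)
w_q = FakeQuantSTE.apply(w, torch.tensor(0.05))
w_q.sum().backward()     # w.grad exists thanks to the straight-through estimator
```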
Special Layers Simulation
Methods for dealing with layers like max pool, average pool, element-wise addition, and concatenation (element-wise addition sketched below)
Emphasis on accurate reflection of on-device performance
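As an example of why such layers need care, here is a NumPy sketch of element-wise addition between two tensors that live on different quantization grids; the function name, scales, and bit width are illustrative assumptions:

```python
import numpy as np

def quantized_add(xq, x_scale, yq, y_scale, out_scale, num_bits=8):
    """Element-wise addition of two quantized tensors on different grids:
    bring both onto the output grid, add, then round and clip again.
    Shows why even 'pass-through' layers require explicit requantization
    to reflect on-device behaviour."""
    qmax = 2 ** (num_bits - 1) - 1
    real_sum = xq * x_scale + yq * y_scale            # higher-precision accumulate
    return np.clip(np.round(real_sum / out_scale), -qmax - 1, qmax)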
Setting Quantization Parameters
Methods: Min-max range, optimization-based, and batch norm-based
Recommendation: Mean squared error (MSE) method for setting ranges
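A simple grid-search sketch of MSE-based range setting for a symmetric quantizer; the candidate grid and function name are assumptions for illustration:

```python
import numpy as np

def mse_range(x, num_bits=8, num_candidates=100):
    """Grid-search a symmetric quantization range that minimizes the mean
    squared error between x and its quantize-dequantize version.
    Assumes x has a non-zero range."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.abs(x).max()
    best_r, best_err = max_abs, np.inf
    for frac in np.linspace(0.1, 1.0, num_candidates):
        r = frac * max_abs                            # candidate clipping range
        scale = r / qmax
        x_q = np.clip(np.round(x / scale), -qmax - 1, qmax) * scale
        err = np.mean((x - x_q) ** 2)
        if err < best_err:
            best_err, best_r = err, r
    return best_r
```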
AIMET Toolkit
Open-source model efficiency toolkit
Supports both PTQ and QAT
Includes: Pre-trained quantized models, simulation tools, and quantization techniques
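A rough sketch of the typical AIMET PyTorch quantization-simulation flow; the class and argument names follow AIMET's public documentation as I recall it and may differ between versions, so treat the exact signatures as assumptions:

```python
import torch
from torchvision.models import resnet18
# AIMET's PyTorch quantization simulation entry point (assumed import path)
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18(pretrained=True).eval()
dummy_input = torch.rand(1, 3, 224, 224)

# Insert quantization simulation ops (8-bit weights and activations)
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def calibrate(sim_model, _):
    # Run representative data so quantization ranges (encodings) can be computed
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# sim.model now behaves like the quantized model: evaluate it to estimate
# on-device accuracy, or fine-tune it further for quantization-aware training.
```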
Conclusion
Importance of efficient neural network quantization for edge AI
Encouragement to explore the AIMET toolkit and the white paper on neural network quantization