YOLOv9 Architecture Lecture

Jul 22, 2024

Introduction

  • Overview of the YOLOv9 architecture
  • Explanation of the key YOLOv9 building blocks before diving into the full architecture

Commonly Used Blocks

Convolutional Block

  • Based on the Conv class in common.py (located in the models folder)
  • Consists of:
    • 2D convolutional layer
    • 2D batch normalization
    • Activation function (SiLU by default in YOLOv9)
  • Calculates the padding automatically when none is specified:
    • Formula: padding = kernel_size // 2 (independent of the stride)
    • Example 1: kernel_size = 3, stride = 3, padding = 3 // 2 = 1
    • Example 2: kernel_size = 1, stride = 1, padding = 1 // 2 = 0
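
The padding rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the code from common.py: the helper names autopad and conv_output_size are chosen here for the example, and dilation is assumed to be 1.

```python
def autopad(kernel_size, padding=None):
    """Return the given padding, or kernel_size // 2 when none is defined."""
    return kernel_size // 2 if padding is None else padding


def conv_output_size(size, kernel_size, stride=1, padding=None):
    """Spatial output size of a convolution (dilation assumed to be 1)."""
    p = autopad(kernel_size, padding)
    return (size + 2 * p - kernel_size) // stride + 1


# Padding for the two kernel sizes used in the examples above.
print(autopad(3))  # 1
print(autopad(1))  # 0

# With stride 1, the automatic padding keeps the spatial size unchanged;
# with stride 2, the size is halved.
print(conv_output_size(640, kernel_size=3, stride=1))  # 640
print(conv_output_size(640, kernel_size=3, stride=2))  # 320
```

The point of the kernel_size // 2 rule is that, for odd kernel sizes and stride 1, the output keeps the same height and width as the input.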

RepConv Block

  • Corresponds to the RepConvN class in common.py
  • Components:
    • Two convolutional blocks
    • Element-wise addition
    • SiLU activation function
  • Process:
    • Input passes through two convolutional blocks
    • First block: kernel_size = 3, padding = 1, stride = 1
    • Second block: kernel_size = 1, padding = 0, stride = 1
    • Element-wise addition of the two blocks' outputs
    • Apply SiLU activation function
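
The process above can be illustrated with a small pure-Python sketch. It is deliberately simplified and is not the implementation in common.py: it uses a single channel, omits batch normalization, and the function names (conv2d, rep_conv) and hand-picked kernels are assumptions made for the example. It shows why the two padding values matter: both branches produce outputs of the same size, so they can be added element-wise.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def conv2d(image, kernel, padding):
    """Single-channel 2D convolution with stride 1 and zero padding."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    out_h = h + 2 * padding - k + 1
    out_w = w + 2 * padding - k + 1
    return [
        [
            sum(kernel[a][b] * padded[i + a][j + b]
                for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def rep_conv(image, kernel_3x3, kernel_1x1):
    """Two parallel branches, element-wise addition, then SiLU."""
    branch_a = conv2d(image, kernel_3x3, padding=1)  # kernel_size = 3, padding = 1
    branch_b = conv2d(image, kernel_1x1, padding=0)  # kernel_size = 1, padding = 0
    return [
        [silu(a + b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(branch_a, branch_b)
    ]


image = [[1.0, 2.0], [3.0, 4.0]]
identity_3x3 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
double_1x1 = [[2.0]]
out = rep_conv(image, identity_3x3, double_1x1)
print(len(out), len(out[0]))  # 2 2 (same spatial size as the input)
```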

RepBottleneck Block

  • Corresponds to the RepNBottleneck class in common.py
  • Structure:
    • Sequence of blocks with shortcuts
    • Contains RepConv and convolutional blocks
  • Similar to ResNet's bottleneck block, but uses the SiLU activation function instead of ReLU
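
The shortcut idea can be sketched as follows. This is a hedged illustration, not the repository code: the function name rep_bottleneck, the ordering (RepConv first, then a convolutional block), and the rule that the shortcut is only used when input and output channel counts match are assumptions made for the example. The two stand-in callables replace the real blocks.

```python
def rep_bottleneck(x, rep_conv, conv, in_channels, out_channels, shortcut=True):
    """Apply RepConv then a conv block, adding the input back when shapes allow."""
    y = conv(rep_conv(x))
    if shortcut and in_channels == out_channels:
        # Residual (shortcut) connection: element-wise addition with the input.
        return [a + b for a, b in zip(x, y)]
    return y


def double(values):
    """Stand-in for a RepConv block (hypothetical, for illustration only)."""
    return [2 * v for v in values]


def add_one(values):
    """Stand-in for a convolutional block (hypothetical, for illustration only)."""
    return [v + 1 for v in values]


x = [1, 2, 3]
print(rep_bottleneck(x, double, add_one, 3, 3))                  # [4, 7, 10]
print(rep_bottleneck(x, double, add_one, 3, 3, shortcut=False))  # [3, 5, 7]
```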

RepCSP Block

  • Corresponds to the RepNCSP class in common.py
  • Structure:
    • Three convolutional blocks
    • Sequence of RepBottleneck blocks
  • Input is split into two paths:
    • One to RepBottleneck
    • One to Concat block
  • Ends with a convolutional block
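
The data flow above can be traced with shapes alone. The sketch below is an assumption-laden illustration, not the code from common.py: feature maps are represented only by their (channels, height, width) shape, the helper names are invented for the example, and the hidden width of half the output channels is an assumed choice.

```python
def conv(out_channels):
    """Stand-in for a convolutional block that only changes the channel count."""
    return lambda shape: (out_channels, shape[1], shape[2])


def concat(shapes):
    """Channel-wise concatenation; spatial sizes must agree."""
    assert all(s[1:] == shapes[0][1:] for s in shapes)
    return (sum(s[0] for s in shapes), shapes[0][1], shapes[0][2])


def rep_csp(shape, out_channels, num_bottlenecks=1):
    """Trace a feature-map shape through the two paths and the final conv."""
    hidden = out_channels // 2        # assumed hidden width
    bottleneck = conv(hidden)         # stand-in: a RepBottleneck keeps the shape

    path_a = conv(hidden)(shape)      # first conv block, then the bottlenecks
    for _ in range(num_bottlenecks):
        path_a = bottleneck(path_a)

    path_b = conv(hidden)(shape)      # second conv block, goes to Concat

    merged = concat([path_a, path_b])
    return conv(out_channels)(merged)  # third (final) conv block


print(rep_csp((64, 80, 80), out_channels=128))  # (128, 80, 80)
```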

RepCSPLayer Block

  • Corresponds to the RepNCSPELAN4 class in common.py
  • Combines two architectures: CSPNet and ELAN
    • CSP: Cross Stage Partial Network
    • ELAN: Efficient Layer Aggregation Network
  • Uses various computing blocks, not just convolutional layers
  • Input is split into two paths:
    • One through RepCSP and convolutional block
    • One directly to Concat block
  • Ends with a convolutional block
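
A minimal sketch of the split-and-concatenate flow described above, under stated assumptions: the split is taken to be an even channel-wise split, channels are represented by plain string labels, and the function and stand-in names are hypothetical. It is meant only to show which channels bypass the processing path.

```python
def rep_csp_layer(channels, rep_csp_and_conv, final_conv):
    """Split channels in half, process one half, concatenate, apply final conv."""
    half = len(channels) // 2
    direct, to_process = channels[:half], channels[half:]
    processed = rep_csp_and_conv(to_process)  # RepCSP followed by a conv block
    return final_conv(direct + processed)     # Concat, then the final conv block


def mark_processed(channels):
    """Stand-in for the RepCSP + conv path (hypothetical)."""
    return [f"f({name})" for name in channels]


def passthrough(channels):
    """Stand-in for the final conv block (hypothetical)."""
    return list(channels)


print(rep_csp_layer(["c0", "c1", "c2", "c3"], mark_processed, passthrough))
# ['c0', 'c1', 'f(c2)', 'f(c3)']
```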

SPPLayer Block

  • Modification of SPP (Spatial Pyramid Pooling); corresponds to the SPPELAN class in common.py
  • Purpose: Generate feature representations for objects of different sizes without losing spatial information
  • Structure:
    • Initial convolutional block
    • Three SPP blocks (each with max pool layer)
    • Concatenation before final convolutional block
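
The pooling part of this structure can be sketched in pure Python. The details below are assumptions, not taken from the lecture: the pools are applied one after another, each uses a 5x5 window with stride 1 and same-size padding, and the two convolutional blocks are left out. The sketch shows that every branch keeps the input resolution while covering a progressively larger neighbourhood.

```python
def max_pool2d(fmap, kernel_size=5):
    """Stride-1 max pooling that keeps the spatial size (window clipped at edges)."""
    pad = kernel_size // 2
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[a][b]
                for a in range(max(0, i - pad), min(h, i + pad + 1))
                for b in range(max(0, j - pad), min(w, j + pad + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_branches(fmap, kernel_size=5, num_pools=3):
    """Return the input plus the output of each successive max pool."""
    outputs = [fmap]
    for _ in range(num_pools):
        outputs.append(max_pool2d(outputs[-1], kernel_size))
    return outputs  # these would be concatenated along the channel axis


# A 7x7 map with a single active cell in the centre.
fmap = [[0] * 7 for _ in range(7)]
fmap[3][3] = 1
branches = spp_branches(fmap)
print(len(branches))                        # 4 maps go into the concatenation
print(sum(map(sum, branches[1])))           # 25: the active cell spreads to a 5x5 area
```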

YOLOv9 Architecture

General Structure

  • Divided into three main parts: Backbone, Neck, and Head
  • Introduces Programmable Gradient Information (PGI)
    • Provides complete input information so that gradients can be computed reliably
    • Makes the training process more reliable
    • Adds a new Auxiliary section to the architecture

Detailed Breakdown

  1. Backbone: Feature Extraction

    • Starts with a Silence block (passes its input through without transformation)
    • Followed by multiple convolutional blocks, RepCSPLayer blocks, and ADown blocks
  2. Neck: Feature Combination

    • Upsample layers increase feature map resolution
    • Concat blocks combine feature maps from different blocks
    • Several RepCSPLayer and ADown blocks
  3. Head: Prediction

    • Detect blocks specialized for small, medium, and large objects
    • The final Detect block takes feature maps from several preceding blocks
  4. Auxiliary: Training Enhancement

    • Provides extra info linking input data to target output
    • Multiple blocks similar to Backbone for enhanced feature extraction
    • CBLinear and CBFuse blocks for higher-level features and feature fusion
    • Used only during training; can be removed for inference
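
To make the Neck and Head descriptions in the breakdown above more concrete, the sketch below traces feature-map shapes only. The numbers are assumptions for illustration and are not stated in the lecture: a 640x640 input, detection scales at strides 8, 16, and 32 (a common choice in YOLO-family detectors), and hypothetical channel counts. It shows why an Upsample step is needed before Concat: the spatial sizes must match.

```python
def grid_size(image_size, stride):
    """Side length of the feature map produced at a given stride."""
    return image_size // stride


def upsample_shape(shape, scale=2):
    """Shape after an Upsample layer: height and width grow, channels do not."""
    channels, height, width = shape
    return (channels, height * scale, width * scale)


def concat_shape(a, b):
    """Shape after Concat: channels add up, spatial sizes must already match."""
    assert a[1:] == b[1:], "spatial sizes must match before Concat"
    return (a[0] + b[0], a[1], a[2])


image_size = 640                      # assumed input resolution
for name, stride in [("small", 8), ("medium", 16), ("large", 32)]:
    side = grid_size(image_size, stride)
    print(f"{name} objects: stride {stride}, grid {side}x{side}")

deep = (512, 20, 20)                  # hypothetical stride-32 feature map
mid = (512, 40, 40)                   # hypothetical stride-16 feature map
print(concat_shape(upsample_shape(deep), mid))  # (1024, 40, 40)
```

The finest grid (smallest stride) has the most cells and is the one suited to small objects; the coarsest grid covers large objects.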

Conclusion

  • Explanation of the YOLOv9 architecture complete
  • Further learning available through a three-in-one course covering YOLOv9, YOLOv8, and YOLOv7
  • Links provided for additional resources

Thank you for watching!