YOLOv9 Architecture Lecture

Introduction

The convolutional block is based on the Conv class in common.py (located in the models folder)
Consists of:
- 2D convolutional layer
- 2D batch normalization
- Activation function (SiLU by default in YOLOv9)
Calculates padding automatically when none is specified:
- Formula: padding = kernel_size // 2 (independent of stride)
- Example 1: kernel_size = 3, stride = 3, padding = 1
- Example 2: kernel_size = 1, stride = 1, padding = 0
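
The auto padding rule and the three-layer structure above can be sketched in PyTorch as follows. This is a minimal illustration, not the code from common.py: the names autopad and ConvBlock, the argument names, and the bias=False choice are assumptions made for this example.

```python
import torch
import torch.nn as nn


def autopad(kernel_size, padding=None):
    """Return the given padding, or kernel_size // 2 when none is supplied."""
    return kernel_size // 2 if padding is None else padding


class ConvBlock(nn.Module):
    """Conv2d -> BatchNorm2d -> SiLU (illustrative sketch, names are assumed)."""

    def __init__(self, c_in, c_out, kernel_size=1, stride=1, padding=None):
        super().__init__()
        # bias=False is an assumption: the batch norm that follows has its own shift.
        self.conv = nn.Conv2d(c_in, c_out, kernel_size, stride,
                              autopad(kernel_size, padding), bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


x = torch.randn(1, 3, 64, 64)
y = ConvBlock(3, 16, kernel_size=3, stride=1)(x)
print(autopad(3), autopad(1), tuple(y.shape))
```

With kernel_size = 3 and stride = 1, the automatic padding of 1 keeps the 64x64 spatial size unchanged, so only the channel count differs between input and output.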

The RepConv block is based on the RepConv class in common.py
Components:
- Two convolutional blocks
- Element-wise addition
- SiLU activation function
Process:
- Input passes through two convolutional blocks
- First block: kernel_size = 3, padding = 1, stride = 1
- Second block: kernel_size = 1, padding = 0, stride = 1
- Element-wise addition of the two blocks' outputs
- Apply SiLU activation function
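
The process listed above can be sketched as a small PyTorch module. The class name RepConvSketch is hypothetical, and the assumption that each branch is a Conv2d followed by BatchNorm2d with no activation of its own is made for this example rather than taken from the lecture.

```python
import torch
import torch.nn as nn


class RepConvSketch(nn.Module):
    """Parallel 3x3 and 1x1 branches, summed element-wise, then SiLU."""

    def __init__(self, c_in, c_out):
        super().__init__()
        # Assumption: each branch is Conv2d + BatchNorm2d without its own activation.
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(c_out),
        )
        self.act = nn.SiLU()

    def forward(self, x):
        # Both branches keep the spatial size, so their outputs can be added directly.
        return self.act(self.branch3x3(x) + self.branch1x1(x))


x = torch.randn(2, 8, 32, 32)
block = RepConvSketch(8, 8)
out = block(x)
print(tuple(out.shape))
```

The padding values (1 for the 3x3 kernel, 0 for the 1x1 kernel) are what make the two outputs the same shape, which the element-wise addition requires.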

The RepBottleneck block is based on the RepBottleneck class in common.py
Structure:
- Sequence of blocks with shortcuts
- Contains RepConv and convolutional blocks
Similar to ResNet's bottleneck block, but uses the SiLU activation function
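
A rough sketch of the shortcut structure described above: a RepConv-style block followed by a convolutional block, with the input added back when the shapes match. All names here (conv_block, RepConvSketch, RepBottleneckSketch) are hypothetical, and the order of the two inner blocks is an assumption for illustration.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out, k):
    """Conv2d -> BatchNorm2d -> SiLU with padding = k // 2 (hypothetical helper)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )


class RepConvSketch(nn.Module):
    """Simplified RepConv: 3x3 and 1x1 convolutions summed, then SiLU."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv3 = nn.Conv2d(c_in, c_out, 3, 1, 1, bias=False)
        self.conv1 = nn.Conv2d(c_in, c_out, 1, 1, 0, bias=False)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.conv3(x) + self.conv1(x))


class RepBottleneckSketch(nn.Module):
    """RepConv -> convolutional block, with an optional residual shortcut."""

    def __init__(self, c_in, c_out, shortcut=True):
        super().__init__()
        self.rep = RepConvSketch(c_in, c_out)
        self.conv = conv_block(c_out, c_out, 3)
        # The shortcut is only valid when input and output shapes match.
        self.add = shortcut and c_in == c_out

    def forward(self, x):
        y = self.conv(self.rep(x))
        return x + y if self.add else y


x = torch.randn(1, 16, 32, 32)
same = RepBottleneckSketch(16, 16)
wider = RepBottleneckSketch(16, 32)
print(tuple(same(x).shape), tuple(wider(x).shape))
```

When the channel counts differ, the sketch silently skips the shortcut instead of failing, which is one simple way to keep the block usable at stage boundaries.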

The RepCSPLayer block is based on the RepCSPLayer class in common.py
Combines two architectures: CSPNet and ELAN
- CSP: Cross Stage Partial Network
- ELAN: Efficient Layer Aggregation Network
Uses various computing blocks, not just convolutional layers
Input is split into two paths:
- One passes through RepCSP and a convolutional block
- One goes directly to the Concat block
Ends with a convolutional block
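
The two-path data flow above can be illustrated with a simplified sketch. It is not the class from common.py: RepCSPLayerSketch and conv_block are hypothetical names, the split is assumed to be an even channel split, and plain convolutional blocks stand in for the RepCSP stage.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out, k):
    """Conv2d -> BatchNorm2d -> SiLU with padding = k // 2 (hypothetical helper)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )


class RepCSPLayerSketch(nn.Module):
    """Split channels in two, process one half, concatenate, then convolve."""

    def __init__(self, c_in, c_out):
        super().__init__()
        assert c_in % 2 == 0, "this sketch assumes an even channel split"
        half = c_in // 2
        # Stand-in for "RepCSP and convolutional block" on the processed path.
        self.process = nn.Sequential(conv_block(half, half, 3),
                                     conv_block(half, half, 3))
        # Final convolutional block applied to the concatenated paths.
        self.out = conv_block(c_in, c_out, 1)

    def forward(self, x):
        direct, to_process = x.chunk(2, dim=1)      # split along channels
        processed = self.process(to_process)        # processed path
        return self.out(torch.cat([direct, processed], dim=1))


x = torch.randn(1, 16, 20, 20)
layer = RepCSPLayerSketch(16, 32)
out = layer(x)
print(tuple(out.shape))
```

Only half of the channels go through the heavier computation, which is the cross-stage-partial idea: the direct path carries features through unchanged until the concatenation.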

The SPP-based block is a modification of SPP (Spatial Pyramid Pooling)
Purpose: Generate feature representations for objects of different sizes without spatial information loss
Structure:
- Initial convolutional block
- Three SPP blocks (each with max pool layer)
- Concatenation before final convolutional block
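
The structure above can be sketched as follows. Several details are assumptions made for this example and are not stated in the lecture: the pooling kernel size of 5, stride 1 with padding 2 so the spatial size is preserved, chaining the three pooling layers one after another, and including the initial convolution's output in the concatenation. The names SPPSketch and conv_block are hypothetical.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out, k):
    """Conv2d -> BatchNorm2d -> SiLU with padding = k // 2 (hypothetical helper)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )


class SPPSketch(nn.Module):
    """Initial conv -> three chained max pools -> concat -> final conv."""

    def __init__(self, c_in, c_out, c_hidden, k=5):
        super().__init__()
        self.conv_in = conv_block(c_in, c_hidden, 1)
        # stride=1 and padding=k // 2 keep height and width unchanged.
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for _ in range(3)]
        )
        # Four tensors are concatenated: the conv output plus three pooled maps.
        self.conv_out = conv_block(4 * c_hidden, c_out, 1)

    def forward(self, x):
        feats = [self.conv_in(x)]
        for pool in self.pools:
            feats.append(pool(feats[-1]))  # each pool sees the previous result
        return self.conv_out(torch.cat(feats, dim=1))


x = torch.randn(1, 64, 16, 16)
spp = SPPSketch(64, 32, c_hidden=16)
out = spp(x)
print(tuple(out.shape))
```

Because each pooling layer operates on the previous one's output, the effective receptive field grows at every step while the feature map resolution stays the same.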

The YOLOv9 architecture is divided into three main parts: Backbone, Neck, and Head
Introduces Programmable Gradient Information (PGI)
- Provides complete input info for reliable gradient computation
- Improves training process reliability
- Adds a new Auxiliary section

Backbone: Feature Extraction
- Starts with a Silence block (no transformation)
- Multiple convolutional blocks, RepCSPLayer blocks, and ADown blocks
Neck: Feature Combination
- Upsample layer: Increases feature map resolution
- Concat blocks: Combine feature maps from different blocks
- Several RepCSPLayer and ADown blocks
Head: Prediction
- Detect blocks specialized for small, medium, and large objects
- Final Detect block takes feature maps from several preceding blocks
Auxiliary: Training Enhancement
- Provides extra info linking input data to target output
- Multiple blocks similar to Backbone for enhanced feature extraction
- CBLinear and CBFuse blocks for higher-level features and feature fusion
- Used only in training, can be deleted during inference
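
The Neck's upsample-then-concatenate step listed above can be illustrated with plain PyTorch tensors. The tensor sizes and variable names are hypothetical, and nearest-neighbor upsampling with a scale factor of 2 is an assumption made for this example.

```python
import torch
import torch.nn as nn

# Hypothetical feature maps: a deeper, low-resolution map and a shallower,
# higher-resolution map from an earlier block.
deep = torch.randn(1, 256, 20, 20)
shallow = torch.randn(1, 128, 40, 40)

# Upsample layer: doubles height and width so the two maps line up spatially.
upsample = nn.Upsample(scale_factor=2, mode="nearest")
up = upsample(deep)

# Concat: stacks the maps along the channel dimension (dim=1).
fused = torch.cat([up, shallow], dim=1)

print(tuple(up.shape))     # (1, 256, 40, 40)
print(tuple(fused.shape))  # (1, 384, 40, 40)
```

Concatenation requires matching height and width but not matching channel counts, which is why the upsample step comes first and the channel counts simply add up.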

Explanation of the YOLOv9 architecture is complete
Further learning is available through a three-in-one course covering YOLOv9, YOLOv8, and YOLOv7
Links provided for additional resources

Thank you for watching!