YOLOv9 Architecture Lecture

Hello and welcome to this video. In this video, I will explain the YOLOv9 architecture. Before explaining the whole YOLOv9 architecture, I will explain the details of the YOLOv9 blocks so that you can follow it more easily.

The most commonly used block is the convolutional block. This block is based on the Conv class in the common.py file; the file is in the models folder. A convolutional block consists of a 2D convolutional layer, 2D batch normalization, and an activation function. In YOLOv9, the default activation function is SiLU. They are all fused together into a single convolutional block.

In YOLOv9 there is an autopad function. This code is used to determine the padding value if it is not defined, or None, when using a convolutional block. To determine the padding value, use the following formula: the kernel size is divided by two using the floor division operator. With this operator, the result is rounded down to an integer. For example, a convolutional block with kernel size three and stride three that does not define a padding value gets a padding value of one. As another example, a convolutional block with kernel size one and stride one that does not define a padding value gets a padding value of zero.

Next, the RepConvN block. This block is based on the RepConvN class in the common.py file. In this block, there are two convolutional blocks, an element-wise addition, and a SiLU activation function. The input goes through both convolutional blocks. In the first convolutional block, the kernel size is three, the stride is one, and the padding is one. In the second convolutional block, the kernel size is one, the stride is one, and the padding is zero. The results of the two convolutional blocks are added, and then the SiLU activation function is applied.

The plus operator in PyTorch is used for element-wise addition between two tensors that have the same size. When two tensors are added, each element in the first tensor is added to the corresponding element of the second tensor. In this example, both tensors have the same shape of 2x3, so we can perform element-wise addition between them. Each element of the first tensor is added to the corresponding element of the second tensor, producing a new tensor that holds the sums.

Next is the RepNBottleneck block. This block is based on the RepNBottleneck class in the common.py file. This block is a sequence of blocks with a shortcut. In the sequence of blocks, there are a RepConvN block and a convolutional block. RepNBottleneck blocks are similar to the bottleneck block in ResNet. The difference is that the ResNet bottleneck has three convolutional blocks and uses the ReLU activation function.

The next one is the RepNCSP block. This block is based on the RepNCSP class in the common.py file. In this block, there are three convolutional blocks and a sequence of RepNBottleneck blocks. In a RepNCSP block, we can have many RepNBottleneck blocks, according to the value of the n parameter. In this block, the input goes through two convolutional blocks with kernel size one and stride one. Of the two results, one goes to the RepNBottleneck blocks while the other goes directly into the concat block. At the end, there is another convolutional block.

Next is the RepNCSPELAN4 block. This block is based on the RepNCSPELAN4 class in the common.py file. This block is GELAN, or Generalized Efficient Layer Aggregation Network. GELAN combines two neural network architectures, CSPNet and ELAN. GELAN generalizes ELAN's capabilities: ELAN only uses stacking of convolutional blocks, whereas GELAN can use various computational blocks. This means that GELAN is not limited to convolutional layers alone, but can also leverage other types of computational blocks, such as residual blocks or more complex blocks. This block contains a convolutional block, whose resulting feature maps are then split. One part goes to the RepNCSP and convolutional blocks, whereas the other goes directly into the concat block, and then there is another convolutional block. The RepNCSPELAN4 block is a GELAN that combines CSPNet and ELAN.

Next, the ADown block. This block is used to perform downsampling, that is, to reduce the resolution of the feature map. This block is based on the ADown class in the common.py file. In this block, 2D average pooling is applied to the input. The result is then divided into two parts. One part is directed to a convolutional block, while the other goes to a max pooling layer followed by another convolutional block. The outputs of the two paths are combined again using the concat operation.

The SPPELAN block is next. It is a modification of SPP, or spatial pyramid pooling. The main function of SPPELAN is to generate a fixed feature representation of objects of various sizes in an image, without resizing the image or introducing spatial information loss. This block is based on the SPPELAN class in the common.py file. Inside SPPELAN, there is a convolutional block at the beginning, followed by three SP blocks. The SP block contains a max pooling layer. Every resulting feature map is concatenated right before the end of SPPELAN, and SPPELAN ends with a convolutional block. For your information, SPPELAN is similar to SPPF in YOLOv8. The difference is the kernel and padding size in the convolutional block.

Next, the YOLOv9 architecture. In general, the YOLO architecture is divided into three parts: the backbone, the neck, and the head. The backbone is the deep learning architecture that basically acts as a feature extractor. The neck combines the features acquired from the various layers of the backbone. The head predicts the classes and bounding box regions, which are the final output produced by the object detection model.

YOLOv9 proposes the concept of programmable gradient information, or PGI. PGI can provide complete input information for the target task to calculate the objective function, so that reliable gradient information can be obtained to update the network weights. Therefore, a new part was added to the architecture, specifically the auxiliary. The auxiliary improves the reliability of the training process by providing additional information that links the input data to the target output, so the problem of losing information when passing through the layers of a deep learning network can be resolved.

Next, I will explain the whole YOLOv9 architecture. This architecture drawing is based on the YOLOv9-C architecture file, yolov9-c.yaml, which is located in the models/detect folder. Block numbering in the architecture is based on the architecture file. Numbering starts from the backbone section and starts from zero. For example, this Silence block is the first block in the architecture, so we assign it the number zero, and we draw the block as shown on the screen. Next, this convolutional block is the second block, so we assign it the number one. Then there is the P value. P is the notation adopted from EfficientDet, which represents the feature level. The next convolutional block gets the number two, and so on. The YOLOv9 input is an image with three channels.

Next, the backbone. This backbone begins with the Silence block. This block does not perform any transformation on its input; it just returns its input unchanged. This block connects the YOLOv9 input to the auxiliary and the backbone.

Next, there are two convolutional blocks with kernel size three and stride two. The spatial resolution of the output is reduced when stride two is used. For example, if the input resolution of the first convolutional block is 640x640, the output resolution after processing will be 320x320. The channel output value corresponds to the value in the architecture file. For example, in this convolutional block the value is 64, therefore the output channel size is 64.

Next is the RepNCSPELAN4 block with the n parameter equal to one. The n parameter determines how many RepNBottleneck blocks are used. The channel output value of this block corresponds to the value in the architecture file. For example, in this RepNCSPELAN4 block the value is 256, therefore the output channel size is 256, and this value is the n parameter.

Next, there is an ADown block. This block is used to perform downsampling, that is, to reduce the resolution of the feature map. The RepNCSPELAN4 block comes next, with the n parameter equal to one. This block's output is also connected to the neck and the auxiliary. Next, there is another ADown block and then another RepNCSPELAN4 block with the n parameter equal to one. This block's output is also connected to the neck and the auxiliary. Next, there is another ADown block. After that, there is a RepNCSPELAN4 block with the n parameter equal to one. This block is connected to the auxiliary and to the SPPELAN block. SPPELAN, spatial pyramid pooling with ELAN, is used after the last convolutional layer of the backbone.

Following that is an explanation of the neck. First, there is the upsample layer. This layer is used to increase the feature map resolution of the SPPELAN block to match the feature map resolution of this RepNCSPELAN4 block. The upsampled feature map is combined with the features from this RepNCSPELAN4 block using concat. When using concat, the numbers of channels are summed, whereas the resolution is unchanged. For example, we will compute the concatenation of this RepNCSPELAN4 block's feature map and this upsampled feature map. The output of this RepNCSPELAN4 block is 40x40x512 and the upsample output is 40x40x512. The result of the concatenation is 40x40x1024.

The following is a RepNCSPELAN4 block with the n parameter equal to one. The resolution of this RepNCSPELAN4 block's feature map is upsampled to match the resolution of the feature map of this RepNCSPELAN4 block. Using concat, the upsampled feature map is combined with the features from this RepNCSPELAN4 block.

Next, there is another RepNCSPELAN4 block. The feature map of this block is used as an input for the detect block. This detect block is specialized for detecting small objects. The output of this block is also used as an input to this ADown block. The resolution of the feature map is reduced by half using this block. Furthermore, concat is used to combine the feature map from this ADown block with the feature map from this RepNCSPELAN4 block.

Next, there is another RepNCSPELAN4 block. The feature map of this block is used as input for the detect block. This detect block is specialized for detecting medium-sized objects. The output of this block is also used as input to this ADown block. Next, concat is used to combine the feature map from this ADown block with the feature map from the SPPELAN block. Finally, there is another RepNCSPELAN4 block. This block's feature map is used as an input for the detect block. This detect block is specialized for detecting large objects.

Next is an explanation of the auxiliary section. As previously explained, the auxiliary provides additional information that links the input data to the target output. Therefore, this section takes input from the Silence block and adds multiple blocks that are identical to those found in the backbone. There are two convolutional blocks, RepNCSPELAN4 blocks, and ADown blocks.

Next, there are three CBLinear blocks. These blocks are used to create different pyramid-like feature maps and to obtain higher-level features of the first backbone. The output of this block varies according to the values written in the architecture file. For example, in CBLinear block number 23, only one value is written in this list, specifically 256. This value determines the output channels. Because there is only one value and it is 256, there is only one output, with 256 channels. In CBLinear block number 24, there are two values, specifically 256 and 512. As a result, there are two outputs, one with 256 channels and the other with 512 channels. In CBLinear block number 25, there are three values, so there are three outputs.

Next, there is the CBFuse block. This block is the main part that carries the reversible property; it composites higher-level features of the first backbone into lower-level features of the second backbone. There are four inputs to this CBFuse block, specifically from ADown and the three CBLinear blocks. This block's output size will be the same as that of this ADown block. The output of this CBFuse is connected to the RepNCSPELAN4 block. This block's feature map is used as an input for the additional detect block that specializes in detecting small objects. The output of this block is also used as input to this ADown block.

Next, there is another CBFuse block. The input to this block is the output from ADown and two CBLinear blocks. The output of this CBFuse is connected to the RepNCSPELAN4 block. This block's feature map is used as an input for the additional detect block that specializes in detecting medium objects. The output of this block is also used as input to this ADown block.

Next, there is another CBFuse block. The input to this block is the output from ADown and CBLinear. The output of this CBFuse is connected to the RepNCSPELAN4 block. This block's feature map is used as an input for the additional detect block that specializes in detecting big objects.

This auxiliary section is only used during the training process. During the inference process, this section can be deleted to increase the speed of the model without reducing its accuracy.

Congratulations, you have finished a complete breakdown of the YOLOv9 architecture. If you are interested in learning more about YOLOv9, you can follow our three-in-one course covering YOLOv9, YOLOv8, and YOLOv7. The link is in the description. If you feel that this video is helpful, please like and share the video. Thank you, and until next time.
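
The padding rule and the 640 to 320 resolution example from the lecture can be checked with a small sketch. It is a simplified stand-in, not the repository's code: it assumes an integer kernel size and no dilation, and conv_output_size is a hypothetical helper that applies the standard convolution output-size formula.

```python
def autopad(kernel_size, padding=None):
    """Return the given padding, or kernel_size // 2 when padding is None."""
    if padding is None:
        # Floor division rounds down: 3 // 2 == 1 and 1 // 2 == 0.
        return kernel_size // 2
    return padding


def conv_output_size(size, kernel_size, stride, padding=None):
    """Spatial output size of a convolution (hypothetical helper, no dilation)."""
    p = autopad(kernel_size, padding)
    return (size + 2 * p - kernel_size) // stride + 1


print(autopad(3))  # 1
print(autopad(1))  # 0
print(conv_output_size(640, kernel_size=3, stride=2))  # 320
```

With kernel size three and stride two, each convolutional block halves the resolution, which matches the 640x640 to 320x320 example in the lecture.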
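
The element-wise addition and SiLU step described for the RepConvN block can be illustrated without PyTorch. This is only a sketch under assumptions: nested Python lists stand in for tensors, the 2x3 values are made up for illustration, and rep_conv_combine is a hypothetical name for "add the two branch outputs, then apply SiLU".

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def add_elementwise(a, b):
    """Add two equally shaped 2D lists element by element."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("both inputs must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rep_conv_combine(branch_3x3, branch_1x1):
    """Sum the two branch outputs, then apply SiLU to every element."""
    summed = add_elementwise(branch_3x3, branch_1x1)
    return [[silu(v) for v in row] for row in summed]


# Illustrative 2x3 inputs (not taken from the lecture).
first = [[1, 2, 3], [4, 5, 6]]
second = [[10, 20, 30], [40, 50, 60]]
print(add_elementwise(first, second))  # [[11, 22, 33], [44, 55, 66]]
```

The shape check mirrors the lecture's point that both operands need the same size before they are added.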
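
The concat arithmetic from the neck section (40x40x512 combined with 40x40x512 gives 40x40x1024) can be sketched as shape bookkeeping. The helpers below are hypothetical and track only (height, width, channels) tuples; the upsample scale factor of 2 and the 20x20x512 input are assumptions chosen so that the result matches the 40x40x512 upsample output quoted in the lecture.

```python
def upsample_shape(shape, scale=2):
    """Shape after upsampling: resolution grows, channels stay the same."""
    height, width, channels = shape
    return (height * scale, width * scale, channels)


def concat_shape(*shapes):
    """Shape after channel-wise concat of (height, width, channels) maps."""
    height, width, _ = shapes[0]
    for h, w, _ in shapes:
        if (h, w) != (height, width):
            raise ValueError("all feature maps must share the same resolution")
    # Channels are summed; the resolution is unchanged.
    return (height, width, sum(c for _, _, c in shapes))


upsampled = upsample_shape((20, 20, 512))  # assumed input shape
print(upsampled)                               # (40, 40, 512)
print(concat_shape((40, 40, 512), upsampled))  # (40, 40, 1024)
```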
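
The CBLinear behaviour described in the lecture, one output per value in the list, can be sketched as a channel split. This assumes that the block produces one wide feature map and then splits it along the channel axis, which the lecture does not state; split_channels and cb_linear_outputs are hypothetical names, and a plain list of channel indices stands in for the feature map.

```python
def split_channels(channels, sizes):
    """Split a flat list of channels into consecutive groups of the given sizes."""
    if sum(sizes) != len(channels):
        raise ValueError("sizes must add up to the number of channels")
    groups, start = [], 0
    for size in sizes:
        groups.append(channels[start:start + size])
        start += size
    return groups


def cb_linear_outputs(sizes):
    """Channel count of each output for a CBLinear-style block (sketch)."""
    channels = list(range(sum(sizes)))  # stand-in for one wide feature map
    return [len(group) for group in split_channels(channels, sizes)]


print(cb_linear_outputs([256]))       # [256]
print(cb_linear_outputs([256, 512]))  # [256, 512]
```

The two calls correspond to block 23 (one output with 256 channels) and block 24 (two outputs with 256 and 512 channels) as described in the lecture.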
