OpenCV Object Detection Tutorial

Introduction

Objective: Implement object detection using OpenCV.
Outcome: Computer verbally announces detected objects from a live video feed.

Dependencies

Libraries to Install

OpenCV Contrib: pip install opencv-contrib-python
- Includes the main OpenCV modules plus the extra contrib modules.
cvlib: pip install cvlib
- Used for object detection. cvlib's own install instructions also list TensorFlow (pip install tensorflow) as a prerequisite.
gtts: pip install gtts
- Converts text to speech. It calls the Google Translate text-to-speech service, so it needs an internet connection.
playsound: pip install playsound
- Plays sound files.
pyobjc: pip install pyobjc
- macOS only. playsound uses it to play audio through the native macOS frameworks.
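
Before moving on, it can help to confirm that the packages above are importable. The following is a minimal sketch using only the standard library; the helper name missing_modules is an illustration, not part of any of the libraries. Note that opencv-contrib-python is imported as cv2.

```python
from importlib.util import find_spec

# Import names differ from some pip package names:
# opencv-contrib-python installs the module "cv2".
REQUIRED_MODULES = ["cv2", "cvlib", "gtts", "playsound"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If a module is reported as missing, rerun the matching pip install command in the same Python environment that runs the script.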

Importing Libraries

import cv2
import cvlib as cv
from cvlib.object_detection import draw_bbox
from gtts import gTTS
from playsound import playsound

Accessing the Camera

Access the video feed from the camera:

video = cv2.VideoCapture(1)  # 1 selects a second camera; use 0 for the default camera
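
Which index works depends on the machine, so one option is to try several indices and keep the first one that opens. The sketch below assumes a hypothetical helper, open_first_camera, with an opener parameter added only so the logic can be exercised without a camera. By default it uses cv2.VideoCapture and its isOpened() and release() methods.

```python
def open_first_camera(indices=(0, 1), opener=None):
    """Return (index, capture) for the first index whose capture opens.

    `opener` is a hypothetical hook for testing; by default it is
    cv2.VideoCapture, imported lazily so this file loads without OpenCV.
    """
    if opener is None:
        import cv2
        opener = cv2.VideoCapture
    for index in indices:
        capture = opener(index)
        if capture.isOpened():
            return index, capture
        capture.release()  # free the handle before trying the next index
    raise RuntimeError(f"No camera opened for indices {tuple(indices)}")
```

Usage would look like index, video = open_first_camera(), after which video can be used exactly as in the rest of the tutorial.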

Capturing Frames

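Use a loop to read frames from the video feed. The detection, display, and exit-check snippets in the following sections all belong inside this loop:

while True:
    ret, frame = video.read()  # ret is False if no frame could be read
    if not ret:
        break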

Detecting Objects & Drawing Bounding Boxes

Detect objects and draw bounding boxes with labels:

bbox, label, conf = cv.detect_common_objects(frame)
output_image = draw_bbox(frame, bbox, label, conf)
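
To ignore weak detections, the results can be filtered before they are drawn or announced. This is a minimal sketch that assumes bbox, label, and conf are parallel lists (one box, one label, and one confidence score per detection). The helper filter_detections and the 0.6 threshold are illustrative choices, not part of cvlib.

```python
def filter_detections(bbox, label, conf, min_conf=0.6):
    """Keep only detections whose confidence is at least min_conf."""
    kept = [(b, l, c) for b, l, c in zip(bbox, label, conf) if c >= min_conf]
    if not kept:
        return [], [], []
    # Unzip the surviving triples back into three parallel lists.
    boxes, labels, confs = (list(group) for group in zip(*kept))
    return boxes, labels, confs
```

The filtered lists can then be passed to draw_bbox in place of the originals, for example bbox, label, conf = filter_detections(bbox, label, conf).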

Showing Live Object Detection

Display the output image in a window:

cv2.imshow('Object Detection', output_image)

Breaking the Loop

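Stop the video feed when 'q' is pressed (after the loop has exited, call video.release() and cv2.destroyAllWindows() to free the camera and close the window):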

if cv2.waitKey(1) & 0xFF == ord('q'):
    break
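
The & 0xFF mask in the check above keeps only the low byte of the value returned by cv2.waitKey, since some platforms can set extra high bits on the key code. The sketch below shows the same comparison in plain Python so it can be tried without a window. The helper name is_quit_key is hypothetical.

```python
QUIT_KEY = ord("q")  # 113


def is_quit_key(raw_key):
    """Return True if the low byte of a waitKey result is 'q'."""
    return (raw_key & 0xFF) == QUIT_KEY


print(is_quit_key(113))       # plain 'q'
print(is_quit_key(0x100071))  # 'q' with extra high bits set
print(is_quit_key(-1))        # waitKey returns -1 when no key was pressed
```

In Python, & binds more tightly than ==, so the unparenthesized form used in the loop evaluates the same way.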

Handling Detected Labels

Managing Labels List

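Store detected labels in a list without duplicates. Create the list once, before the while loop, so that it accumulates labels across frames, and run the for loop inside the while loop after each detection:

labels = []  # create once, before the while loop

# inside the while loop, after detect_common_objects:
for item in label:
    if item not in labels:
        labels.append(item)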
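
The same idea can be wrapped in a small function so that it is easy to call once per frame. This is a sketch only: merge_labels is a hypothetical helper, and the frame contents are made-up sample data.

```python
def merge_labels(seen, detected):
    """Append labels from `detected` that are not yet in `seen`, keeping order."""
    for item in detected:
        if item not in seen:
            seen.append(item)
    return seen


labels = []
merge_labels(labels, ["person", "chair", "person"])  # sample frame 1
merge_labels(labels, ["chair", "cup"])               # sample frame 2
print(labels)

# For a single list, dict.fromkeys also removes duplicates while keeping order.
unique_once = list(dict.fromkeys(["person", "chair", "person"]))
print(unique_once)
```

A list is used rather than a set because the order of first appearance is preserved, which keeps the spoken sentence stable from run to run.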

Generating Speech from Detected Labels

Creating a New Sentence from Labels

Create a sentence that lists all detected items:

new_sentence = []
for i, name in enumerate(labels):  # 'name' avoids shadowing the detection result 'label'
    if i == 0:
        new_sentence.append(f'I found a {name}')
    else:
        new_sentence.append(f'a {name}')
sentence = ', '.join(new_sentence)  # e.g. 'I found a person, a chair'
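
For a more natural spoken result, the sentence can pick "a" or "an" per label and join the last item with "and". The following is one possible sketch, not the tutorial's own method. The function build_sentence is hypothetical, and the article rule only checks the first letter.

```python
def build_sentence(labels):
    """Turn ['person', 'apple'] into 'I found a person and an apple.'"""
    if not labels:
        return ""
    phrases = []
    for name in labels:
        # Simple first-letter rule; it does not handle cases like "hour".
        starts_with_vowel = name.lower().startswith(tuple("aeiou"))
        article = "an" if starts_with_vowel else "a"
        phrases.append(f"{article} {name}")
    if len(phrases) == 1:
        listing = phrases[0]
    elif len(phrases) == 2:
        listing = " and ".join(phrases)
    else:
        listing = ", ".join(phrases[:-1]) + ", and " + phrases[-1]
    return f"I found {listing}."
```

Returning an empty string for an empty list lets the caller decide whether to speak at all.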

Defining the Speech Function

Function to convert text to speech and play it:

import os

def speech(text):
    print(text)
    language = 'en'
    output = gTTS(text=text, lang=language, slow=False)
    os.makedirs('sounds', exist_ok=True)  # save() fails if the folder is missing
    output.save('sounds/output.mp3')
    playsound('sounds/output.mp3')

Call the speech function with the sentence:

speech(sentence)
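
If nothing was detected, the sentence is empty, and gTTS is expected to reject empty text. A small guard avoids that call. In this sketch, announce is a hypothetical wrapper, and the speak argument stands in for the speech function defined above so the guard can be tried without audio.

```python
def announce(sentence, speak):
    """Call speak(sentence) only when there is something to say.

    Returns True if speak was called, False otherwise.
    """
    text = sentence.strip()
    if not text:
        return False
    speak(text)
    return True
```

In the tutorial's script this would be called as announce(sentence, speech) in place of the direct call.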

Conclusion

Review: In this tutorial, we used OpenCV and cvlib to detect objects in a live video feed and announced them aloud using gTTS and playsound.
Next steps: Try different camera indices, filter detections by their confidence scores, or integrate other functionality.