Hey guys, welcome to another tutorial with OpenCV. Today we are going to look into object detection, so I can just hold up an apple or an orange, heck, even my cell phone, lots of different objects, and I'm going to get my computer to tell me out loud, using its voice, what it saw inside of the frame. That's enough introduction, let's get right into it.

So the first thing that we need to do is install our dependencies. So we are going to install a couple of libraries. The first thing that we're going to install is opencv-contrib-python.
So we're going to say pip install opencv-contrib-python. Some people ask me why I use this lately instead of just opencv-python. It's because opencv-python does contain the main modules that you need for the basics of OpenCV, but opencv-contrib-python also contains the extra contrib modules, so that we have a little bit extra to work with. So we're just going to hit enter and that's going to start installing. And if you ever get a warning like I did, where you need to update pip or anything, you can go ahead and do that as well. I'm just going to copy and paste that, and that should be really quick. There we go, perfect.

After we have that installed, we are now going to go ahead and install cvlib, so pip install cvlib. We're going to be using this for our object detection. It's a library that has already learned what certain objects are, so we're just going to install that, and depending on your internet connection that should be rather quick. Very good.

Then we are also going to allow our computer to say out loud what it saw. So if it sees me, it'll say I saw a person, or I saw an apple, an orange, so on and so forth. So we are going to install just a couple more things. We're going to say pip install gTTS playsound. Whoops, playsound, like that. And then finally we are going to install PyObjC, which is going to help playsound be a little bit more efficient (this one only applies on macOS). So I'm going to say pip3 install PyObjC, with a capital P, a capital O and a capital C. I already have a couple of those installed, so it's going to say already satisfied.
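
If one of those installs failed quietly, a quick way to find out before writing the rest of the program is to ask Python whether it can locate each module. This is a minimal sketch and not something shown in the video: the helper name missing_modules is made up for illustration, and the import names in REQUIRED (cv2, cvlib, gtts, playsound) are the modules those pip packages are expected to provide.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed to be provided by the pip packages installed above.
REQUIRED = ["cv2", "cvlib", "gtts", "playsound"]

missing = missing_modules(REQUIRED)
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All set, every module was found.")
```

Anything listed as still missing is worth reinstalling before moving on.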
For you, it will probably say successful. If you have any errors, just go back, rewind, and make sure that you typed everything correctly. Otherwise, let's move on.

So I'm just going to slide down my window here and I'm going to now import cv2, then import cvlib as cv, and then from cvlib.object_detection import draw_bbox. That's going to be drawing a box around our objects for us, so make sure you have two b's for bbox, b-b-o-x. Then we're going to say from gtts import gTTS, with a capital T-T-S. Oops, I typed gta, let's make that gTTS. Then finally, from playsound import playsound. So there's one, two, three, four, five lines of imports, but that is everything that we are going to be using for this video. If you need a little bit more time, you can go ahead and pause the video and continue later.

So what we want to do now is access our camera. Originally, when I was first building this and testing things out, I was just having it bring in a specific image and look at the objects in that image, but I wanted this to be live.
So it has a live feed, and we can detect all the objects in a live feed instead. So we're going to access our camera. I'm going to say video equals cv2.VideoCapture, and that takes an index. For most of you it might be index 0, but the webcam that I want to be using instead, which is a lot higher quality, is at index 1. You can just mess with those indexes as you please, but we're going to start with that.

Now we are going to say while True. I'm going to use my video capture and unpack each frame into a variable called frame, so ret, frame equals video.read(). So now we're going through each frame, and now we're going to use that bbox, where it's going to be seeing the objects and drawing a box around each one. We're also going to give it a label next to the box to tell us what the object is. So we're going to say bbox, label, and then conf. conf is the confidence, meaning how sure it is about what each object is, so it's really just going to be returning some decimal numbers. Those three equal cv.detect_common_objects, and now we have to tell it where to get those objects from, so we say get them from the frame. That's going to be each frame from my video feed.

Finally, we are going to draw that box. So we're going to say output_image equals draw_bbox, and now we need to give draw_bbox the frame, the bbox that it's going to be drawing, the label, and we'll stick conf in there too. Very good.

Now that we have that, let's go ahead and show the user what the image looks like. So I'm going to say cv2.imshow. First it wants the name of the window, so I'll just call this Object Detection (you can call that whatever you want), comma, and then we give it output_image.

Before we hit run, we're going to give this a wait key. So I'm going to say if cv2.waitKey, with a delay of 1, and we're going to check to see if the user is pressing a certain key. You can use whatever key you want, and some people like the space bar, but I'm going to say if the user hits Q, I want you to break out of this loop. And after I hit Q, it breaks out of that window.
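
Putting the steps so far together, the live-feed loop might look something like the sketch below. It is wrapped in a function called run_detection, a name chosen here for illustration, and the imports sit inside it so the file can be loaded even on a machine without the libraries or a camera. The quit check is pulled out into a small helper, should_quit, which is also hypothetical. The & 0xFF mask, the ret check and the cleanup at the end are common OpenCV habits rather than something spelled out in the video, and camera index 0 is assumed as the default.

```python
def should_quit(key_code, quit_key="q"):
    """True when the key code returned by cv2.waitKey matches quit_key."""
    # Keep only the low byte, since some platforms set extra high bits.
    return (key_code & 0xFF) == ord(quit_key)


def run_detection(camera_index=0):
    """Show a live feed with boxes and labels drawn around detected objects."""
    # Imported here so that defining this function does not need the libraries.
    import cv2
    import cvlib as cv
    from cvlib.object_detection import draw_bbox

    video = cv2.VideoCapture(camera_index)
    try:
        while True:
            ret, frame = video.read()
            if not ret:  # no frame came back (wrong index, camera unplugged)
                break
            bbox, label, conf = cv.detect_common_objects(frame)
            output_image = draw_bbox(frame, bbox, label, conf)
            cv2.imshow("Object Detection", output_image)
            if should_quit(cv2.waitKey(1)):
                break
    finally:
        # Release the camera and close the window however the loop ended.
        video.release()
        cv2.destroyAllWindows()
```

Calling run_detection(1) would pick the second camera, which is the index used in the video.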
So very good. As you could see, that was already detecting me as a person, and it even detected this as a tie. That's already working. So what I want this to do now is take each of those labels that it finds on my screen and add them to a list, so that I have that list of data.

So what we're going to do now is make a list called labels. Let's come up here and we'll call this labels. Make sure this is outside of your loop, so it doesn't accidentally get reset inside the loop and just erase all the data. Then we'll do a for loop: for item in label, if item in labels, then we're just going to have it pass. This is going to be checking for objects in each frame, so if it already found a tie, it would otherwise add maybe a thousand ties, and I don't want it to do that. I'm just going to say, if you find a tie, go ahead and put it in the list, but if a tie is already in the list, don't add it again. So it's only going to say tie one time. You can alter this if you'd like, but this is the way I'm going to do it. If it's not already in the list, then I want you to labels.append that item. So that item will be added to this list called labels.

Just to test that out, I'm going to come down here and print labels. Let's see if that works. Okay, and as you can see, here is my list called labels that I just printed. It found a person and it found a tie.

So very good. I know that this is working because if it weren't, it would be saying person a thousand times and tie a thousand times, but it's only going to do it once because of this code here. So very good.
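
The bookkeeping described above, keeping each label only the first time it shows up, can be tried out without a camera. In this sketch the per-frame results are faked with plain lists, and the helper name add_new_labels is invented for the example. In the real loop, the list passed in would be the label value returned for each frame.

```python
def add_new_labels(labels, frame_labels):
    """Append each label from one frame to `labels` unless it is already there."""
    for item in frame_labels:
        if item in labels:
            pass  # already recorded from an earlier frame, skip it
        else:
            labels.append(item)
    return labels


# Created once, outside the loop, so it is not reset on every frame.
labels = []

# Pretend detections from three consecutive frames.
fake_frames = [["person"], ["person", "tie"], ["tie", "person"]]
for frame_labels in fake_frames:
    add_new_labels(labels, frame_labels)

print(labels)  # ['person', 'tie']
```

A set would also remove duplicates, but a list keeps the labels in the order they were first seen, which matters once they are read out loud.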
So what I want this to do now is take this data called labels and write code using string interpolation to tell me what it found in a more natural way. For example, I want it to say something like, I found an orange, a person, a book, a tie, a cell phone, an apple, so on and so forth. How I'm going to do that is I'm going to create a for loop, for label in labels.

I want this to check to see if this is the first time it's reading a label out loud. So I'm going to create some kind of counter, and I'm going to say i equals 0. I also want this to sound a little bit more natural when it says it out loud, so instead of reading the list as it is, I'm going to build the sentence up in a new list. So I'm going to say new_sentence equals an empty list.

Then, if i is 0, new_sentence.append, which means to add, and I'll use some string interpolation here: I found a, and I'll put label there, so if it found a person it'll say I found a person. After that I'll put a comma, the word and, and another comma, to give the speech a little bit of a pause, so we'll see how that goes. Then, if i is not equal to 0, new_sentence.append, and I'll use string interpolation again, and I'm going to say a, then label, then a comma, like so. Once that is done we are going to increment i, so I'm going to say i plus equals 1. So for the first thing it finds, i is going to be 0, and it's going to say I found a hat, and a person, a book, an apple, so on and so forth. So very good with that.

Just to make sure that this is all going to be in one string, so our speech works properly after this, let's go ahead and use the join function. So I'm just going to say print, a string with a single space, dot join, new_sentence. That will turn this list into a string. So let's go ahead and test that out. Perfect. After hitting run and after it found those things, check it out. It added each of those things to the list, so it said I found a person, and a tie, an orange, a chair, a donut, an apple. Again, this might look kind of weird with the commas, but it's going to help with the pauses when the computer says it out loud.
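
As a rough sketch of that sentence-building step, the function below follows the same idea: a counter i, a list called new_sentence, and a join with a single space at the end. The function name build_sentence is made up for this example, and the exact placement of the commas is a guess at what was typed in the video. Note that it always uses "a", so it would produce "a orange" rather than "an orange".

```python
def build_sentence(labels):
    """Turn a list of labels into one string that can be read out loud."""
    new_sentence = []
    i = 0
    for label in labels:
        if i == 0:
            # First label: open the sentence, the commas add a short pause.
            new_sentence.append(f"I found a {label}, and,")
        else:
            new_sentence.append(f"a {label},")
        i += 1
    # Join the pieces with single spaces to get one string.
    return " ".join(new_sentence)


print(build_sentence(["person", "tie", "chair"]))
# I found a person, and, a tie, a chair,
```

With an empty list the function returns an empty string, so nothing would be spoken if no objects were detected.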
So very good. If we have that working just fine, then let's go ahead and add the speech part of this. Towards the top of our project, let's go ahead and define a function. So def, we'll call this speech, and it's going to receive some text.

If you've seen my virtual assistant video, where you build your own Siri or Alexa, this is the exact same function that we're going to be using. So we're going to say print that text, because we want to be able to see it as well. I also want to set my language to whatever I want. I'm going to set mine to English, so en. If you want to do Spanish it's es, or Japanese it's ja, and you can always look those codes up on Google if you'd like to, but we'll do English for now.

Then we'll give this some output. So output equals, and now we're going to use gTTS, which takes text, which is going to be equal to the text that we're going to send it, comma, then lang, and that will be our language, and then finally it's going to ask how fast we want it to go, so I'm going to say slow equals False, just like that.

Now we need to save that audio into a file somewhere in our project. So come into your project and let's create a new directory, and we're just going to call it sounds. So here's sounds. I forgot to change the name of the project, so don't worry about that, but sounds is just underneath that directory. With that in place we can now save, so we're going to say output.save, and now we tell it where to save. I'm going to give it the path to the sounds directory, and you can call your file whatever you want. I'm just going to say output.mp3, so this will be an mp3 file.

Finally, we're going to have it play the sound. We're going to use the playsound library that we imported up here, and we're going to say playsound, with that same exact location, output.mp3. Now we've got to send whatever text we want over here. Down here we have a print, so instead of print I'm going to say speech, because that's the name of our function. That will send our string to our function, which takes our text, makes it into actual speech, saves it, and then plays that sound. Let's see if this works.

I found a person and a tie, a chair, an apple, a donut, an orange, a cell phone.

So hopefully you could hear that, but it did say I found a person and a tie, an apple, an orange, a cell phone. So that is working great. Congratulations if you just accomplished that.
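
For reference, a speech function along the lines described above might look like this sketch. A few things here are assumptions: the file path ./sounds/output.mp3 is a guess at the location used in the video, and the language and path are exposed as parameters with defaults purely for convenience, whereas the video sets the language inside the function. The imports are placed inside the function so it can be defined without the libraries installed. gTTS also needs an internet connection when it generates the audio.

```python
def speech(text, language="en", path="./sounds/output.mp3"):
    """Print `text`, turn it into an mp3 with gTTS, then play the file."""
    # Imported here so that defining this function does not need the libraries.
    from gtts import gTTS
    from playsound import playsound

    print(text)
    output = gTTS(text=text, lang=language, slow=False)
    output.save(path)  # the sounds directory has to exist already
    playsound(path)
```

A call such as speech("I found a person, and, a tie,") would then print the sentence, save the audio and play it back.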
That was really cool, pretty simple, and if you have any questions, please let me know down in the comments. If you have any requests, please let me know. Don't forget to drop a like and subscribe so that you are notified of my next tutorial.
Thank you so much and happy coding.