In today's video I thought we can take a look at how I created this almost zero latency real-time transcription that you actually can see on the screen here. We're going to go through some use case ideas I have for this and how you can create this too. So yeah I think we're just going to get started.
You can also see when I stop this now we get a log of everything we said. So yeah perfect. Okay so before we take a look at the code here I want to show you one more example just a use case I thought of. So I have this MrBeast YouTube video here. I'm gonna bring up the terminal.
So what I'm gonna do now is I'm gonna take off my headset I'm gonna put it around my mic here, right? Hopefully that's fine And yeah, let's start the script and let's fire up the video My tanks are literally about to rain missiles upon this $500,000 and any money that doesn't get destroyed. I'm giving to Blake It's been as much money as you want trying to protect this money from all the missiles We're gonna be launching at your money. So I just build whatever I want whatever you want But yeah, so you can see I think that worked out pretty good and you can see we are streaming this and we are actually Real-time transcribing this so yeah, just a cool use case.
I have more planned for this So I'm gonna do like music videos test it out different stuff But yeah, pretty cool right and you can see when I stop this now we kind of get the log here too so yeah that was just basically a very simple use case okay so before we move on to the next use case let's take a look at the python code so this is kind of built on something called fast whisperer so this is kind of a sped up version of whisperer from open ai and yeah we are actually using our gpu here that's why we can get such low latency but i am running on a kind of new 4 series nvidia so it's nothing insane but it's working pretty good you So this is very easy to set up. I'm gonna leave a link to the github here. So just basically just pip install whisper and just follow the instructions here and you can have this set up in no time.
If you go back to our code you can see we have a function that is actually recording from our microphone and creating chunks and here we can adjust the chunk length by um yeah we can set like one I think that's one second so the shorter this is the faster it streams on the in the terminal right. And Faster Whisperer has of course all the models. So we can have small, medium. We can have medium in English. We can have large V3.
That's the best one. And we can have auto detect language. I just set mine to English for now, right? And then we come into this true loop here that is actually taking what we are recording.
And it's printing it. And it's accumulating this into like a log file too. That we can actually, when this breaks, we can print.
the log here other than that it's pretty simple pretty straightforward it's not a big script uh you can see we are using q.course here from my gpu and there's a few things you can adjust here to make it even quicker but i think we're just going to keep it as is for now and i want to move on to kind of our next use case like always if you want to support me become a member of the channel you can follow the link in the description below i will be putting this up on our community github you will also get access to the community discord that is if you want to support me other than that just like the video if you like this kind of content maybe leave a comment or something but yeah let's move on to another use case i created for this okay so the next example is going to be a real-time sentiment analysis so you can see we have a get chat response function here that uses gpt4 and we have such the system message to you are an expert sentiment analysis If you scroll further down here we have basically all the same but here we have something that is more of a sliding window because we always want to keep the prompt to a set number of characters so this is gonna be 100 characters. That means that the prompt will always be 100 characters long and it changes over time. You will see how this works when we run it. And we create a simple UI that is actually is gonna displace the sentiment and yeah.
basically all is the same you can see we have another prompt here so this takes the sliding window as an input and what is the sentiment of the conversation above so answer only with positive neutral or negative and yeah kind of what this does is that it looks at what is going on in the conversation now and gives a sentiment analysis so Let's just fire up the terminal and you will see how this works in action. Okay, so when we start this now, we should get like a pop-off of this UI window. So let's place it over here.
And when we continue to talk now, we can see that the sentiment is changing. So let's talk about some happy stuff. So yeah, I'm very happy. Looking forward to my vacation. I have won a lot of money.
So you can see we are changing to positive now. So if we turn this into, yeah, I'm going to a funeral. I'm feeling pretty depressed.
I've lost a lot of money. I'm broke. Yeah, you can see it's changing to negative. So this is basically like a real time sentiment analysis.
And if we just keep talking, you can see this is probably going to change over time. I think we're gonna go back to neutral now. Yeah, so when we're neutral, it's just gonna keep it like this It gets a bit messy, but I guess yeah, it works So yeah, pretty happy with this and it's a very simple UI and I think it works pretty good But yeah, nothing else to say.
I think this works good. Okay, so the final example is actually gonna be a preview of Wednesday's up upcoming video. So I'm not gonna spoil anything about the code or anything or actually how I'm doing this But I'm just gonna show you how it works Because I'm gonna work a bit more on it to the upcoming video But this is basically up the same alley So if you zoom in a bit here now we fire up the terminal so this is gonna be a bit different But it's kind of takes off the same Workflow, so let's start this now. Okay, so Basically what is going to happen here now is when I start to talk you're going to see images start popping up here.
Let's start just talking about some red cats and maybe some white flowers. We have some green nature. We have some trees.
We have some architecture. We can actually see a blue deer running over the hill. That is going to be very special.
and some UFOs and astronauts are actually walking into the scene. The deer is fighting the astronaut. So what is going to happen next? Well, no one knows. You can see there are some aliens coming and landing on Earth.
People are scared. People are running away and no one knows what to do. People are very scared. People are screaming.
People are running and they are very afraid. Okay, so Sorry about that rant, but yeah, you can kind of see here. So what happened here is actually we are creating images From our sliding window prompt. So this goes on in pretty much real time So the series was kind of this right? So it's a bit strange, but I'm gonna be work a bit more on it and this is also using the faster whisperer but we also have something else that is kind of mixed into this to make this happen i want to create some kind of ui to display the images better because this was not a good solution but i just wanted to leave it as a preview for the upcoming video on wednesday okay so that was basically what i had for today i hope you found it interesting and like i said if you want to get access to this yourself just follow the link in the description you can join my youtube channel you will get access to this private github here where i will be uploading this other than that have a lot of fun with this and i'm gonna make some improvements like i said watch out for wednesday's video i think it's gonna be pretty cool and yeah have a great day and i'll see you again soon