Nano Browser Overview

Looks like we have a brand-new open-source agentic web browser on the scene, and it's pretty powerful. Allow me to introduce you to Nano Browser. Nano Browser is an open-source agentic web automation app that allows you to run multi-agent workflows directly within your browser. What makes it even more exciting is that you can power it with any model, whether that's a local model, Gemini 2.5 Pro, or even the Claude 4 series. You can plug in any sort of model with your API key, or a local model via Ollama, and run it within your browser with Nano Browser.

It is completely free: you bring your own API keys, like I mentioned, and you also have the ability to integrate something like Ollama. It's fully open source, and some of the key features include multi-agent workflows, with agents working alongside each other simultaneously, which is insane, because you can have one task running on one side, like booking a flight, while on the other side it is working on web browsing. There's also an interactive side panel that shows up on the right-hand side, giving you a visualization of what is happening, and the agent is essentially going to communicate with you that way. There's task automation, follow-up questions, conversation history, as well as multiple LLM support.

Essentially, Nano Browser acts as a fully autonomous web agent, making it a strong alternative to tools like OpenAI's Operator or the Browser Use agentic framework, while being fully open source, fully customizable, and completely local. Just take a look at it in action here: Nano Browser's multi-agent system is actively analyzing Hugging Face in real time, and you can see the planner agent intelligently self-correcting whenever it encounters any sort of obstacle. It dynamically instructs the navigator agent to adjust its approach, and all of this is happening locally, right within your browser. Whether it's automating web research or
data extraction or even complex workflows, Nano Browser is bringing agentic web automation to the open-source world.

Before we get started, I just want to mention that you should definitely go ahead and subscribe to the World of AI newsletter. I'm constantly posting newsletters on a weekly basis, so this is where you can easily get up-to-date knowledge about what is happening in the AI space. Definitely go ahead and subscribe, as this is completely free.

Now, you can build from source, which is one option, but you also have the ability to install the Chrome extension, which can operate on different browsers like Chrome or even Edge. I've gone ahead and added the extension, and we're going to open it up now. What we want to do first is configure our API keys. Like I mentioned, I'm pretty lazy about setting this up locally, so I'm just going to be using an API provider, and I'm going to be using the Gemini model. This is because the Gemini 2.5 series is exceptional with multimodality, where it can work with audio, visuals, as well as speech, and in this case it is the best model for this particular use case.

You can even choose different providers and models for different agents. The planner is an agent that will develop and refine strategies to complete the task, so in this case you can set this to Pro, because the Pro models are exceptional with function calling, tool calling, as well as executing tasks. The navigator is something you can set to Gemini 2.5 Flash due to its performance across different modalities, and the validator is something that checks whether the tasks are completed, so it doesn't really matter; you can choose between these two. There's also the ability to integrate a speech-to-text model, and this is where you can configure the Gemini models to be used for converting speech to text when using the microphone feature, so
in this case you can select Flash for that as well. After that is done, you can go back into the General settings and configure things like the max steps per task, so it doesn't use up a lot of tokens, the max actions per step, failure tolerance, and a couple of other things. We're going to enable vision, because we want to make sure our LLM can view what's happening on the screen, and once that is done you can get started right away. You can even enable the firewall.

Now we can start interacting with Nano Browser within this right-hand panel. If you want, you can make it even bigger. I can say something like "head over to the channel and scrape the top 10 latest videos" and send in this prompt. You can see that the planner agent is first going to create a plan. This is a multi-step plan that is then sent over to the execution agent, and right now you can see it is working on navigating to the YouTube channel. Once it has navigated, it's going to open up the videos section, and once it finds the videos section, it will find the top 10 latest videos and give me a good summary of those 10 videos. I'll put it over here, and there we go, it has already finished outputting it. The planner laid out the initial task of going over to the website, the navigator carried out that plan, and then it validated the results by finding the top 10 latest videos, from Mistral all the way to AGUI, which is the 10th video, from 10 days ago. So it did a pretty fast job on this task.

Next up, I'm going to see if this model is capable of bypassing a normal CAPTCHA demo. In this case, you can see right away that it did pass this normal text CAPTCHA demo. It was able to output the contents: it used Gemini 2.5 Pro's visual capabilities to analyze the letters as well as the
numbers, W9H5K, and it output them, and you can see the CAPTCHA is passed successfully. But let's see if the model is capable of bypassing this reCAPTCHA demo, which is probably one of the hardest things for an AI to navigate through. In this case I had given it more instructions, like filling out the fields (first name, last name, email) and selecting the favorite color. I also told it to click on the check mark where it says "I'm not a robot", and let's see if it's able to select the fire hydrant. Now, this is something that most models tend to fail at, but since Gemini 2.5 Pro and Flash are exceptional with this, let's see if it's able to validate and pass through.

So guys, what I've noticed is that the model selects three correct images, and the first image might not be correct, but it does do a good job of selecting three correct images of whatever the prompt is asking for, so in this case a bus. But you're supposed to click four times for it to be validated. And it looks like it has now validated, and it actually passed the CAPTCHA, which is pretty surprising, but it did take a couple of tries to do this. Now, if I used an OpenAI model, it wouldn't be capable of doing this, because it doesn't have the same performance across different modalities that the Gemini models do.

And guys, this is why prompt engineering is so key, because regardless of the model's performance or capabilities, it's actually less about raw model power and more about how you instruct the agent, like we saw with that example, to complete any sort of task using a natural language query. This is because agentic systems like Nano Browser rely heavily on instruction decomposition: the way you phrase your task allows the planner agent to break it down into smaller tasks and actionable steps. A well-structured query helps the agent understand your intent, plan ahead, and handle obstacles, which is why there are so many different agents within Nano
Browser. It's able to adapt to your strategy dynamically as it interacts with the different websites or tools that you're working with.

Next, what I want Nano Browser to do is head over to Twitter and make a post about Nano Browser itself. I wanted it to generate a nice tweet about Nano Browser and link its repo, so let's see what it's actually capable of doing in this particular case. And there we go, it has now sent the tweet, telling people to check out Nano Browser, an open-source AI web agent. This is something that it did autonomously, without me doing anything, and it did a pretty good job of linking the correct repo as well.

If you like this video and would love to support the channel, you can consider donating through the Super Thanks option below, or you can consider joining our private Discord, where you can access multiple subscriptions to different AI tools for free on a monthly basis, plus daily AI news, exclusive content, and a lot more. But guys, that's basically it for today's video on Nano Browser. This is an open-source agentic web automation tool that I highly recommend you take a look at. I'll leave all the links that I used in today's video in the description below. With that thought, guys, thank you so much for watching. I hope you enjoyed today's video. Subscribe to the second channel if you haven't already, make sure you join our newsletter, join our private Discord, follow me on Twitter, and lastly, make sure you subscribe, turn on the notification bell, like this video, and please take a look at our previous videos, because there is a lot of content that you will truly benefit from. Have an amazing day, spread positivity, and I'll see you guys fairly shortly.
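
For readers who want to see the planner, navigator, and validator flow from the walkthrough in code form, here is a minimal sketch, along with the max-steps and failure-tolerance settings mentioned in the configuration step. This is not Nano Browser's actual code. The `Settings` fields, `plan`, `run_task`, and the stub agents are hypothetical names made up for illustration, the model strings are only labels, and a plain retry stands in for the planner's self-correction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Settings:
    # Hypothetical fields echoing the options mentioned in the video.
    planner_model: str = "gemini-2.5-pro"      # label only, no API call is made
    navigator_model: str = "gemini-2.5-flash"  # label only
    validator_model: str = "gemini-2.5-flash"  # label only
    max_steps_per_task: int = 10               # cap on navigator attempts per task
    failure_tolerance: int = 3                 # give up once this many attempts have failed


def plan(task: str) -> List[str]:
    """Stand-in planner: split the task on ' and ' (a real planner would ask an LLM)."""
    return [part.strip() for part in task.split(" and ") if part.strip()]


def run_task(
    task: str,
    settings: Settings,
    navigate: Callable[[str], bool],
    validate: Callable[[str, List[str]], bool],
) -> Dict[str, object]:
    """Plan the task, let the navigator attempt each step, then validate the result."""
    done: List[str] = []
    attempts_used = 0
    failures = 0
    for step in plan(task):
        while True:
            if attempts_used >= settings.max_steps_per_task:
                return {"status": "step_limit", "done": done}
            attempts_used += 1
            if navigate(step):
                done.append(step)
                break
            failures += 1  # simple retry in place of real re-planning
            if failures >= settings.failure_tolerance:
                return {"status": "failed", "done": done}
    status = "ok" if validate(task, done) else "invalid"
    return {"status": status, "done": done}


# Stub agents for the demo: the navigator fails once on one step, then succeeds.
attempts: Dict[str, int] = {}


def flaky_navigate(step: str) -> bool:
    attempts[step] = attempts.get(step, 0) + 1
    return not (step == "open the videos tab" and attempts[step] == 1)


def all_steps_done(task: str, done: List[str]) -> bool:
    return len(done) == len(plan(task))


TASK = "go to the channel and open the videos tab and list the 10 latest videos"
result = run_task(TASK, Settings(), flaky_navigate, all_steps_done)
print(result["status"], len(result["done"]))
```

Lowering `max_steps_per_task` makes the run stop early with a `step_limit` status, which is the same trade-off as in the video: fewer steps means fewer tokens spent, at the risk of an unfinished task.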
