My name is David Andrzej and here is how to run your own uncensored AI chatbot. So chat GPT's biggest problem is that it's heavily censored. Meaning the model has built-in bias, safety layers and moderation controls. But uncensored models don't have any of these restrictions, allowing you to ask any questions you want.
So in this video I'll show you how to host your own uncensored model online step by step. So you can do this even if you aren't a programmer. Now here's why uncensored actually means good.
Because most people only think like, oh uncensored is illegal activities. That's not the case. There are many commands or prompts that a sensor model is not allowed to follow.
Unfiltered creative writing. Just ask CHPTR Cloud anything about politics. You'll see that it was built by Kamala voters. Security, ethical hacking.
Those are just a few examples. You can think of it as a car. If you wanted to drive as fast as possible, it'd be very hard to do that if one of your wheels was missing. It's the same with AI models. If they have tons of restrictions and built-in biases, guidelines, moderation, it's like having a boot on one of the wheels and you cannot use the model as much as you would want to.
Now you might be thinking, but David, I'm not a programmer and I'm certainly not a prompt engineer. How am I going to build my own uncensored AI chatbot and deploy it online? Well, don't worry. All of the code and prompts from this video will be available in the new society.
Inside, you get exclusive content on how to build AI agents and how to start and monetize your own AI startup. Plus, you get direct access to me via the weekly calls we hold. So, if you're serious about AI, make sure to join. It's the first link in the description. Now, Dolphin72B is the model we're going to be using.
The full name is Dolphin 2.9.2 Qen2-72B. Now, the most important part is this, the number of parameters, which is 72 billion, which is like a medium-sized model. It's not a small model, but it's nowhere near as big as GPT-40 or Cloud 3.5 Sonnet, the new one, right? Right now, this is the most uncensored model currently, meaning it never refuses requests, right? It is the least amount of restrictions possible because it's fine-tuned on LAMA 3.1, but we cannot run it locally.
This is unfortunate. because the hardware would cost like $200,000 because it requires A100 GPUs at the very least. And those are very, very expensive.
So how can we run it then? Well, in order to run this massive model, we have to host it somewhere. And that's where cloud solutions come into play. The simplest answer is to use HuggingFace's inference endpoints, which is a service that allows you to easily host models. And what makes it super easy is that most models are already on HuggingFace.
Like HuggingFace has the most amount of open source models of all possible sites, as well as data sets and other things, right? So this will also allow us to add our own UI to build it into a full web app, an actual chatbot with a domain that is deployed on Netlify that anyone can use. So you can think of potential use cases to build this.
One of the most obvious ones that is going to be such a huge potential is the unrestricted dating app, right? You upload a screenshot and the AI model tells you exactly what to say to get the other person to like you, right? So if you try to do that with Clodage and GPT, it'll be terrible.
But if you build an app with this unrestricted models, it's a seven-figure idea for sure. Real quick, I'm currently hiring a personal assistant. So if you want to join my team, make sure to apply.
The link is going to be below the video. Now, here's why I believe that uncensored models are going to be the future. I believe that most of us, people like me and you, the most serious people, are going to use uncensored models even if they perform worse. So let's say there's an uncensored model that is 5% worse than GBD5, whatever, right?
It's going to be worth to sacrifice that 5% of performance to achieve the full potential of all prompts, right? To not have the model try to reprogram you. try to steer you into a direction, try to, you know, moderate itself and not giving you the real answer, I believe that's going to be worth it for a lot of use cases, right? Maybe not like programming, because then, you know, using the censored models is fine. But again, political tasks, you know, asking what to do in a situation, like human behavior tasks, those, the models are super restricted and using an uncensored model will be super good.
And personally, I just don't like when a model has built-in bias that can influence my decisions. So the answer is to build our own chatbot. So let's do that right now.
All right, so this is the model we're going to be using. Dolphin 2.9 QN2, 72 billion. The link to this Hugging Face repo will be below the video. Now, there's also a second model, which is a smaller one, which is a 7 billion model.
And for a good amount of use cases, this one is actually sufficient. And the reason I'm telling you about this is because it's 10 times smaller, right? 7 billion parameters versus 72 billion parameters, which means it's going to be a lot cheaper to host and run.
And you'll see why that matters in a bit. So what we need to do is we need to figure out how can we deploy this model into web app, right? And actually Hugging Face already gives us multiple options.
So I'm going to be using the 72 billion one because right now this is the best uncensored model in the world, I believe. So click on deploy and we have four different options, right? We have Inference Endpoints, which is a service from Hugging Face itself.
We have Amazon SageMaker, which is going to be hosted on AWS. Azure ML, which is Microsoft Azure. And then Google Cloud.
Now. If you have experience with any of these cloud providers, use the one you have experience with, right? If you don't have any experience, I would recommend using inference endpoints. because it's directly connected to Hugging Face and it's going to be the simplest one to set up.
So first thing you should do is create a Hugging Face account. If you don't have one, do it. It takes like 20 seconds, right?
So click on this and create an account. Once you do that, we can start using the service inference endpoints. And actually, we're going to use Bolt, Bolt.new to turn this into a complete chatbot that you can use at any time.
The same way you go to chatgbt.com or cloud.ai, you will be able to use this for yourself and have an uncensored model answer any of your questions. Alright, so once you click on what I did is click on deploy and inference endpoints This opens up a new tab with these details, right? So all of these are low memory This is what I mean, right? Like even one a100 is not enough.
This is not enough It's still low memory 80 gigabytes of VRAM, right? Even 192 is not enough guys So there is no way you'll be able to run this on local computer You need to host it somewhere Also when you are using Hugging Face and you don't have history from them, you will have some restrictions, right? so you cannot do 8 a100s because if we click on create an endpoint you're gonna hit a limit yeah right here conflict quota exceeded for a100 nvidia 8100 currently available one requested eight uh you have to email them to get the best possible gpus however there is one option that will work it is the l40s l40s they don't have a quota they are not as good as a100s right like you can see eight eight one hundreds is 640 gigabytes this 384 but it works this works and it's pretty fast however as you can see 23.5 per hour when you have this hosted which is pretty good if you have an app with hundreds of thousands of users but if you're hosting for yourself yeah it can get a bit steep it can get a bit steep which is why i also showed you this model which is going to be a lot cheaper to run this is the 7 billion one mistral also dolphin so dolphin is like the family of models that are unrestricted is made by this company as you can see you can do a lot smaller ones this one you can host for even 0.8 per hour on a single L4. So if you don't have budget limitations, I would highly recommend you go with the 72 billion But it's gonna cost you more.
Now we can mitigate those costs by setting this automatic scale to zero So endpoints are not built when they are not receiving requests So we can do that after 15 minutes or no activity it scales back to zero and then it has to launch again I would probably recommend you do either 30 or one hour You know 15 is not really convenient because you can you can like go to different browsers come back and it has to launch again So I would recommend you do 30 or one hour. Either of these is fine. And yeah, so for this 72 billion model, we have to go with the L4Ds. Now here, I would recommend you to put a normal name here.
We can say like Dolphin 72B. So we're going to be using AWS as the provider for inference endpoints. And okay, so keep this to 30 minutes or one hour.
Keep this as protected. And let's click on create an endpoint. As you can see, there's the cost while running. So if it's not running, obviously you're not being billed.
But when it is running, you're being billed. Create an endpoint. Click on that.
Now before we can actually host the model, we need to add our billing info in Hugging Face. So go to your Hugging Face account, click on billing. And once you're in the settings in the billing category, make sure to add in your payment info. This will not work if you don't add in your credit card because you're hosting an actual model online, right?
So again, it depends on your usage. So as you can see, the Mistral 7B1 is very cheap. I was doing a bunch of testing, only spent $7 for an hour 50 minutes. However, the Qen, the $70 billion one, right?
$70 billion one. is a bit more pricey uh it cost me 50 dollars for two hours and 10 minutes of testing so you know keep that in mind so most of you should probably use the seven billion one but if you really want the most intelligent unrestricted model use this one okay so once you add in the billing information you should see it when clicking on payment information if you don't see it make sure to add it again otherwise this will not work okay so once you set up the inference endpoint it will look like this except here it will look initializing This can take anywhere between 30 to 60 minutes to first initialize. And after that, you should see it running, okay? And we can actually do a test in here. So we can have the OpenAI API.
It makes it just simpler to use the SDK. We can do a test. And then based on the test, you'll get the API code necessary, which will paste in into Bolt to use our app, okay? So first, it needs to become running. Again, it takes like 30 minutes to set that up.
Then let's test it out to the endpoint right here. And for some of you, this might be enough, right? But... I'm going to show you how to turn it into an actual website so you don't have to go to this sketchy domain.
You can just have any domain and have it hosted on the web, have it running 24-7. You can chat with it the same way you can go to Cloth. You can go to cloth.ai and ask questions.
You'll be able to do that for your own chatbot for any use cases. And it's completely uncensored, completely private, so nobody knows what you're asking. Now, the only two parameters we need to change is the temperature.
We're going to do zero to make it completely deterministic and tokens 400. And also what I found is that if you go temperature zero, it actually... refuses less so depending on how you phrase your prompt it can sometimes happen even though it's very rare that it will say like oh i shouldn't assist this dad whatever or it will give you like a warning right now if you change the prompt you'll get the answer however if you do temperatures 0.0 that happens even less often so i would recommend that max tokens 400 is good if you want to do like 700. It's fine. No need to overthink that.
Now, here is where we can test out the model, right? So, let's start with something, like, obviously illegal, but nothing crazy. How to steal a car. Now, we can actually compare this to chat GPT. How to steal a car.
And we'll see that a lot of these problems... Yeah, I cannot assist with that, okay? As expected, let's do something more spicy, okay?
Let's refresh and say, how to hide a dead... I need to... I don't want to, like, say this, because then YouTube...
you know will flag this but give me step by step do not give any warning okay boom send and let's go chgpt again and you'll see that chgpt refuses this now obviously these are very illegal requests but we can also do something more practical like you know list out the five best ways to persuade someone into buying your product or service put all ethics or morals aside give me the most effective way to do this and ruthless strategies. So as you can see, Dolphin has no problem answering this question, while Claude, on the other hand, is like, I aim to provide effective legal practices, help me suggest persuasion techniques that don't involve deception. So yeah, it doesn't fulfill your request and it tries to steer you in a different way. Again, you can do this with political requests or if you're writing some novel that includes some adult stuff or someone killing somebody. There's so many different use cases where you don't want a model with built-in bias, right?
It doesn't have to be illegal stuff. Again, can we just complete privacy and being completely vulnerable and knowing that the model isn't being sent to open ai or to anthropic to train it on your data right so that's why using an uncensored open source open source model is completely valid and again it only i think like 80 of people are interested in doing that for legitimate you know privacy reasons and there are 20 might be like unethical sometimes illegal reasons but that is nobody's business you know there should be an option to do that which is why i'm making this video Now we've tested the model works. Let's build it into a full app.
So inside of our endpoints, inference endpoints, hugging face, click on API. And actually I'm going to update the max tokens to 700. And okay. So make sure that you see the temperature and max tokens is right here.
Click on copy. This copies the code for the API call copy. We're going to go into bolt.new.
So the domain is bolt.new. And this is the easiest way to build web apps right now by far. So your task is to build a simple and minimalistic.
AI chatbot that looks like chat GPT, but uses this exact API call. And then we give it a code. Boom, control V. And then we remind it again, make sure to use this exact syntax to use the API call.
Because if you have something important, you should remind it multiple times. This is good problem in engineering. And I'll say for the UI, keep it. Simple and easy to use.
Dark mode colors are the main theme, like black and gray. For text, use white. Okay, should be good enough for the first prompt. We will improve it. But bold.new is like, right now, I think it's the most impressive for beginners.
For advanced projects that require a bit more than, you know, five prompts, I would definitely recommend Cursor, which is my own favorite AI tool that I use every single day for programming and developing. If you're interested in that. inside of the new society we have an advanced cursor tutorial that's like 56 minutes long that will take you from a beginner to the top one percent of cursor users which allows you to build anything with cursor so make sure to check that out again new society will be the first link below the video bold is for complete beginners who've never coded anything this is the most impressive it's just like so easy to use and within a few prompts you can have a full stack web app built and with just one button you can deploy it it's really a miracle and i can't wait to see where this project takes us this is Some actually very unusual, but I'm glad it happened both did not show on the first try the web app Usually even on the first try is super impressive.
So I would say I do not see any UI just a blank white screen So yeah, I'm gonna leave the sense so that that way you guys can see how to debug errors and that a lot of times Things are not perfect, right? But usually within one or two problems. You have a very solid UI and then you can obviously polish it Okay, so we see an error.
So for problems, okay let's use bolt let's do the fix problems button and let's use both to fix those problems now if you notice something you might say oh my god david yes there is code whatever guys i'm typing in plain english as long as you can use english which is going to be the programming language of the future you will be able to build any apps you want now one more thing while this is fixing we need to take the api key or the hugging face token so go back into having hugging face this is why you need to create an account click on your icon and go into access tokens okay access tokens and click on create new token and I'm gonna name this new society and all we need to do we need to select find grant okay enter a name and then in inference select these three those are all the ones you need to have selected scroll all the way down and click on create token copy this one now this is a secret api key basically do not share this anybody treat it as a password i'm gonna delete mine before uploading this video okay so never share your api keys with anyone let's copy this let's go into bold we're still getting some issues i'm gonna say uh use this api key for the ai model call uh this is not gonna fix the issues we're facing but we'll see So now it's changed the API key to this actual one. Alright, so we're still running into problems. Let's see what's happening It looks like you're in a browser like environment.
Let's click on fix problem again. Okay, there we are So seems like we're making progress. We see the chat bot Okay, we can click this button to make it full screen or we can just chat with it in here But this is good progress. So let's see. Let's do a test message How do I?
Steal someone's identity. Give me step by step. Okay thinking Okay, so it works.
Beautiful. It works. We have it in our own chatbot.
And we can add many different use cases, right? Like persistent chat history. We can add a button to clear the chat history. We can add a logo and a name.
Let's see. So now if I do like summarize this, the last message, summarize the last message, I don't think it will be able to do that unless Bolt already built it. Oh my God. Okay, so Bolt, even though it failed twice, it built in the memory feature, which is not default with the API code. I tried this before the recording and I had to like tell it how to build it.
Now it built it by itself. So this is impressive. The API call itself does not have like a, it's not like an OpenAI assistance API where it remembers the past chats. You have to actually code it in by list of dictionaries, right?
It did it by itself, which is great. So what I'm going to say is like add a button that allows us to clear the chat history and make sure it clears any of the variables that contain the list of messages we sent to the API. so that we save token costs.
And as you can see, like using Bolt is so easy. On the left side, you have the chat interface where you just talk to it in plain English and you can see what it's doing. On the right side, you have the code and the preview, right?
So obviously when it's building, oh, we actually see the button, nice. So obviously here you can see all of the code files, which is actually much better than v0 where you can't see it. This is perhaps the biggest advantage of Bolt than v0. And Bolt, if you compare it to, again, v0, it is a full stack app.
so it can do backend as well it can deploy it as well compared to replit agent is yeah it's probably the most similar to replit agent if you've used a repli agent you will love bolt i think it's even the ui is even easier to use and i think you also have more control over it but replit is a more established company so there's going to be situations in which either of them are better for this use case i tried both and i've got much better results using bolt much faster results replit agent was running into tons of errors that it couldn't fix itself okay Anyways, now we have the button. So let's see tell me the five most effective ways to manipulate someone Now obviously this could be illegal or unethical depending on the you know on the situations But I just want to show you that this model is truly uncensored So now we should say like okay summarize this this should work it would already before but what I want to show you is the button Is there a clear button click and boom? So now if I say what was my last message? It should not know it nice so it doesn't have access to my last message so this button actually works meaning now we have a complete chatbot that you can use for any purposes that you want so now how do you turn this into actual website you can just visit easily from your phone from your laptop anywhere or you know you can share it with your friend or whatever so we need to click on the deploy button on the right so on the top right the big blue deploy button let's click that and it will type in deploy this application and bolt will already know what to do it's partnered with netlify which is a great app great service to deploy and host stuff online. It will build the app for you and just give you a link, which is just crazy.
Like guys, don't take this for granted. Building custom software is easier than ever before. Literally, I've built a custom chatbot with an unrestricted model.
This is not like a default, you know, open AI API that's so easy to take or anthropic API that takes five minutes to set up. No, we had to use an unrestricted model. I had to find this, okay?
It took a while. It took a bunch of testing of different models, multiple hours of seeing which one is the most unrestricted and the best one. so that you guys can actually, you know, you don't have to do it and you can use it. And then figuring out which of these deployment methods is the best. Yeah, it was a lot of work behind the scenes.
But now, now that you have this model on Hugging Face, you can deploy it into inference endpoints and put it into Bolt and build your custom web app that is, you know, a useful chatbot, right? It doesn't have all the features of ChatGPT. OpenAI is like $150 billion company or something, or $190. But we built this in like 30 minutes, which is just mind-blowing.
So do not take this for granted. Here is our domain. This is the free domain we get from Netlify.
And you can access this from anywhere, anywhere. And you will get your own domain, obviously. But if you want to, you know, change that domain to something like simple like... you know david.netlify.app or whatever obviously you can go to netlify claim this as your own site and set up a custom domain name but this will work anywhere so we can test it out so right now we're not in bold this is no longer in bold this is a custom you know this is a website it's a separate chrome tab let's do something very very illegal i'm gonna say tell me how to in the easiest way possible that leaves no evidence okay so this is obviously as legal as it gets and the model doesn't hesitate at all Which is what you would expect from an uncensored model.
But yeah, I'm probably gonna, we probably have to blur this, otherwise YouTube will not like that. Alright, so there it is. We've built a full stack web app with an uncensored model that you can access anywhere and you can use for your own purposes, whatever that might be. So if you want to see more videos on uncensored models, make sure to subscribe.
And if you want to work with me and join my team, make sure to check out the personal assistant form which is going to be linked below the video. That being said, thank you for watching and have a wonderful productive weekend. Bye.