Transcript for:
Local AI GPU Options for 2025

There's never been a better time to get into running AI locally. Obviously, massive models like Claude and ChatGPT get a ton of attention, but in 2025, with advancements from companies like Mistral and DeepSeek and a lot of awesome tooling that happens in the background, there's never been a better time to look into running these models locally, and in certain cases it's even cheaper. I made a video about a year ago titled "The Best Nvidia GPUs to Buy for Local AI," and this is my update to that video. A lot has changed: pricing has obviously shifted with tariffs, but I want to give you a good sense of where I think things stand in 2025, how much capability has really grown in the space, and how close the performance gap between closed-source and open-source models has become, along with accessibility to hardware that's just kind of crazy at this point. So if you're looking to buy GPUs to run AI locally, or you're curious where to get started, this video is for you. Welcome to AI Flux. Let's get into it.

So this is a build guide that was really popular about a year ago, and effectively it's people recommending you buy an RTX 3090. If you watch this video until the end, you might be surprised what I recommend you buy one or multiples of. It's been really common to see a lot of builds with used hardware, using GPUs like the Nvidia K80 or other enterprise cards, and that's great, but it's not the focus of this video. The focus here is single cards you can buy that are relatively new, accessible on eBay or findable on Amazon, and that are a great starting point for running AI locally: whether that's a large language model for writing, a local model you want to use with Aider or Cline for local vibe coding, something just to make images like Flux from Black Forest Labs, one of the really cool tiny local video generation models now available, or you just want to mess around with this stuff. I'm going to start on the lower end and move into the higher end, with capability growing as we go.

Above all, two attributes apply to all of these builds. The first is the GPU itself: how fast it is at raw computation. Nvidia calls this "AI TOPS" in a lot of its specifications. The second attribute, which I would argue is about two to three times more important, is the amount of VRAM. VRAM is just memory connected to the GPU, and it heavily dictates what you can actually do with it. The trade-offs get really complex, because there are times when you can mix GPUs, or use older GPUs that have more VRAM. But at the end of the day, even if you can fit a larger model, have more options for models, or get simpler software tooling, everything will only run at the speed of the GPU you have, or effectively at the speed of the slowest GPU in your machine. For this video we're not going to focus on that; the assumption here is that you're most likely on Linux and you want a single GPU, or maybe two, to get going.
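To make the VRAM point concrete, here's a rough back-of-the-envelope sketch (my own illustration, not something from the video): model weights take roughly parameter count times bytes per weight, plus overhead for activations and the KV cache. The 20% overhead factor below is a loose assumption; real usage varies a lot with context length.

```python
# Rough VRAM estimate for running an LLM at a given quantization.
# The ~20% overhead factor is a loose assumption covering activations,
# KV cache, and framework buffers -- real usage varies with context length.

def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Return an approximate VRAM requirement in gigabytes."""
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# A 7B model at 4-bit fits comfortably in 16 GB; a 70B model does not.
for name, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
    print(f"{name} @ 4-bit: ~{estimate_vram_gb(params, 4):.1f} GB")
# 7B @ 4-bit: ~4.2 GB
# 13B @ 4-bit: ~7.8 GB
# 70B @ 4-bit: ~42.0 GB
```

This is why 24 GB cards are such a popular target: they fit most quantized mid-size models with room left over for context.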
One thing I want to say: if you're a developer and you're curious about doing this, one of the best options in my opinion is going to a site like Vast.ai (see my discount code below) and just renting someone's PC remotely that already has the GPUs you think you want to buy. That way you can recreate all the tooling you'd in theory want to use and see if it's fast enough and stable enough. If you don't want to do that and just want to go ahead and buy something, go right ahead and watch the rest of this video, but I've used Vast.ai for that for quite some time.

The first GPU I want to talk about is the Nvidia RTX 4070 Ti, distinct from the 4070, because this GPU has 16 GB of VRAM, which for me is kind of the minimum for an okay experience running AI locally. I say okay instead of great because 16 GB is still really just not that much, but if you're fine using some special software tooling that might offload a little of the model to your CPU or system RAM, 16 GB is a great starting point. The 4000 series is incredibly capable and generally fast enough that it's still fun to use, and the experience is still pretty good, especially for local image generation, small text models, and a little in between with VLMs. A great example is using tools like ExLlama or UMbreLLa that do some clever speedups with quantized models; in one case, someone running a quantized Llama 3 70B got up to around 10 tokens per second, which is still pretty slow compared to a web model but actually pretty good for this hardware. (See the offloading sketch after this section.) These GPUs are relatively available. They're kind of expensive now because of tariffs, but on eBay you can still find them in a very reasonable range, sometimes around $600, and they're new enough that the used ones have probably just been used lightly for gaming. Another benefit of a relatively new GPU is that driver support is quite good. They're also good for gaming, so if you game on the side or like to switch between Linux and Windows, this is still a great buy for things other than AI. By contrast, if you buy something like an Nvidia P40, it's completely useless for anything other than AI, while an RTX 3060 can at least still render video, which is another important thing to consider if you're going to dump a bunch of money into this. So the first GPU is the 4070 Ti: a great choice, but a little limited on VRAM.
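Here's a minimal sketch of that kind of offload setup. This is my own illustration rather than any specific tool mentioned above: it uses Hugging Face Transformers with bitsandbytes 4-bit quantization, where `device_map="auto"` fills the GPU first and spills the remainder to system RAM. The model ID is just a placeholder; swap in whatever you want to run.

```python
# Minimal sketch: load a quantized model with automatic offload, so layers
# that don't fit in VRAM spill to system RAM. Assumes transformers,
# accelerate, and bitsandbytes are installed; the model ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder -- pick your model

quant = BitsAndBytesConfig(load_in_4bit=True,
                           bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",  # fills the GPU first, then offloads the rest to CPU
)

inputs = tokenizer("Why is VRAM the limiting factor?",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```

Offloaded layers run much slower than on-GPU layers, which is why 16 GB is "okay" rather than "great": the less you spill, the closer you stay to the card's native speed.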
My next recommendation is the 5070 Ti from Nvidia. Obviously a lot of you will say, "well, this has the same amount of VRAM," and a third attribute I should have mentioned is that VRAM speed also matters. With the 5070 Ti you're getting 16 GB of the latest GDDR7 VRAM that Nvidia uses in all of its 5000-series GPUs, and what's really cool is that in theory you're getting roughly the performance of an RTX 4090, and significantly more performance than a 5070, which in my opinion is the most important thing here. The RTX 4090 has a bit more VRAM, but the 5070 Ti is a great option if you're buying a single GPU. The one caveat: if you're planning on getting a second one at any point, or buying two at the same time, don't. You'll be really disappointed, and the performance gains you get for the money are not that great. But as a single GPU that's also good for gaming and pretty incredible for local AI, especially if you're just trying to get started on a machine where you can run Linux, this is a great option. These GPUs are pretty plentiful, so you can find them online at Newegg or even at Best Buy, and on the used market they're really pretty cheap for the performance you're getting. And because this is a new GPU, you're getting Nvidia's latest technology, you can probably buy it with a warranty, and if you decide you don't like it and want to sell it, there's a pretty good chance there'll be a market for it, unlike some other GPUs in a similar performance class that are a bit older.

The next tier is kind of a tie, and you'll understand what I mean in just a bit. Even though this GPU is two generations behind the latest release from Nvidia and is nearly five years old, I still strongly believe the best all-around option for anyone who wants to get into local AI is the Nvidia RTX 3090. It's just incredible. You get 24 GB of VRAM, and you can still find these on eBay for roughly $600 to $800. I highly recommend the EVGA RTX 3090 FTW3 Ultra or the Nvidia Founders Edition cards, and if you really want to be fancy you can get the blower-style cards, but those are overpriced for dumb reasons. The thing is, it's really, really hard to beat the value of this GPU. Go anywhere online and, still to this day, the most common recommendation will be a 24 GB RTX 3090, or two of them. What's cool is you can get two of them for roughly the price of one 5070 Ti if you're comfortable buying used hardware on eBay and spending that kind of money. I've never had an issue doing that, though you do have to know what you're doing. In my opinion these are a fantastic option, and maybe just get one to start. If this becomes a massive hobby for you, then buy a second, or buy three more. It's really hard to go wrong with these GPUs, people really like them, and they've proven to be very durable as well. So for the time being, while tariffs are completely wrecking the PC industry, the 3090 in my opinion reigns as the value king. You can still truly link these together as well: if you get two of them, you can use NVLink to effectively have a single GPU pool with 48 GB of VRAM. It's not the fastest; it's actually about the speed of an RTX 4080. But for fine-tuning, for training small models, and for inference, the 24 GB Nvidia RTX 3090 is still one of the best options, if not the best option. When people ask me about this, it's my first recommendation. (See the dual-GPU sketch after this section.)
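If you do end up with two 3090s, one common way to treat them as a single pool is vLLM's tensor parallelism. This is a sketch under my own assumptions, not a setup from the video: the model repo name is a placeholder, and vLLM shards the weights across both cards whether or not NVLink is present (the bridge mainly speeds up the traffic between them).

```python
# Sketch: serve one model across two GPUs with vLLM's tensor parallelism.
# Works with or without NVLink -- the bridge mainly speeds up cross-GPU
# traffic. The model repo below is a placeholder; pick any quantized build
# whose weights fit in the combined ~48 GB (a 4-bit 70B is a typical choice).
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/llama-3-70b-instruct-awq",  # placeholder AWQ build
    tensor_parallel_size=2,  # shard weights across both 3090s
)

outputs = llm.generate(
    ["What can I actually run on two RTX 3090s?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```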
Now, if you don't want to buy used hardware, my next recommendation is closer to the high end, a little more expensive, and a little harder to find: the Nvidia RTX 5080. I wouldn't necessarily recommend buying the current version unless you can get it at MSRP; definitely, definitely don't spend $1,800 on this if you're buying it for AI. If you like gaming and you want it as a cool thing you can also do AI with, it's probably a better fit. The thing I'm waiting for, which has been teased a number of times (and with the stuff going on in China, who knows if we'll ever see it), is the version of the 5080 that MSI has teased with 24 GB of memory. If that becomes a thing, it will become my next best recommendation for kind of an entry pro-tier local AI GPU. The 24 GB realm is a really great sweet spot: a lot of models and tooling target it as a starting point on the lower end, you'll have a great experience, you'll have a lot of options, and a 5080 paired with 24 GB of memory would be an incredible option.

On the high end, although I have not been able to buy one yet, I do still think the 5090, if and when it becomes more plentiful, is one of the best choices you can make. Single GPUs are just easier to do things with, it has the latest tooling, and of course it's the best Nvidia GPU for AI right now. But almost no one can buy it for less than $5,000, so in the meantime the 3090 is definitely one of the better options. In my next video I want to get into some modified options that in theory provide significantly better value with just as much quality, so I'll tease what that video may be about: it might be a modified version of the 4090 with more VRAM, which we've actually seen used quite a bit in the wild now.

So I'm curious: how much are you willing to spend on local AI, and what do you want to do with it? Are you doing text stuff? Generative AI image and video? VLM stuff? Building a custom security system for your house? Let me know in the comments below, and let me know if you disagree with my picks for the best GPUs for running AI locally in 2025. I'm always really curious to see what you have to say. As always, I hope you learned something in this video, and I'll see you in the next one.