ChatGPT Model Comparison

most people are using the wrong chat GPT model and they don't even realize it open up the menu and you're hit with GPT40 03 03 Pro 04 Mini 04 Mini High then even more models it's confusing but picking the right one can completely change your results faster answers better outputs and hidden features most users never touch whether you're just chatting or wiring the API into an app let me show you which one to choose and when i'll make this simple because OpenAI didn't their naming conventions are just ridiculous so I renamed each model based on what it's best at and I had ChadBT generate custom visuals to make them easier to remember i will show the cost speed and best workflow for each one we'll start with the core models most people use every day then move into the ones you should switch to when needed i'll be including power user tips for cost API and agents first up is GPT40 the generalist it's fast and it's the default for most users and honestly it should be 40 is great across the widest variety of use cases like a Swiss Army knife need a quick summary of a blog post 40 nails it how about brainstorm 10 catchy YouTube titles about explaining the different chat models nailed it again i'll probably use some version of one of these trying to describe what's happening in a photo 40 actually walked me through my entire PC build just by taking photos along the way it's conversational quick and surprisingly good at creative or open-ended tasks this is my go-to assistant for probably 40 to 50% of my use if you're building something like a chatbot for customers 40's fast replies make it a perfect model for that but don't rely on it for accounting numbers or critical code i'll use a simple consistent test case across these first few models to demonstrate explain the pros and cons of nuclear energy versus solar in detail with citations 40 will give you a decent overview really fast but it's surface level it glosses over nuance it may get details wrong and once you push it the cracks really start to show it will sound super confident even when it's just hallucinating and making stuff up for casual use your daily driver for easy to medium tasks 40 is the move but when you need depth or precision time to switch let's run that same question explain the pros and cons of nuclear energy versus solar in detail with citations but this time we're asking 03 the professor if you're on the $200 per month plus plan you will also see 03 Pro here now which was just released i'll cover that one a bit later since it will be used a lot less but I will send this prompt with 03 right away you'll notice something different it starts thinking and you can actually watch its reasoning play out it begins by considering the best way to approach the question and figures out the key points it needs to research further next it searches for sources to support each one of those the process feels almost human at one point it goes "Oh this is a solid source that's a keeper." Pretty interesting to watch its thought process sometimes it discovers where it needs more information and finds new sources for that area finding sources for cost then capacity then carbon footprint then it refineses its pros and cons finds gaps looks up new sources and builds a full answer in total it thought for 1 minute and 41 seconds and here's a cool part i can open up this box and see every step it took so this is where 03 shines so this is real multi-step reasoning not just an output and when you compare the results the highle structure might look similar to 40 but 03 goes miles deeper backing up all its claims with actual sources it ran its own calculations when needed and it caught a lot of nuance that 40 completely missed it's not just repeating general consensus it's evaluating the question and to test it further I asked 03 to critique 40's answer it flagged outdated cost numbers over simplified conclusions it had a flatout false claim there was no mention of water usage and there's missing context all over the place and that's the danger with 40 it sounds smart especially on topics you're not familiar with but if you ask it about a subject you know deeply you realize how often it misses nuance relies on outdated info or just make stuff up that happens much less with 03 not zero of course but much less its reasoning quality is definitely on another level and this matters obviously for like math or research heavy prompts like combinatorial mathematics 10 distinct people are seated around a circular table a and B must sit next to each other c and D must not how many distinct seedings are possible considering rotations equivalent or philosophy compare and critique the strongest arguments for and against pansychism but it also helps with more like everyday logic problems like legal questions or business decisions or even something like a workout plan with a few added constraints that can benefit from a reasoning model too now sure this output will take longer initially but I often save time overall with 40 I get fast drafts but end up in three to four follow-ups to get it right 03 just nails it up front i find that to be the case a lot even for everyday questions not super simple like I'll use 40 when my son asks me "What types of food can we feed a snail he found but when I want to know all the bugs we can catch in Utah and keep in a terrarium that can all live harmoniously together while requiring minimal upkeep yes that's a real thing we're doing." And I save time by letting 03 think that through before the first answer then I just switch to 40 for the faster follow-ups that tag team workflow is something I use a lot now let's ask that same question one more time explain the pros and cons of nuclear energy with solar in detail with citations this time we'll open up the tools box then select deep research mode aka the scholar i'll send that and then it always comes back with a list of clarifying questions since it's about to go deep just answer those and here's what happens it disappears for a few minutes usually 5 to 10 depending on how complex the question is while it's gone it's scouring the internet studies articles public data it analyzes what it finds breaks down arguments calculates trade-offs now this works similarly to 03 because it uses 03 as its core reasoner but deep research goes further it does also pull in faster models for simple tasks like it might call 04 mini to scrape tables then all the data gets handed to 03 for synthesis and what you get back isn't just an answer it's a mini literature review you'll get a clearly structured breakdown multiple perspectives direct quotes and links to real sources a conclusion that actually weighs trade-offs based on the evidence deep research is slow and capped but when you want thorough well-reasoned answers using extensive real world sources this is your tool it's perfect for writing researchbacked blog posts prepping presentations or interviews academic work digging up real recent data from the web just think of this as your personal research assistant this is not for fast answers this is for big questions that need receipts now we're diving deep into all the different chatbt models but when you're using each model whether you're writing coding or building your own tools understanding how to structure prompts is the most important next step to getting better results i have a free resource provided by HubSpot linked below that teaches exactly that it's called Advanced Chat GPT Prompt Engineering: From Basic to Expert in 7 Days this guide isn't just a list of prompts it's a full framework to transform how you think and work with AI one of my favorite sections is on the roses framework this shows you how to engineer prompts using role objective scenario expected solution and steps it's incredibly useful for structured outputs there's also a great deep dive into modular prompt systems basically how to build prompt components you can use across projects and that is something I've been experimenting with a lot i use these types of techniques all the time and if you've ever wanted to go from I hope ChatGpt gets it right to I built a prompt system that delivers every time this resource is exactly what you need download it for free using the link in the description and thanks to HubSpot for sponsoring this video and providing resources like this to people who watch this channel all right same question three very different results it's worth recapping these because they're by far the most useful across the most use cases especially for casual users gpt40 the generalist gave us a fast conversational overview great for quick surface level understanding but it doesn't go deep and it often hallucinates or reasons poorly not something you'd ever cite 03 the professor slowed down structured the argument and built a clear logical comparison backed by real analysis deep research the scholar went full academic it pulled citations sourced real links and wrote like a formal report and one key point 03 can search the web too especially if you ask it to the difference is in how far it goes 03 is often the better choice for reasoning because it's faster it's easier to iterate with deep research is for when I want to really understand one thing when I want chatbt to vanish for 10 minutes dig through the internet and bring me back the best thinking data and evidence available now let's talk about GPT 4.5 the wordsmith this one's kind of a wild card the 4.5 isn't the best at reasoning coding or research but it shines in one specific area tone if you want something that flows naturally something with rhythm personality or emotional weight this is the model the difference isn't huge but it's noticeable so let's try this prompt create a persuasive product description for a futuristic smart pen using vivid sensory language it's slower than 4 but still fast as you grip its smooth ergonomic body subtle vibrations gently confirm every input all right I said sensory not sensual but this part is pretty good whisper quiet it harmonizes with your workflow turning scribbled brainstorms into polished masterpieces the Luminina Pro isn't just a pen it's your bridge to the future feel your thoughts flow freely captured vividly and effortlessly and step boldly into a new era of creativity that's solid copy and that's where 45 especially shines marketing branding and ad writing but it works for other creative writing too so let's try something different describe a quiet morning in a war torn village but make it peaceful and nostalgic all right this looks solid a solitary bird trills softly from a gnarled olive tree its song threading tenderly through empty doorways and windows draped in ivy at the village square an ancient stone fountain murmurs steadily a comforting heartbeat in the silence a gentle breeze rustles through cloth shutters stirring curtains like ghostly hands reaching out to reclaim forgotten dreams that's great writing especially since I didn't really give it much direction and 40 has come a long way in creative writing too but 45 just feels smoother that said don't ask it to do math logic puzzles or fact-heavy research it's not the best at any of those use 4.5 when you want strong voice tone or emotion you're writing something persuasive descriptive or stylized where you care more about flow than precision otherwise you're better off with 40 or 03 just one last thing 45 is preview only and might disappear once 40 gets fully upgraded for tone the 40 updates from May actually closed much of the gap already so this one may not stick around but for now it's your go-to ghost rider before we jump into the next models a quick note on where this video came from it was inspired by a tweet from Andre Carpathy one of the top minds in AI he shared what he uses each chatbt model for and there was a lot of debate and disagreements with his takes in the comments and I didn't fully agree either so I did a few things to help research this i took his tweet i extracted every comment from underneath and had those takes counted and categorized i also had deep research analyze sources from all over the internet reddit GitHub blogs everything then it compared all the data from those experiences too the models I've covered so far were largely all agreed upon by everyone except maybe four or five but for the next few models where there was disagreement I ran fresh tests so I do think this list is very accurate now but if you've got a different take drop it in the comments i'd love to hear how you're using them so far we've covered the core players the generalist the professor and the scholar plus the wordsmith which is the biggest kind of random outlier but now we're moving into some specialists or what you switch to when you hit a wall gpt 4.1 the coder this is the model Carpathy listed as great for vibe coding and he's right 41 is excellent for coding and following detailed instructions but its real superpower is context window if you're using the chat GBT interface it's capped at 32,000 tokens same as 40 and 03 but through the API 4.1 and 4.1 mini unlock up to a million tokens of context that's massive for working with long transcripts legal documents or large code bases 4.1 via API is often the best tool if you're new to the API here's what that actually means prompting through the API bypasses OpenAI's default system prompt the part that adds personality safety filters and tone instead you define your own system prompt tailored to your task that gives you more control and fewer guardrails it's more raw flexible and programmable now there are different levels of how deep you want to go to use this on the beginner level there's tools like guey.ai it's great for uploading long documents and chatting with them and just super simple next is kind of the intermediate level platforms like NADN Make or Zapier those let you build automations or agent workflows using the API and I do have a full beginner tutorial on NADN if you want to start there i'd also put a repo prompt in the intermediate tier that's great if you're doing vibe coding or conversational dev work inside a real codebase in the advanced tier there's things like lang chain or the openai SDK in terminal give you full control if you're in that last category you probably didn't need this breakdown also worth noting API usage is built separately from your chatbt subscription so cost matters models like 03 Pro which we'll get to later are great but expensive the standard 03 just dropped 80% in price so now it's about the same price as 4.1 but the 1 million token memory via API makes it perfect for things like refactoring huge repos or reading entire legal bundles with strict instruction following in this video I'm just showing the chat GBT interface for simplicity but this using models like 41 through the API is where you unlock their full power here's an example of what 4.1 can do refactor this 800line JavaScript project to TypeScript with full type annotations and explain the key changes this is the kind of instruction heavy task it handles really well it's structured fast and clean perfect for dev work 4.1 Mini the intern is like a junior version of 4.1 same type of behavior just faster cheaper and a little rougher around the edges it still supports that million token context window through the API that makes it incredibly useful for working with long files transcripts legal docs CSVs entire code bases but the outputs won't be quite as polished or reliable as full 4.1 like think of it this way 4.1 is the senior developer precise thorough consistent 4.1 mini is the intern quick eager mostly gets it right but occasionally needs a second pass that said it absolutely can handle pretty complex prompts like that same one I just showed and the result might need a little more editing or clarification than if you ran it through the full 4.1 if you're using the API and cost matters which it often does 4.1 Mini is a fantastic budget option for coding workflows instruction heavy tasks long context use cases where accuracy isn't missionritical it's not going to reason like 03 or write like 4.5 but it's efficient capable and cheap if you're looking for speed and reliable reasoning 04 Mini is your go-to if you need extra accuracy for math or code 04 Mini High is the upgrade these two models are widely used by developers researchers and power users who need a balance of performance quota efficiency and task specific strengths they're not as publicized as GPT40 or 03 but they're quietly becoming favorites on the back end o4 Mini is the underrated workhorse its reasoning is surprisingly close to 03 and since 03 dropped in price it's now only about twice the cost of lighter models like 04 Mini but that difference still matters for a lot of workflows it's a great fallback when your 03 quota runs out especially for logic puzzles and STEM tasks using that same seating question from earlier 04 Mini also got the right answer but in 15 seconds as opposed to 1 minute and 5 seconds that 03 did and if I did this through the API it would have been cheaper so it's the perfect pick when you want something smarter than 40 but faster than 03 it's great for real-time chat bots or when you need snappy replies smarter than 40.0 Mini High is the mathematician the 04 Mini High is the same model as 04 Mini but with more compute per token its accuracy and cost are close to 03 so it's a good option to switch to if your 03 quotota runs out or pick it when you have stem workloads and lots of simultaneous API calls you'll breeze past 03's tighter rate limits prove the infinitude of primes using an analytic number theory approach how about compute IGEN values of a 4x4 matrix and interpret them for a Markov chain this one shines when you need stem performance without burning 03 or high volume tasks that still demand rigor o4 mini and 04 mini high aren't flashy but they quietly do a ton of work in real world pipelines the newest and most expensive model is 03 Pro the Oracle it's 03 with more compute per token and that extra reasoning comes with some big trade-offs the price is much higher but also it's extremely slow like 03 will give you quicker answers when the question is simple but 03 Pro will think extensively every single time like that same seating question took 19 minutes and 45 seconds to get the same answer 03 got to in 1 minute and 5 seconds and 04 Mini got to in 15 seconds and I've seen people posting ones like saying "Hi I'm Sam Alman it took 3 minutes and 54 seconds to respond or 14 minutes and 17 seconds to decide there's three Rs in Strawberry." That's why I think the best use for this model is questions nothing else can get right if 03 fails escalate the question to 03 Pro or things where the stakes are very high you know maybe for formal proofs or auditing tricky financial spreadsheets or a final QA pass in an agent pipeline or maybe you want to give it just a ton of context and have it create an analysis and plan for your business the situations it makes sense but for most people 99% of the time it won't be worth the additional time and cost so here is the bottom line every model has a purpose but most people stick with just one if you're using 40 for everything you're leaving accuracy creativity and powerful features on the table use the generalist as your daily driver use the professor when you need logic use the scholar when you need citations use the wordsmith for tone use the specialists like 41 and the '04 series when speed cost or context become a problem and when all else fails consult the Oracle treat the model dropdown like a toolbox not a default and especially if you're building agents or longer workflows real optimization comes from mixing models strategically across tasks tools and tokens if you want to go way more in depth on learning AI on Futureedia we have over 20 comprehensive courses on how to incorporate AI into your life and career to get ahead and save time the one I just finished up is about making movies with AI but we have specialists come in to teach the different types of courses another one that just came out is all about coding and there's a ton of other courses in there you can get a 7-day free trial using the link in the description or check out this video that will take you from zero knowledge to building your first AI agent

Transcript for:ChatGPT Model Comparison

Transcript for:
ChatGPT Model Comparison