Overview of Flux Image Generator Insights

Finally, we have an image generator that can actually generate accurate hands and fingers. It can also generate text accurately. Not only that, it can follow tricky prompts really well, and it can even generate those mediocre, low-quality selfie images that normal people take. With this new image generator, it would be almost impossible to tell AI photos apart from real photos.

Let's do a quick test. I'm using the same prompt with this new image generator, called Flux, and with two other state-of-the-art image generators, Stable Diffusion 3 and SDXL. So these three images are based on the same prompt. Between left, right, and center, I'm not going to tell you which model is which; you let me know which one looks the best to you.

Here's the first prompt: "This image shows three young African children, two boys and a girl, standing on a dirt ground. They are all smiling and making a peace sign with their fingers. The girl is wearing a green and yellow patterned dress and the boys are wearing casual clothes. The boy on the left is holding the girl on top of her while the boy in the middle is holding her up in the air with both hands." All right, so let me know which image you think is best: left, right, or center. By the way, I'm keeping each model in the same position, left, right, and center; I'm not shuffling any of them.

Here's the next one: "The image shows three children sitting in the trunk of a red car. The car is parked on a dirt road with trees and bushes. They are sitting on a blanket and are eating slices of watermelon in their hands. The boy on the left is wearing a blue striped shirt and shorts, the girl in the middle is wearing blue shorts and a white T-shirt, and the girl on the right is wearing an orange shirt." So, a really tricky prompt; there are a lot of details here. Let me know which one looks best to you.

All right, next one. The prompt is "a photo of a blond-haired woman with a pistol in each hand and an open mouth, facing the camera." So let me
know which one looks best to you.

Next we have the famous prompt of a woman lying on grass. It's pretty obvious here which one looks the best.

All right, here's another comparison; let me know which one you think looks the best. The prompt here is "a young woman playing a bass guitar on a stage. She is wearing a black dress and black boots. She has long blonde hair styled in loose waves." Pay attention to her fingers and the bass, right? A bass should have four strings only. So which image do you think looks best in this case?

All right, here's another comparison. Here the prompt is "a young woman standing on a street with her back to the camera. She's wearing a white T-shirt, black skirt, knee-high socks, and brown shoes. She has a large blue backpack on her back with a brown strap and a small teddy bear on it." So let me know which one looks best to you.

Next, we have "a young woman standing on a sidewalk with her arms raised in the air. She's wearing a gray T-shirt with a graphic of a dog on it, a white skirt, and white sneakers. She has a yellow backpack on her back and is smiling at the camera. Next to her there is a small white dog, possibly a Pomeranian, sitting on the sidewalk." So let me know which one looks best to you.

All right, next, the prompt is "a photograph of a black-haired woman in a white dress with blood stains, sitting on a red couch to the right of a chiming clock, holding a rose in her left hand, with three skulls at her feet." A very tricky prompt. Let me know which one looks best to you.

All right, next we have anime. Here is "an anime girl with massive fluffy fennec ears and a fluffy tail, blonde messy long hair, blue eyes, wearing a maid outfit, long black dress," et cetera, "eating a slice of apple pie in the kitchen of an old dark Victorian mansion." Let me know which one looks the best, and not only in quality, but which one actually follows the prompt the most.

All right, next one. This is a young woman with long blonde hair and bunny ears on her head, kneeling
in front of an open refrigerator. She is wearing a white tank top and black shorts and is holding a black phone in her hand. The fridge is filled with various fruits and vegetables. There's a potted plant on the right side of the image. The woman appears to be taking a selfie.

All right, so let's reveal the answer. The left side is the new image generation model, called Flux. The middle one is Stable Diffusion 3; this is the latest version of Stable Diffusion, and in theory it should be the best one. And on the right we have Stable Diffusion XL. Now, I think we can fairly say that the left one, Flux, looks the best in most cases. So let's go over the prompts really quickly.

For the first one, the prompt is three African children, and you can see Flux is the only one with three kids. Then, making a peace sign: Flux is the only one with decent hands and fingers actually making a peace sign. The one in the middle, SD3, they're kind of making a peace sign, but the fingers are still messed up. So in terms of hands and fingers, Flux is the clear winner here. And the girl is wearing a green and yellow patterned dress, which is what we're seeing.

Anyway, next one. Both Flux and SD3 are pretty good; both of them show three children sitting in the trunk of a red car. However, in terms of quality, you can see again that Flux just crushed it. This photo looks really good: the faces are very detailed, they're each holding a slice of watermelon, and the toes actually look real. Whereas for Stable Diffusion 3, the faces are kind of blurry, the toes are not really accurate, and the girl in the middle is missing a watermelon. So again, for this one, I would give the point to Flux.

All right, and then here, both Flux and SD3 got the prompt correct, but again it seems like Flux is just better quality. Especially if you're going for this cinematic, realistic style, Flux is really good.

And then this one, the woman lying on the grass. SD3 is infamous for not getting this correct and generating quite grotesque
images. And even with SDXL, sometimes it works and sometimes it doesn't; you can see here this woman has an extra hand, so it's just not great. But as you can see, Flux nailed this, plus her hands and fingers are actually accurate.

All right, so here I would say again that Flux is the clear winner in terms of hands and fingers and the bass guitar. Flux is the only one that was able to generate a bass guitar with four strings, and the strings are actually straight, and the frets look very realistic as well. Even the drums in the background look more realistic than in the other two examples.

And then here, I would say Stable Diffusion 3 followed the prompt a bit better. We did mention that she's wearing brown shoes, and we're not seeing that for Flux. But again, for Flux the image quality is just way better than the other two. I mean, it's not even close.

And then here again, I would say Flux is the clear winner. It followed the prompt really well. It was even able to generate a graphic of a dog on her shirt, plus it's the only one that could generate a white Pomeranian. So you can see that Flux not only followed the prompt really well, but in terms of image quality it's also a lot better than Stable Diffusion.

And then here again, if you just compare the quality, the Flux image looks the best. None of them got three skulls at her feet: the Flux one had four skulls and SD3 had two skulls, but they both generated the woman correctly.

All right, now for anime. In terms of image quality, I would give it to SDXL. There are just so many good anime models built for SDXL; it's been around for a while, so you can expect that some SDXL models do a lot better than the newest Flux or Stable Diffusion 3. However, in terms of following the prompt, I would again have to give it to Flux. It's the only one that was able to actually generate the girl eating a slice of apple pie; the other two generators did not generate a slice of apple pie.

All right, next one. Again, I think the clear winner here in terms of
quality and following the prompt is Flux. This woman is kneeling down, she does have bunny ears, she is holding a black phone, and she does have a white tank top and black shorts. SD3 looks pretty good, but you can see her left hand is missing and her right foot looks really weird. So again, I think the point goes to Flux.

It can also generate low-quality cell phone photos like this, or this, or this. You can see it generates the iPhone pretty accurately, plus it nails the hands and the fingers, plus this looks like a legit low-quality selfie shot by an actual human. You might be used to AI photos being really polished, where everything just looks so perfect, so they were quite easy to tell apart. But right now Flux can generate such average-looking photos that it's insanely hard to tell them apart from real photos.

It's also great at generating text. Here's one example, here's another example of text, and here's another example. And note that she has five fingers; her hands are perfect.

Speaking of fingers, finally we have an AI image generator that actually gets hands and fingers correct. Here are some examples. It doesn't matter what position the hand is in; it could be pouring tea, it could be playing a guitar, but the fingers do look correct to some extent. Plus, note that the strings and the frets on this guitar are actually straight. None of the other image generators could get this right consistently. Here's another example of hands tying a shoe, and you can see it nailed this one as well. Here's another example, and another example. So this is the only AI image generator that can actually get hands and fingers correct, at least most of the time. Even Midjourney, which was the leading closed-source image generator, still could not generate hands and fingers really well. So now, with Flux, you can forget about Midjourney and forget about Stable Diffusion. Flux is by far the best image generator out there right now.

So in this video we're
going to go over how you can use it, including online methods as well as how to install and run it locally if you have a good enough GPU. Plus, I'll go over the architecture and details about the model.

So, really quickly: Flux is a new image generator that was announced recently, and it was developed by Black Forest Labs. This is quite a new startup, and it seems like a lot of the team members actually came from Stability AI. In fact, they say that they were the original creators of Stable Diffusion XL and Stable Video Diffusion, plus many other well-known tools.

For Flux, they've actually released three models, so I'll go over each of them. The first one is called Schnell, and this is the fastest model; think of it as a turbo version of a Stable Diffusion model. This is completely free and open source. However, it is the smallest of the three models, so while it's the fastest, it's also the lowest quality. It's kind of a watered-down version; I'll show you a comparison in a second.

Next we have the Dev model. This is slower, but it's much better quality. This is also free and open source, so you can download it and run it locally. However, it is only for non-commercial use, and if you do want to use it commercially, you need to get in touch with them to discuss further.

Finally, they have their Pro version. This is slightly better than the Dev version, and it's the best quality out of the three models. However, it is paid and closed source: you cannot download the weights and run it locally.

Now, I'll go over the architecture, benchmarks, and more technical details about Flux, but that's kind of boring, so I'll leave that until the end of the video. For now, let's dive in and actually try this out.

Thanks to Tube Magic for sponsoring this video. Are you looking to kickstart your YouTube channel or take your existing channel to the next level? Look no further than Tube Magic, an AI
powered tool that helps you succeed on YouTube. Their built-in keyword research function lets you sort keywords by search volume, competition, and an exclusive Magic Score, which automatically finds the best keywords for your videos. Imagine never running out of content ideas again: with the video ideas tool, you can simply drop in any channel link and Tube Magic will generate video ideas related to that channel. Plus, it even writes scripts and outlines for you, so you're already 80% of the way there instead of starting from scratch. You can also upload an unlisted video link and it will generate the best title, description, and tags for you. By the way, this was created by YouTuber Matt Par, who runs the popular Make Money Matt channel and over 12 other channels, so Tube Magic is backed by years of experience and success. Many big YouTubers are already using it to streamline their content, so if you want to grow fast, try out Tube Magic for free via the link in the description below.

So, there are a few places online where you can use Flux for free. One option is this space on Replicate, which I'll link to in the description below. You can see it's pretty simple to use. You just enter a prompt. Flux doesn't even have negative prompts, so it's just one positive prompt. Then you can change the aspect ratio to whatever you want.

Then there's guidance, which is how closely it follows your prompt. If you drag this all the way to zero, it doesn't follow your prompt as well and it would spit out some random image. If you drag this all the way to the right, it might follow your prompt too literally and give you some unwanted results. So it's best to keep this at 3.5, which seems to be their default.

Seed is just the starting point of your image. Usually you would leave this blank, which sets it to a random value. But if you set the seed to the same number and keep all the other settings the same, then in theory it should generate the same image that you got before. And then output format is just,
well, what file type do you want: WebP, JPEG, or PNG. And output quality is just how much compression you want on your image. So let's just leave everything at the default.

All right, let's test this out. For the prompt, I'm going to put "a man wearing glasses writing in his diary" and then click Run. Note that Flux is quite bulky and takes a much longer time to run compared to Stable Diffusion. However, in most cases you're going to get a higher-quality result compared to Stable Diffusion. So here, indeed, we have a man, he's wearing glasses, he's writing in his diary, the pen actually looks straight, and his fingers and hands actually look real. So this is really impressive.

Let me test out an even trickier prompt. Here we have "a zebra with rainbow stripes playing a grand piano on a mountaintop. The piano is made of ice, and in the background the northern lights illuminate the sky." So let's click Run and see if it can generate that. Oh my God, this is just so incredible. There's just one minor flaw, which is that it has three arms here. But this is the only generator so far that can actually get a zebra with rainbow stripes to play a piano made of ice with northern lights in the background. Previously, I did a review of another new image generator called AuraFlow and compared that with Stable Diffusion, and it's not as good as this. I mean, this looks amazing.

Let's try another example. Here the prompt is "a girl in a Victorian-era dress holding a sign that says 'Flux is King'. She stands in a garden filled with flowers and butterflies." Let's see what that gives us. All right, well, it kind of gave us this image in a Disney Pixar style, but I did not specify that this should be a realistic photo, so whatever. You can see she does have accurate fingers and hands, and she is in a Victorian-era dress. She is holding a sign that says "Flux is King", and the letters are very accurate. Plus, there are flowers and butterflies. Again, it just nailed the prompt; everything is correct, plus the image
quality is actually really good. Super impressive.

Now, if you run out of daily credits on this Replicate space, you can also use the Hugging Face spaces by Black Forest Labs themselves. There are actually two spaces. One is for Schnell, the faster but lower-quality model, or you could use another space which uses Flux Dev, the slower but better-quality model. I'll link to both of these in the description as well.

Since we have both of them open, this is actually a good time to show you the difference between Schnell and Dev. As I mentioned, Schnell is kind of a watered-down version, so the only reason to use it is if your GPU isn't good enough and you can only run this faster but smaller model. In most cases, Dev is the way to go; the quality is a lot better than Schnell.

Anyway, let's paste the same prompt into both. We have "a beautiful woman with long brown hair doing yoga at home". I'm going to paste this in the Dev space, and I'm going to paste the same thing in the Schnell space, and then let's click Run. I'm going to set these side by side, and you can see Schnell is already done. So here we have a woman doing yoga at home. Her hands and fingers are perfect, and she is indeed doing yoga. Previously, I did a comparison with another new tool called AuraFlow, plus Stable Diffusion 3 and Stable Diffusion XL, and here are those generations. You can see they are awful compared to what Flux is able to generate.

All right, so now our Flux Dev image is also done, and you can see that the colors and the details of the image are a lot better than Schnell's. I find that Schnell is kind of oversaturated and the contrast is too much at times, whereas this gives you a more cinematic feel.

All right, let's try a different prompt. Here the prompt is "a ballerina girl with butterfly wings dancing on a lily pad in a serene pond, anime style". I'm going to click Run for both, and note how quickly Schnell runs compared to Dev; it runs way faster. Wow, and we are already done. So here we have a
ballerina girl with butterfly wings dancing on a lily pad. Again, it just follows the prompt really well. There are some flaws: her face isn't really detailed, though I'm sure if you bump up the resolution it would be a lot better, and we have some flowers floating in midair, so that's not really accurate. And then here's the same prompt but with Dev. You can see that, again, in terms of image quality, Dev is just better looking than Schnell. Schnell seems to be oversaturated, plus the contrast is too high.

All right, so those are some ways you can use it for free online. Now, if you do want to install this and run it locally, I'm going to show you that right now. Note that there are a lot of steps and a lot of things to download, plus you need at least 12 GB of VRAM on your GPU to run this well, as well as 32 GB of RAM on your computer. Also, this is using ComfyUI, so you do need to have that installed. If you're new to ComfyUI, I'm going to post a beginner-friendly tutorial on how to install and use it, probably by next week, so stay tuned for that.

So let's go over how you can install this locally. I'm going to link to this guide by comfyanonymous in the description below. The first thing to do is to download these safetensors files. Let's click on this link, and first we need to download this clip_l.safetensors, so let me download that. In our ComfyUI folder, this needs to go in models/clip, so let's save it in here.

Now, depending on how much VRAM you have on your graphics card, you can download this larger version of the T5 text encoder or the smaller version. I'm going to go with this one so it doesn't kill my GPU. By the way, I have a Dell Precision 5690 laptop, which has an NVIDIA RTX 5000 Ada with 16 GB of VRAM; shout-out to Dell and NVIDIA for sponsoring this. Even with 16 GB, I would still prefer to use this smaller fp8 version so it doesn't kill my GPU. If you have something like a 4090 with 24 GB of VRAM, then this fp16 version would be a better option. Anyway, I'm going to download this and also put it in my models/clip folder.

All right, the next step is to download the VAE, which should go in your models/vae folder. If you click on this link, it's actually broken, so instead you should go to the Black Forest Labs Hugging Face page, which I'll link to in the description below. When you scroll down, you should see these two models: Schnell, which is the faster but lower-quality model, and the Dev model. Again, just so I don't kill my GPU, I'm going to use the Schnell version, but if you have a good enough GPU, you can go with the Dev version. So I'm going to click on this, and then in the "Files and versions" tab you should see this ae.safetensors. This is your VAE file, so let's go ahead and download it. It goes in models/vae, so let's save it in here.

Finally, we need to download the Flux Schnell .safetensors file, or the Dev .safetensors file if you're using that model, and we save that in our models/unet folder. So let me click Save.

All right, so I'm going to open up my ComfyUI folder and make sure all the files are there. In the models folder, in the clip folder, you should have the clip_l.safetensors file as well as the t5xxl fp8 or fp16 .safetensors file. Then, going back to the models folder, in unet you should have the Flux Schnell or Dev file, ending in .sft or .safetensors. And going back to models, in the vae folder you should also have the ae file. After that, we are good to go, so let's run ComfyUI.

Now, the first thing to do is to go into Manager and click Update ComfyUI; otherwise your current version of ComfyUI might not have support for Flux. So let's click on this. All right, our ComfyUI has been successfully updated, so let's click Close, and then Close.

Now, going back to this page by comfyanonymous, which again I'll link in the description below, there are two images that you can save. The first one is using Flux Dev, so if you've downloaded Flux Dev, save that image. Because I'm using Schnell, I'm going to right-click and save this image, and you can save it wherever you want. You might be wondering why we are downloading this image. That's because the awesome thing about ComfyUI is that if you drag and drop an image that was created in ComfyUI, and the metadata hasn't been deleted, then it will actually pull up the workflow that was used to create the image. So I'm going to drag and drop the downloaded image onto ComfyUI, and voila, we have this workflow already made for us. We don't have to make it from scratch.

There are just a few things we need to change here. One is this text encoder setting: I did not download the fp16 one, I downloaded the fp8 one, so I'm going to select that. Everything else is the same. If you're familiar with ComfyUI, everything should be quite self-explanatory. Here we have the latent image, where you specify the width and height. This is basically the seed, and here are the sampler and the scheduler, and we plug all of this into this SamplerCustomAdvanced node. Previously, in ComfyUI for Stable
Diffusion, you would have these three, the seed, the sampler, and the scheduler, all in one node called KSampler. Here they've broken it up into three different nodes, and now they're using this SamplerCustomAdvanced node. It's a bit messier, but it is what it is.

All right, so let's enter a prompt and test this out. I'm going to write "a woman holding a sign that says 'RIP Stable Diffusion'" and then click Queue Prompt. You can see it's taking a while to load the diffusion model. All right, so right now it's going through the sampler. And here we go: here we have a woman holding a sign that says "RIP Stable Diffusion". So that's how you would install and run Flux locally on your computer.

Finally, I want to talk about how this actually works and the technical details behind Flux, because it's actually quite interesting. First of all, here they compared some benchmark metrics between the three Flux models and other image generators such as Stable Diffusion 3, Midjourney version 6, and DALL-E 3. Interestingly, Stable Diffusion 3 Ultra is actually ahead of Midjourney version 6. But anyway, they found that the lowest-quality model, Schnell, is already slightly better than Midjourney 6, at least according to this benchmark metric. And both Flux Pro and Flux Dev beat all these other models on this chart. You can also see that the Pro model, which again is the paid and closed-source version, is better than the Dev model by quite a large margin.

This chart sums up the creative abilities of these three models compared to the cost. Flux Schnell is less creative, and again this is the lowest quality, but it's also a lot cheaper and a lot less resource-intensive to run. For Flux Dev, the cost will be higher and you're going to need more VRAM, but it's more creative. And Flux Pro is even more creative, but it requires the most resources
to run.

Now, here's how the architecture works. It's based on a hybrid architecture of multimodal and parallel diffusion transformer blocks, and this is the key point. The diffusion part is like the traditional Stable Diffusion model, and the transformer part is good at understanding natural language. So a diffusion transformer model is kind of a hybrid; it's kind of like if ChatGPT and Stable Diffusion had a baby.

They've improved over previous state-of-the-art diffusion models by building on flow matching, which is a method for training these generative models. They've also incorporated rotary positional embeddings, which help Flux understand your prompt better, especially if it has a lot of complex elements, and they've incorporated parallel attention layers. All of these extra features give it a much better understanding of composition and prompt following, and of how to create a coherent, good-quality image.

So you can see here that the Pro and Dev models just destroy other image generators like Midjourney, which is closed source and paid, by the way, and also Stable Diffusion 3 Ultra, which is the best version of Stable Diffusion so far. Flux surpasses them in visual quality, prompt following, size and aspect variability, typography, and output diversity. So there you go; that sums it up for the technical specs of Flux.

If you've gotten a chance to play around with Flux, let me know what you think of it so far. What were you impressed by? What did you like and not like? And for those of you who are using Midjourney or Stable Diffusion, would you consider actually switching to Flux now that you've seen the results? Do you think that Flux is indeed the best image generator out there so far? Let me know in the comments below.

As always, I will be on the lookout for the top AI news and tools to share with you, so if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Also, there's just so much happening in
the world of AI every week that I can't possibly cover everything on my YouTube channel. So to really stay up to date with all that's going on in AI, be sure to subscribe to my free weekly newsletter; the link to that will be in the description below. Thanks for watching, and I'll see you in the next one.
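
As a companion to the local install walkthrough above, here is a minimal sketch of a script that checks whether the downloaded files ended up in the folders the video names (models/clip, models/vae, and models/unet inside the ComfyUI folder). The function name `find_missing`, the `EXPECTED` table, and the file-name patterns are illustrative assumptions based on the names mentioned in the walkthrough, not anything ComfyUI itself provides; adjust the patterns if your files are named differently.

```python
from pathlib import Path

# Folder name -> glob patterns for the files the walkthrough downloads.
# ASSUMPTION: these patterns are guesses based on the file names read out in
# the video (clip_l, t5xxl fp8/fp16, ae, flux schnell/dev); edit as needed.
EXPECTED = {
    "clip": ["clip_l*", "t5xxl*"],  # the two text encoder files
    "vae": ["ae.*"],                # the VAE file
    "unet": ["flux*"],              # the Flux Schnell or Dev weights
}


def find_missing(comfy_root):
    """Return 'folder/pattern' strings for every expected file that is absent."""
    models = Path(comfy_root) / "models"
    missing = []
    for folder, patterns in EXPECTED.items():
        target = models / folder
        for pattern in patterns:
            # A missing folder counts the same as a missing file.
            found = target.is_dir() and any(target.glob(pattern))
            if not found:
                missing.append(f"{folder}/{pattern}")
    return missing


# Example call; "ComfyUI" is a placeholder path for your own install location.
print(find_missing("ComfyUI"))
```

An empty list means every pattern matched at least one file. The check uses loose patterns instead of exact names because the video mentions both .sft and .safetensors extensions.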

finally we have an image generator that can actually generate accurate hands and fingers plus it can also generate text accurately not only that but it can follow tricky prompts really well plus it can even generate these mediocre lowquality selfie images that normal people take so I mean with this new image generator it would be almost impossible to tell AI photos apart from real photos let&#39;s do a quick test here I&#39;m using this same prompt but with this new image generator called flux and also two other state-of-the-art image generators stable diffusion 3 and sdxl so these three images are based on the same prompt now between left right and Center I&#39;m not going to tell you which model is which you let me know which one looks the best to you so here&#39;s the first prompt this image shows three young African children two boys and a girl standing on a dirt ground they are all smiling and making a P sign with their fingers the girl is wearing a green and yellow patterned dress and the boys are wearing casual clothes the boy on the left is holding the girl on top of her while the boy in the middle is holding her up in the air with both hands all right so let me know which image you think is best left right and Center by the way for each model I&#39;m keeping their positions left right and center the same I&#39;m not shuffling any of their positions so here&#39;s the next one the image shows three three children sitting in the trunk of a red car the car is parked on a dirt road with trees and bushes they are sitting on a blanket and are eating slices of watermelon in their hands the boy on the left is wearing a blue striped shirt and shorts the girl in the middle is wearing blue shorts and a white T-shirt the girl on the right is wearing an orange shirt so really tricky prompt there&#39;s a lot of details here let me know which one looks best to you all right next one The Prompt is a photo of a blond-haired woman with a pistol in each hand and an open mouth 
facing the camera so let me know which one looks best to you and then next we have the famous prompt of a woman lying on grass so it&#39;s pretty obvious here which one looks the best all right here&#39;s another comparison let me know which one do you think looks the best so the prompt here is a young woman playing a bass guitar on a stage she is wearing a black dress and black boots she has long blonde hair styled in loose waves and pay attention to her fingers and the bass right a base should have four strings only so which image do you think looks best in this case all right here&#39;s another comparison so here the prompt is a young woman standing on a street with her back to the camera she&#39;s wearing a white T-shirt black skirt knee high socks brown shoes she has a large blue backpack on her back with a brown strap and a small teddy bear on it so let me know which one looks best to you and finally here we have a young woman standing on a sidewalk with her arms raised in the air she&#39;s wearing a gray T-shirt with a graphic of a dog on it a white skirt and white sneaker she has a yellow backpack on her back and is smiling at the camera next to her there is a small white dog possibly a Pomeranian sitting on the sidewalk so let me know which one looks best to you all right right next The Prompt is a photograph of a blackhaired woman in a white dress with blood stains sitting on a red couch to the right of a chiming clock holding a rose in her left hand with three skulls at her feet a very tricky prompt let me know which one looks best to you all right next we have anime so here is a anime girl with massive fluffy fenic ears and a fluffy tail blonde messy long hair blue eyes wearing a made outfit long black dress etc etc eating a slice of apple pie in the kitchen of an old dark Victorian Mansion let me know which one looks the best and not only the quality but which one actually follows the prompt the most all right next one this is a young woman with long 
blonde hair and bunny ears on her head kneeling in front of an open refrigerator she is wearing a white tank top and black shorts and is holding a black black phone in her hand the fridge is filled with various fruits and vegetables there&#39;s a potted plant on the right side of the image the woman appears to be taking a selfie all right so let&#39;s reveal the answer so the left side is the new image generation model called flux and then the middle one is stable diffusion 3 this is the latest version of stable diffusion and in theory it should be the best one and then on the right we have stable diff Fusion XL now I think we can objectively say that the left one flux looks the best in most cases so let&#39;s go over the prompts really quickly the first one the prompt is three African children so you can see flux is the only one with three kids and then making a p sign flux is the only one with decent hands and fingers actually making a p sign the one in the middle sd3 they&#39;re kind of making a p sign but the fingers are still messed up so in terms of hands and fingers flux is the clear winner here and then the girl is wearing a green and yellow pattern dress which is what we&#39;re seeing anyways next one so both flux and sd3 are pretty good both of them show that there are three children sitting in the trunk of a red car however in terms of quality you can see again flux just crushed it this photo looks really good the faces are very detailed they&#39;re each holding a slice of watermelon the toes actually look real whereas for stable diffusion 3 the faces are kind of blurry the toes are not really accurate and the girl in the middle is missing a watermelon so again for this one I would give the point to flux all right and then here I mean both flux and sd3 got the prompt correct but again it seems like flux is just better quality especially if you&#39;re going for this cinematic realistic style flux is really good and then this one woman lying on the grass 
SD3 is infamous for not getting this correct and generating quite grotesque images. And even SDXL, sometimes it works, sometimes it doesn't. Like you can see here, this woman has an extra hand, so it's just not great. But as you can see, Flux nailed this, plus her hands and fingers are actually accurate.

All right, so here I would say again Flux is the clear winner in terms of hands and fingers and the bass guitar. Flux is the only one that was able to generate a bass guitar with four strings, and the strings are actually straight and the frets look very realistic as well. Even the drums in the background look more realistic than in the other two examples.

And then here I would say Stable Diffusion 3 followed the prompt a bit better. We did mention that she's wearing brown shoes, and we're not seeing that for Flux. But again, for Flux the image quality is just way better than the other two. I mean, it's not even close.

And then here again I would say Flux is the clear winner. It followed the prompt really well. It was even able to generate a graphic of a dog on her shirt, plus it's the only one that could generate a white Pomeranian. So you can see that Flux not only followed the prompt really well, but in terms of image quality it's also a lot better than Stable Diffusion.

And then here again, if you just compare the quality, the Flux image looks the best. None of them got three skulls at her feet: the Flux one had four skulls, SD3 had two skulls, but they both generated the woman correctly.

All right, now in terms of anime, in terms of image quality I would give it to SDXL. There are just so many good anime models built for SDXL, and it's been around for a while, so you can expect that some SDXL models do a lot better than the newest Flux or Stable Diffusion 3. However, in terms of following the prompt, I would have to give it to Flux again. It's the only one that was able to actually generate the girl eating a slice of apple pie. The other two
generators did not generate a slice of apple pie.

All right, next one. Again, I think the clear winner here in terms of quality and following the prompt is Flux. This woman is kneeling down, she does have bunny ears, and she is holding a black phone. She does have a white tank top and black shorts. SD3 looks pretty good, but you can see her left hand is missing and her right foot looks really weird. So again, I think the point goes to Flux.

It can also generate low-quality cell phone photos like this, or this, or this. You can see it generates the iPhone pretty accurately, plus it nails the hands and the fingers, plus this looks like a legit low-quality selfie shot by an actual human. You might be used to AI photos being really polished, where everything just looks so perfect, so they were quite easy to tell apart. But right now Flux can generate such average-looking photos. I mean, it's insanely hard to tell this apart from real photos.

It's also great at generating text. Here's one example, here's another example of text, and here's another example. And note that she has five fingers; her hands are perfect.

Speaking of fingers, finally we have an AI image generator that actually gets hands and fingers correct. Here are some examples. It doesn't matter the position of the hand: it could be pouring tea, it could be playing a guitar, but the fingers do look correct to some extent. Plus, note that the strings on this guitar and the frets on this guitar are actually straight. None of the other image generators could get this consistently. Here's another example of hands tying a shoe, and you can see it nailed this one as well. Here's another example, and another example. So, I mean, this is the only AI image generator that can actually get hands and fingers correct, at least most of the time. And you know, even Midjourney, which was the leading closed-source image generator, still could not generate hands and fingers really well. So, I mean, now with Flux
you can forget about Midjourney, forget about Stable Diffusion. Flux is by far the best image generator out there right now. So in this video we're going to go over how you can use it, including online methods as well as how to install and run it locally if you have a good enough GPU. Plus I'll go over the architecture and details about the model.

So really quickly, Flux is a new image generator that was announced recently, and it was developed by Black Forest Labs. Now this is quite a new startup, and it seems like a lot of the team members were actually from Stability AI. In fact, they claim that they were the original creators of Stable Diffusion XL and Stable Video Diffusion, plus many other well-known tools.

Now for Flux they've actually released three models, so I'll go over each of them. The first one is called Schnell, and this is the fastest model, so think of it as a turbo version of a Stable Diffusion model. This is completely free and open source. However, this is the worst quality out of the three models. It's the smallest one and the fastest, but it's the worst quality, so it's kind of a watered-down version. I'll show you a comparison in a second.

And then next we have the Dev model. This is slower but it's much better quality. This is also free and open source, so you can download it and run it locally. However, this is only for non-commercial use, and if you do want to use it commercially, you need to get in touch with them to discuss further.

And then finally they have their Pro version. This is slightly better than the Dev version, and it's the best quality out of the three models. However, this is paid and closed source; you cannot download the weights and run this locally. But this is the best quality out there.

Now I'll go over the architecture and benchmarks and more technical details about Flux, but that's kind of boring, so I'll leave that till the end of the video. For now, let's dive in and actually
try this out.

Thanks to Tube Magic for sponsoring this video. Are you looking to kickstart your YouTube channel or take your existing channel to the next level? Look no further than Tube Magic, an AI-powered tool that helps you succeed on YouTube. Their built-in keyword research function lets you sort keywords by search volume, competition, and an exclusive Magic Score, which automatically finds the best keywords for your videos. Imagine never running out of content ideas again. With the video ideas tool, you can simply drop in any channel link and Tube Magic will generate video ideas related to that channel. Plus it even writes scripts and outlines for you, so you're already 80% of the way there instead of starting from scratch. You can also upload an unlisted video link and it will generate the best title, description, and tags for you. By the way, this is created by YouTuber Matt Par, who runs the popular Make Money Matt channel and over 12 other channels, so Tube Magic is backed by years of experience and success. Many big YouTubers are already using it to streamline their content. So if you want to grow fast, try out Tube Magic for free via the link in the description below.

So there are a few places online where you can use Flux for free. One option is this space on Replicate, which I'll link to in the description below. You can see it's pretty simple to use. You just enter in a prompt. Flux doesn't even have negative prompts, so it's just one positive prompt. And then you can change the aspect ratio to whatever you want. And then guidance: this is how closely it follows your prompt. If you drag this all the way to zero, it doesn't follow your prompt as well and it would spit out some random image. If you drag this all the way to the right, it might follow your prompt too literally and give you some unwanted results. So it's best to keep this at 3.5, which seems to be their default. And then seed is just the starting point of your image. Usually you would leave this
blank, which sets it to a random value. But if you do set the seed to the same number and you keep all the settings the same, then in theory it should generate the same image that you got before. And then output format is just what file type you want: WebP, JPEG, or PNG. And output quality is just how much compression you want on your image. So let's just leave everything at the default.

All right, let's test this out. For the prompt I'm going to put: a man wearing glasses writing in his diary. And then click Run. Note that Flux is quite bulky and it takes a much longer time to run compared to Stable Diffusion. However, in most cases you're going to get a higher quality result compared to Stable Diffusion. So here indeed we have a man, he's wearing glasses, he's writing in his diary, and the pen actually looks straight, and his fingers and his hands actually look real. So this is really impressive.

Let me test out an even trickier prompt. Here we have: a zebra with rainbow stripes playing a grand piano on a mountaintop. The piano is made of ice, and in the background the Northern Lights illuminate the sky. So let's click Run and see if it can generate that. Oh my God, this is just so incredible. There's just one minor flaw, which is it has three arms here. But this is the only generator so far that can actually get a zebra with rainbow stripes to actually play a piano made of ice with Northern Lights in the background. Previously I did a review on another new image generator called AuraFlow and compared that with Stable Diffusion, and it's not as good as this. I mean, this looks amazing.

Let's try another example. Here the prompt is: a girl in a Victorian-era dress holding a sign that says "Flux is King". She stands in a garden filled with flowers and butterflies. Let's see what that gives us. All right, well, it kind of gave us this image in a Disney Pixar style, but I did not specify for this to be a realistic photo, so whatever. And you
can see she does have accurate fingers and hands, and she is in a Victorian-era dress. She is holding a sign that says "Flux is King". The letters are very accurate, plus there are flowers and butterflies. Again, it just nailed the prompt. Everything is correct, plus the image quality is actually really good. Super impressive.

Now if you run out of daily credits on this Replicate space, you can also use the Hugging Face Space by Black Forest Labs themselves. There are actually two spaces. One is for Schnell, the faster but lower quality model, or you could use another space which uses Flux Dev, the slower but better quality model. I'll link to both of these in the description as well.

Now since we have both of them open, this is actually a good time to show you the difference between Schnell and Dev. As I mentioned, Schnell is kind of a watered-down version, so the only reason to use it is if your GPU isn't good enough and you can only run this faster but smaller model. But in most cases Dev is the way to go; the quality is a lot better than Schnell.

Anyways, let's paste the same prompt in both. We have: a beautiful woman with long brown hair doing yoga at home. I'm going to paste this in the Dev space, plus I'm going to paste the same thing in the Schnell space, and then let's click Run. I'm going to set these side by side, and you can see Schnell is already done. So here we have a woman doing yoga at home, and her hands and fingers are perfect. She is indeed doing yoga. Previously I did a comparison with another new tool called AuraFlow, plus Stable Diffusion 3, plus Stable Diffusion XL, and here are the generations. You can see they are awful compared to what Flux is able to generate.

All right, so now our Flux Dev image is also done, and you can see just the colors and the details of the image. It's a lot better than Schnell. I find that Schnell is kind of oversaturated and the contrast is too much at times, whereas this gives you a more cinematic
feel.

All right, let's try a different prompt. Here the prompt is: a ballerina girl with butterfly wings dancing on a lily pad in a serene pond, anime style. I'm going to click Run for both, and note how quickly Schnell runs compared to Dev. It runs way faster. Wow, and we are already done. So here we have a ballerina girl with butterfly wings dancing on a lily pad. Again, it just follows the prompt really well. There are some flaws: her face isn't really detailed (I'm sure if you bump up the resolution it would be a lot better), plus we have some flowers floating in midair, so that's not really accurate. And then here's the same prompt but with Dev. You can see that again, in terms of image quality, Dev is just better looking than Schnell. Schnell seems to be oversaturated, plus the contrast is too high.

All right, so those are some ways you can use it for free online. Now if you do want to install this and run it locally, I'm going to show you that right now. Note that there are a lot of steps and a lot of things to download, plus you do need at least 12 GB of VRAM on your GPU to run this well, as well as 32 GB of RAM on your computer. Also, this is using ComfyUI, so you do need to have this installed. If you're new to ComfyUI, I'm going to post a beginner-friendly tutorial on how to install and use it, probably by next week, so stay tuned for that.

So let's go over how you can install this locally. I'm going to link to this guide by comfyanonymous in the description below. The first thing to do is to download these safetensors files. So let's click on this link, and first we need to download this clip_l.
safetensors. So let me download that, and in our ComfyUI folder we need to save this in models/clip, so let's save this in here. And then, depending on how much VRAM you have on your graphics card, you can download this larger version or the smaller version. I'm going to go with this one so it doesn't kill my GPU. By the way, I have a Dell Precision 5690 laptop which has an Nvidia RTX 5000 Ada with 16 GB of VRAM. Shout out to Dell and Nvidia for sponsoring this. But with 16 GB I would still prefer to use this smaller version so it doesn't kill my GPU. If you have something like a 4090 with 24 GB of VRAM, then this fp16 would be a better option. But anyways, I'm going to download this and also put it in my models/clip folder.

All right, next step is we need to download this VAE, and it should go in your models/vae folder. If you click on this link, it's actually broken, so instead you should go to the Black Forest Labs Hugging Face page, which I'll link to in the description below. When you scroll down you should see these two models: Schnell, which is the faster but lower quality model, or the Dev model. Now again, just so I don't kill my GPU, I'm going to use the Schnell version, but if you have a good enough GPU you can go with this Dev version. So I'm going to click on this, and then in this Files and versions tab you should see this ae.safetensors. This is your VAE file, so let's go ahead and download this. This goes in models/vae, so let's save this in here.

And then finally we need to download the Flux Schnell safetensors file, or the Dev safetensors file if you're using that model, and we would download that into our models/unet folder. So let me click Save.

All right, so I'm going to open up my ComfyUI folder and make sure all the files are there. In this models folder, in this clip folder, you should have this clip_l.
safetensors file as well as this t5xxl fp8 or fp16 safetensors file. And then let me go back to the models folder, and in unet you should have Flux Schnell or Dev, .sft or .safetensors. And then going back to models, in this vae folder you should also have this ae file here. All right, and then afterwards we are good to go, so let's run ComfyUI.

All right, now the first thing to do is actually go into Manager and then click Update ComfyUI, otherwise your current version of ComfyUI might not have support for Flux. So let's click on this. All right, so our ComfyUI has been successfully updated, so let's click Close and then Close.

Now, going back to this page by comfyanonymous (again, I'll link that in the description below), there are two images that you can save. This first one is using Flux Dev, so if you've downloaded Flux Dev, save this image. Because I'm using Schnell, I'm going to right-click and save this image, and you can save it wherever you want. You might be wondering, well, why are we downloading this image? That's because the awesome thing about ComfyUI is that if you drag and drop an image that was created in ComfyUI, and given that they haven't deleted any metadata, it would actually pull up the workflow that they used to create the image. So I'm going to drag and drop the downloaded image onto ComfyUI, and you can see, voila, we have this workflow that is already made for us. It automatically gives us this pre-existing workflow; we don't have to make this from scratch.

So there are just a few things we need to change here, which is this one: because I did not download this fp16 one, I downloaded this fp8 one, so I'm going to select this. Everything else is the same. And if you're familiar with ComfyUI, everything should be quite self-explanatory. Here we have the latent image, where you specify the width and height. This is basically the seed, here is the KSampler and the scheduler, and we plug all of this into this Sampler
Custom Advanced. So previously in ComfyUI, for Stable Diffusion, you would have these three, right, the seed, the sampler, and the scheduler, all in one node, which is called KSampler. But here they've kind of broken it up into three different nodes, and now they're using this Sampler Custom Advanced node. It's a bit messier, but it is what it is.

All right, so let's enter in a prompt and test this out. I'm going to write: a woman holding a sign that says "RIP Stable Diffusion", and then click Queue Prompt. You can see it's taking a while to load the diffusion model. All right, so right now it's going through the sampler. All right, and here we go: here we have a woman holding a sign that says "RIP Stable Diffusion". So that's how you would install and run Flux locally on your computer.

All right, finally I want to talk about how this actually works and the technical details behind Flux, because it's actually quite interesting. First of all, here they compared some benchmark metrics between the three Flux models and other image generators such as Stable Diffusion 3, Midjourney version 6, and also DALL-E 3. Interestingly, first of all, Stable Diffusion 3 Ultra is actually ahead of Midjourney version 6. But anyways, they found that the lowest quality model, Schnell, is already slightly better than Midjourney 6, at least according to this benchmark metric. And then both Flux Pro and Flux Dev just beat all these other models. They are currently the best quality image generators out there. And you can see this Pro model (again, this is the paid and closed-source version) is actually better than the Dev model by quite a large margin.

And this chart sums up well the creative abilities of these three models compared to the cost. Flux Schnell is less creative, and again this is the lowest quality, but it's also a lot cheaper and a lot less resource intensive to run. And then Flux Dev: the cost will be higher, you're going to need more VRAM for this,
but it's more creative. And then Flux Pro is even more creative, but it requires the most resources to run.

Now here's how the architecture works. It's actually based on a hybrid architecture of multimodal and parallel diffusion transformer blocks, so this is key here. The diffusion model is like the traditional Stable Diffusion model, and the transformer model is good at understanding natural language. So a diffusion transformer model is kind of like a hybrid model. It's kind of like if ChatGPT and Stable Diffusion had a baby. They've improved over previous state-of-the-art diffusion models by building on flow matching, which is a method for training these generative models. Plus they've incorporated rotary positional embeddings, and this helps Flux understand your prompt better, especially if it has a lot of complex elements. And they've also incorporated parallel attention layers. All of these extra features just give it a much better understanding of composition and prompt following and how to create a coherent, good quality image.

And so you can see here the Pro and the Dev models just destroy other image generators like Midjourney, which is closed source and paid by the way, and also Stable Diffusion 3 Ultra, which is the best version of Stable Diffusion so far. Flux surpasses them in visual quality, prompt following, size and aspect variability, typography, and output diversity. So there you go, that sums it up for the technical specs of Flux.

So if you've gotten a chance to play around with Flux, let me know what you think of it so far. What were you impressed by? What did you like and not like? And for those of you who are using Midjourney or Stable Diffusion, would you consider actually switching to Flux now that you've seen the results? Do you think that Flux is indeed the best image generator out there so far? Let me know in the comments below.

As always, I will be on the lookout for the top AI news and tools to share with you,
so if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Also, there's just so much happening in the world of AI every week, I can't possibly cover everything on my YouTube channel. So to really stay up to date with all that's going on in AI, be sure to subscribe to my free weekly newsletter. The link to that will be in the description below. Thanks for watching, and I'll see you in the next one.
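
The generation settings walked through in the Replicate segment (one positive prompt and no negative prompt, aspect ratio, guidance around 3.5, an optional seed, output format, and output quality) can be sketched as a small settings object. This is only an illustration in plain Python: the class name `FluxSettings`, the field names, and the defaults for aspect ratio, output format, and output quality are assumptions made for the sketch, not the actual input schema of any hosted Flux endpoint.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# File types named in the walkthrough; the exact strings a host accepts are an assumption.
ALLOWED_FORMATS = {"webp", "jpeg", "png"}


@dataclass
class FluxSettings:
    """Hypothetical container for the settings shown in the web UI."""
    prompt: str                    # single positive prompt; there is no negative prompt
    aspect_ratio: str = "1:1"      # assumed default
    guidance: float = 3.5          # the default mentioned in the video
    seed: Optional[int] = None     # None means "let the service pick a random seed"
    output_format: str = "webp"    # assumed default
    output_quality: int = 80       # assumed default on a 0-100 scale

    def to_payload(self) -> dict:
        """Validate the settings and return them as a plain dictionary."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.guidance < 0:
            raise ValueError("guidance must be non-negative")
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format}")
        if not 0 <= self.output_quality <= 100:
            raise ValueError("output_quality must be between 0 and 100")
        payload = asdict(self)
        if payload["seed"] is None:
            del payload["seed"]    # omit the seed so the run is not pinned
        return payload


# Fixing the seed and keeping every other setting the same should, in theory,
# reproduce the same image.
settings = FluxSettings(prompt="a man wearing glasses writing in his diary", seed=42)
print(settings.to_payload())
```

Leaving `seed` unset mirrors leaving the field blank in the UI, while passing the same integer twice is the "same seed, same settings" case described above.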

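
The folder check from the local install segment (two text encoder files in models/clip, the Flux model in models/unet, and the VAE in models/vae) could be automated with a short script. This is a minimal sketch under stated assumptions: the file name prefixes below are inferred from the names spoken in the video, the helper `missing_model_files` is hypothetical, and the `ComfyUI` path is a placeholder for wherever your install lives.

```python
from pathlib import Path

# Folder -> file name prefixes expected inside it (assumed from the video).
EXPECTED = {
    "clip": ["clip_l", "t5xxl"],   # CLIP-L plus the fp8 or fp16 T5-XXL encoder
    "unet": ["flux"],              # Flux Schnell or Dev weights
    "vae": ["ae"],                 # the VAE file
}
WEIGHT_SUFFIXES = (".safetensors", ".sft")


def missing_model_files(comfy_root):
    """Return a list of 'folder/prefix*' entries that were not found."""
    models = Path(comfy_root) / "models"
    missing = []
    for folder, prefixes in EXPECTED.items():
        directory = models / folder
        # A missing folder is treated the same as an empty one.
        names = [p.name.lower() for p in directory.iterdir()] if directory.is_dir() else []
        for prefix in prefixes:
            found = any(n.startswith(prefix) and n.endswith(WEIGHT_SUFFIXES) for n in names)
            if not found:
                missing.append(f"{folder}/{prefix}*")
    return missing


# Placeholder path; point this at your own ComfyUI folder.
print(missing_model_files("ComfyUI"))
```

An empty list means every expected file was found; anything listed is worth re-downloading before launching ComfyUI.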