Transcript for:
Exploring AI's Role in Filmmaking

Basically, I mean, if you've used Midjourney, if you've used DALL-E, it's the same thing. Text-to-image prompt and get some good images out of it. But the best thing is, if you have anything near a decent gaming computer, you can use it locally, offline, for free on your own computer. If you want a pose, you can take a picture of yourself with the facial expression and your hands the way you want, insert that, and your character will have that pose that you want to give it. AI is the tool. A tool is only valuable in the hands of a person who knows how to use it. So try to be that person.

Just before we get into the episode, would you do me a huge favor and hit the subscribe button? Wherever you're listening to or watching this podcast, subscribing really helps the show to grow. And now let's get into the episode.

Hi, Mike.

So I'm so excited about the work you're doing. Not only on YouTube, but most recently at Storybook Studios, you're using a lot of AI in filmmaking. And I'm so excited to see you're like a warrior for open source Stable Diffusion. And hey, in this episode, in the short time we're together, we might even get into ControlNet. Shall we start with Stable Diffusion? Because I don't think we've spoken about it yet on this podcast. How are you using it?

Oh, fantastic. I get to introduce people to Stable Diffusion. Well, I mean, I'm from Munich, so of course I have to be into Stable Diffusion, because our good people here at the local university came up with latent diffusion, which then the company Stability AI funded and turned into this wonderful product, or shall I say, free thing that we all get to use if we want to. It's basically, I mean, if you've used Midjourney, if you've used DALL-E, it's the same thing. You can text-to-image prompt and get some good images out of it. But the best thing is, if you have anything near a decent gaming computer, you can use it locally, offline, for free on your own computer. And also, you know, make it better. You can train it on your own images. You can add all sorts of add-ons that this community of tens of thousands of people worldwide have created for it. And that's what gives you the control that you might be missing from the other commercial products. Which are excellent as well, of course. I have to say that. We use them as well. But, you know, as filmmakers especially, we want to give that direction. We want to have it look exactly how we want and not just kind of, which is why I'm a big fan.

Hang on a second, Albert. You said you can put it on your computer if you've got a decent gaming computer. So what would one need and how would one get started? And how techie do we need to get to get it working?

Basically, what you need to start out is a computer, ideally with an NVIDIA GPU. Although others do work with workarounds, NVIDIA is the way to go. Anything in the RTX series is fine. I started out on a 2070, so really not the best of the best at all, and was very happy. And you do need to be comfortable with basic Git commands. You shouldn't be scared of a terminal window. It's going to be gone quickly, don't worry, but you do need to open it briefly to install all this good stuff. And then, you know, go on Reddit, go on the Stable Diffusion subreddit or on my YouTube channel and follow the tutorials available there, and you can get started. You can make sense of it.

I'll make sure to link those videos you've created in the description to this show so people can go in and check them out. And if they want to dive deeper, like figure out how that works.
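To make the local, offline workflow concrete: Albert's own tutorials use the web UIs discussed later in the episode, but a minimal, purely illustrative text-to-image sketch with the Hugging Face diffusers library looks roughly like this. The model ID, prompt, and settings are just examples, and a recent NVIDIA GPU plus the torch and diffusers packages are assumed.

```python
# Minimal local text-to-image sketch with Hugging Face diffusers
# (an illustrative alternative to web UIs like AUTOMATIC1111 or ComfyUI).
# Assumes: pip install torch diffusers transformers accelerate, and an NVIDIA GPU.
import torch
from diffusers import StableDiffusionXLPipeline

# Download the SDXL base model once (several GB); after that it runs offline.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Text-to-image: a prompt goes in, an image comes out.
image = pipe(
    prompt="cinematic film still of a detective in a rainy neon-lit alley",
    negative_prompt="blurry, low quality",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("still.png")
```

The web UIs wrap these same models with extra conveniences such as inpainting, model management, and the community extensions mentioned above.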
So we're quite up to date, because at the time this episode is releasing, it's quite recent, and a lot has changed since, I think, Stable Diffusion 1.5. And now there's Stable Diffusion 3 Medium, or something like that, a middle version. So what version are you running right now as of, sort of, you know, nearly summertime 2024? What's the latest in Albert's arsenal right now?

Yeah, we're using SDXL still, because we need the resolution, first of all, right? We want to deliver full HD content at the very least, although we did make the first 4K trailer out there early this year with our own special workflow. The newer models, you know, every time they release something new, it takes a lot of getting used to, and you have to first understand: is this actually better or just different in some way? So SDXL is what we're comfortable with, what we've built our own fine-tuned models on. So we're sticking with that. SD 1.5, you know, can be good for specific reasons, you know, just getting some details in there that XL might not be perfect at. But yeah, we're almost classic at this point. Yeah, if one can say that in AI at all.

So take me a little bit further, because I've seen some of the clever stuff you've done on YouTube, such as, you know, regenerating someone's face to look better, or colorizing an old photo. That, I don't think, is in Stable Diffusion. That was sort of more photo manipulation. But let's stick with video for the moment. Say you generate something and it's not quite looking perfect. Are there any ways of redoing it, or do you just have to keep reprompting until you get what you want?

Yeah, video is funny. Video is gambling with a lot of the methods out there, especially Stable Video. There's a video model as well that you can feed an image, and it'll give you a couple of seconds. If you're doing that, it can be great for creating trailers for movies and stuff. A lot of the really cool AI trailers we've seen are made that way. But if it doesn't look the way you want, you can just click again and hope, which of course is not enough if you're trying to make a movie or a series like we're doing right now. But if you have a little bit of knowledge of 3D animation, or honestly just picking up your phone and filming yourself doing the pose you want, or the little movement you want, there are ways in ComfyUI, for example, to use Stable Diffusion and give it what you mentioned at the beginning, ControlNet input, in addition to the prompt, to really get that motion control that everybody's looking for as well. So it is possible. It's just a different style at the moment. Let's call it animated.
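The "video model you can feed an image" is Stable Video Diffusion. As a rough, illustrative sketch of that image-to-video step (not Albert's ComfyUI workflow), the diffusers version looks like the following; the model ID, resolution, and parameters follow the public SVD release, and a large-VRAM GPU is assumed.

```python
# Illustrative image-to-video sketch with Stable Video Diffusion in diffusers.
# Assumes: pip install torch diffusers transformers accelerate, plenty of VRAM.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # helps on smaller GPUs

# Start from a still you generated (or shot) yourself; SVD expects 1024x576.
image = load_image("still.png").resize((1024, 576))

# Each run is a roll of the dice, which is the "click again and hope" gambling
# described above; fixing the seed only makes a particular roll repeatable.
generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "clip.mp4", fps=7)
```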
No, that's it. And I think that's one of the key things you've hit on: there is the consistency of characters, right, on the next generation. And especially, like you say, if you're creating like short movies or whatever, you really want to, and eventually people will be creating full movies, you want to have that consistency. So, okay, you mentioned ComfyUI, and also there's Automatic1111, I think. Are you using Comfy or the other, or are there any differences between those two?

Yeah, I guess the question is what your goal is, right? If you want some images quickly with just the text prompt and want kind of a Midjourney experience, I can totally recommend the Fooocus UI, with three Os, because it has an LLM running in the background as well to improve your prompt the way that Midjourney does. So if you're quick and dirty, you just want to visualize a trailer quickly, that's the way to go, especially if you're coming from Midjourney. Then one step up is the Automatic1111, or A1111, or whatever that user is called who gifted us this thing. That's kind of the more advanced image editing tool, I would say, if you want to have all the little dials that you mentioned. And then one step above that, if you want to process thousands of images or want to get into video, that's ComfyUI. That's scary. You get all these nodes, it looks like spaghetti if you open it the first time. It's crazy. So if you have the slightest bit of traditional, I don't know, Nuke effects experience, if you're coming from the visual effects industry, you'll be comfortable using that. I would recommend working your way up to that, getting comfortable slowly with Stable Diffusion before you start. And then it's a ton of fun and 100% customizable.

And then there's something called ControlNet that you can plug in, I think, to Stable Diffusion. So for someone who's not experienced with ControlNet, what is it going to do? How is it going to change the images, or potentially even videos, I guess, that you may be creating?

Oh, well, mostly it's great at getting me LinkedIn likes, because people don't know about it yet, despite it being over a year old at this point. So this is the control that everybody's looking for all the time, and they're trying to get it with text prompts when, in fact, of course, with ControlNet, you can combine your classic text prompt with any sort of other image that you want to get an aspect of in your results. So, for example, if you have a photo of a room that has, like, the furniture in the places you want, the window in the place you want, you can send it to the ControlNet depth model to get the same depth, literally, like the same camera angle roughly, or positions of objects. If you want a pose, you can take a picture of yourself with the facial expression and your hands the way you want and insert that, and your character will come out with that pose that you want to give it. So ControlNet is a whole array of tools. You just need to pick the right ControlNet model to take the aspect you want to convert from the image you give it, and it'll do that for you. My favorite personally is the scribble model, because I cannot draw, I cannot illustrate, but I can draw stick figures and rough sketches of what I want, and it'll just magic it into the result you're looking for. I have no clue how it works. It's amazing. And it's one guy at Stanford who built this, so thank you.
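To show how the "prompt plus a second image" idea looks in code, here is an illustrative sketch of the scribble workflow using diffusers and the publicly released lllyasviel scribble ControlNet for SD 1.5. This is not Albert's ComfyUI setup; the checkpoint names, file names, and prompt are assumptions for the example.

```python
# Illustrative ControlNet sketch: text prompt + a rough scribble as a second input.
# Assumes: pip install torch diffusers transformers accelerate, an NVIDIA GPU,
# and a hand-drawn sketch saved as scribble.png (stick figures are fine; the
# scribble model expects white lines on black, so invert a pencil sketch if needed).
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # or any SD 1.5 checkpoint you use
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

scribble = load_image("scribble.png")  # your stick-figure composition

# The prompt says what to render; the scribble says where things go.
image = pipe(
    prompt="a cowboy lady looking into the distance at sunset, western film still",
    image=scribble,
    num_inference_steps=30,
).images[0]
image.save("from_scribble.png")
```

Swapping the ControlNet weights for the depth or OpenPose variants gives the room-layout and body-pose behaviour described above, with the same pipeline code, provided you feed the matching conditioning image (a depth map or a detected pose skeleton) instead of the scribble.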
It's almost like plugins for WordPress, where you get the basic blogging platform, but then you're like, I want to start a podcast, so I need a podcasting plugin, or I want to have like a carousel of images, so you get that installed. So it sounds like ControlNet is kind of like that for Stable Diffusion, and you can really fine-tune it. Something I get asked a lot, actually, by creators is how to do the perfect face swap. I don't know if ControlNet can do that. Is that something you'd be happy to dive into and maybe share some expertise on?

Yeah, I mean, ControlNet, there's something called IP-Adapter, which is part of ControlNet, a kind of ControlNet, which I think stands for image prompt adapter. I'm not entirely sure. And there is a face version of that. So if you want to just give it an image of a face you want to copy, it'll put that into your result. That's one way to do it, and it's a flexible way, because it doesn't just swap out the face, it generates the image from the get-go with that person, which, of course, is what you actually want to do. If you want to swap it out at the end, there's something called Reactor, which is also one of those extensions, one of those plugins you can use in your workflows in any of the UIs. That also works pretty cool. I do have a caveat. I don't think that model is for commercial use. I think it's academic. I don't think anybody I've seen out there has cared. I seem to be the only one. So do with that what you will.
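For readers who want to try the IP-Adapter face idea outside the web UIs, here is an illustrative diffusers sketch. The adapter repository, weight file name, and scale are assumptions based on the publicly available h94/IP-Adapter release, so check that repository for the exact file you want.

```python
# Illustrative IP-Adapter "face" sketch in diffusers: the reference face image
# steers generation from the start instead of being pasted on afterwards.
# Assumes: pip install torch diffusers transformers accelerate, an NVIDIA GPU,
# and a clear reference photo saved as reference_face.png.
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # or any SD 1.5 checkpoint you use
    torch_dtype=torch.float16,
).to("cuda")

# Weight file name is an assumption; the h94/IP-Adapter repo lists the variants.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models",
    weight_name="ip-adapter-full-face_sd15.bin",
)
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference face is applied

face = load_image("reference_face.png")
image = pipe(
    prompt="portrait of a detective in a rainy neon-lit alley, film still",
    ip_adapter_image=face,
    num_inference_steps=30,
).images[0]
image.save("consistent_character.png")
```

The Reactor-style swap mentioned above is a separate post-processing extension with its own licensing caveats, so it is not shown here.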
Now, obviously, you're the Creative Director of AI Media at Storybook Studios. Explain a little bit to me about your role there and how you would work on, I mean, I definitely recommend people go and watch some of the trailers that are on the website there. How would you go about creating something like that? First of all, where does the planning start before you even hit the AI and start prompting? How does that work?

Yeah, I mean, the cool thing about Storybook is we didn't just start at zero. We're a sister company of a quite successful film production company here in Munich, that some Germans in your audience, if you're listening to this, might know from Matthias Schweighöfer movies, or if you've seen Army of Thieves on Netflix, the Zack Snyder production, that's also one of our sister company's productions. Which means we kind of have the whole storytelling, we're born out of traditional filmmaking and know how to tell a good story, whether with AI or not. So that started from a classic idea. We had a script, we had a concept with a lot of imagery that we'd put together, and decided to do the trailer for the series, which you may have seen. That's kind of our most well-known video at this point. So that classic edit was the first step, you know, the story, the voiceover, what we actually want to tell. Then we filled that with still images that we generated in a mix of UIs, and Midjourney as well is involved, because it's, you know, very good at making cinematic images quickly. And then we start the real, like, AI animation stuff. So that's kind of the final step, which I would recommend to anybody: know what you want to tell before opening any software at all, first of all. And then really use AI only to its strengths and don't try to, you know, use it just because it's AI. Because that's how you get, you know, maybe great visuals, but an awful voiceover without emotions. For example, the actress in that video is a real human, because especially in January when we made this, there were no good text-to-speech tools. So we said, okay, no, we want to convey a good story, good emotions, so we go to a real person for that. And yeah, I mean, concretely: try to develop a skill set outside of AI, you know, understand editing, understand picking music or possibly even composing music yourself, because that will help, you know, if you're already an artist. AI is the tool. We always call it a tool, right? But a tool is only valuable in the hands of a person who knows how to use it. So try to be that person first.

Incredible. So let's move on. We've spoken a lot about Stable Diffusion, ControlNet, touched a little bit on video. Did we want to go any further with video? So you mentioned, you know, generating. I mean, what generators in terms of AI video are you excited about right now?

Oh, good question. I mean, obviously, Runway Gen 3 just dropped. It's text-to-video, so it's even more gambling than I'm used to. We haven't been able to use it professionally, to be perfectly honest, because we're waiting on the image-to-video. We want any clip that comes out of an AI to fit into our edit, so we need to have control over the style and just general look of it, and character consistency, like you mentioned. But Luma, of course, is great at that. I'm very excited about that. It likes inventing new characters, so it's always going to be kind of a co-director, but that can be fun. So I did a little test, if you want to look it up on my LinkedIn, just, for example, I had a cowboy lady just looking off to the distance, and I prompted for her to turn around and walk away. The camera ended up panning to the left and just showing a man who started talking. So I thought, okay, I'm going to use ElevenLabs to try to lip sync and have him say a sentence that makes sense, and then do the other shot, the other angle, and it decided to make the woman, you know, get on her horse and ride away. So that was fun. Yeah, image-to-video models. I hope there's going to be another open source one, because it would be a shame to inhibit that growth and, you know, let this huge community out there down by putting up paywalls when, in the end, honestly, all the data it's trained on is ours. I happily will provide my own data and creativity to these models, but I'm going to want those models in return.

Yeah, that makes sense. That's a very good ethical point that you raise as well. You know, a lot of these models are trained on the collective of human knowledge and data, and we deserve some kind of access. Obviously, I suppose the companies that are making these models need to make money as well. But it's a fine line, isn't it?

It's definitely a fine line.

So that's, yeah, that's really cool. And I love the sound of where it's going. Also, a key point I think you raised is that you're really interested in the image-to-video. Is that because it does help with character consistency and getting exactly the scene you want?

I think it just fits so well into a traditional filmmaking workflow, because you start out sketching a storyboard, usually, then turning it into an animatic where the storyboard is slightly moved, and then you shoot the film and get the final footage. So we can skip straight from storyboard to film if we have image-to-video, which is how our trailer was developed. So that, I think, just makes the most professional sense to have as a feature, because otherwise you start at zero again. You can feed it a script and just kind of hope it makes shots that fit together, which will never happen. Like, that's not... It's going to be a couple of decades, I think, before we can just... Okay, it's going to be like three years before we can do that. But yeah, I mean, I work with directors who make traditional films every day, and they don't like randomness, of course. They want this control. So if you can give them step-by-step, okay, here's the sketch, here's the image, here's the motion that fits to the image that we had you greenlight. This is central.

Is there anything else in the world of AI and creativity that is exciting you right now, whether it's a big new proprietary thing? I know you've mentioned Luma, that's quite new in the world of video, or maybe it's the biggest open source thing yet that you're excited about. I don't know what's bigger than Stable Diffusion. But anyway, you take the floor. Is there anything else that's exciting you?
Yeah, I mean, I'm super excited about performance transfer, if we can get actual actors into this and, you know, turn them into crazy characters without a huge CG budget. Something like LivePortrait came out today or yesterday, where you can just act out what you want your character to do and not only put that onto a still image, but onto a moving image. So we will definitely be playing around with that. And again, open source, MIT license. Thank you.

That's fantastic. I mean, from this show, there's a lot of stuff for people to check out. A lot of very detailed stuff as well. So please tell us where people can go to find out more about you, Albert. It's extremely useful stuff. I'm guessing definitely the YouTube, because there's some really detailed stuff there.

Absolutely. I mean, I'm just with my name, Albert Bozesan, on most platforms. Feel free to connect with me on LinkedIn, on YouTube. That's where I'm most active, I'd say. And of course, storybookstudios.ai.

Thanks so much for checking out this episode of the Creator Magic Podcast. Would you like to get a weekly email with all the latest AI tools I find? If so, head over to mrc.fm forward slash creator magic and hit subscribe to get my weekly email newsletter. That's mrc.fm forward slash creator magic.