Transcript for:
Automating Instagram Outreach with AI

What's going on everybody, it's Nick, and in this video I want to run you through how to scrape Instagram effectively, scalably, and with data that lets you customize DMs and outreach using AI. I'm going to walk you through my entire personal funnel from end to end, show you a couple of software platforms, and give you guys sort of a behind-the-scenes look at how to do this realistically within a company. So if that sounds like something you're interested in, stay tuned and let's get into it.

Okay, so first things first, I've got my Instagram profile open here, and just so that nobody yells at me, I'm just gonna use my own profile, and maybe some random bot profiles that I find, for the purposes of the scraping. What you see when you go on my profile is a couple of things. We have a little profile description here. Obviously we have a link to Threads. We have the first name and last name, this is where the bio or the description would go, we can see some linked websites, and then we have a bunch of images here. Essentially, what I want to do over the course of the next few minutes is show you guys how to get all of this information in JSON format and then use it in an AI flow. In particular, we're going to be feeding image post data into AI and having it customize the first line of a message that we're sending out to people. In my experience, and I do cold outreach basically for a living at this point, the best cold outreach is customized, but it's also incidental. The way it comes across is almost like you're talking to a friend at the bar, or you're just texting somebody on your phone. It should be: hey X, saw your post about Y, wanted to talk about Z, if that makes sense. That's the format I'm going to follow here.

Now, I should make it known that Instagram has changed their API, so you can't actually send DMs. Well, I was going to say legally, but it's not like Instagram's the government. Not yet. You can't send DMs just using the Instagram API, so we can't do the sending of the DMs on our end, unfortunately. Or at least I'm not going to be showing you how to do it in this video. There are a variety of web browser automation tools that allow you to do that sort of thing, and if you guys are interested in having me show you how to actually send a DM, like having a bot that opens a browser, clicks on messages, clicks on a DM list, and so on, I can do that for you. Just note that Instagram is probably the world's most sophisticated web service, so they're very, very secure and very, very difficult to automate; this is a problem they've basically been solving for the last 15 years, or however long they've been around. Regardless, we're going to get a bunch of data, and then, just for the purposes of the automation, I'm going to dump it all into a big Google Sheet. Then we'll take that Google Sheet data and you can do whatever you want with it: maybe add it to your own CRM with a task for a virtual assistant or salesperson, or run some additional analysis on it, whatever. At the end of it, we're going to get the images, we're going to get the profile, we're going to get the URL, and then we're also going to get something to tell them.
And yeah, that's what we're going to develop over the course of the next few minutes. So let's talk tools. You can obviously use a variety of tools to do Instagram scraping; there are probably 30 or 40 of them out there now. What I'm going to do here is show you my favorite, which is called Phantom Buster. Phantom Buster uses browser automation under the hood, so they're going out and clicking on links for you, scraping a page, paginating, and all that shit. But they're probably the most secure, or safe, I want to say, because you obviously need to log into a profile in order to do the scraping. As a result of using Phantom Buster, you can get very, very safe data for reasonably cheap. The way they do their billing is by execution time, because they have these cloud servers hosted somewhere else on the web. Instead of billing you per run, like Make.com does, they bill you per minute used. I just get the starter plan, I run probably four or five businesses out of it, and it's perfectly fine; I basically never run out of ops. So you're probably good with a starter plan. You can also just use a trial for the purposes of this video if you wanted to build out something relatively similar. But yeah, I'm gonna use Phantom Buster for this, and there are a variety of other tools as well. We're also obviously going to be using Make.com. If you're unfamiliar with Make.com, I have a whole tutorial on how to use that platform; it's an extremely powerful automation platform that I made a ton of money with, and you can too. And then, yeah, we're just gonna be using Instagram. So let me walk you through what a flow here might look like. I'm gonna start off by just using Phantom Buster to get a bunch of example data. With that example data, I'll feed it into AI and we'll do the prompt, and after that I'll show you how to import everything into Make, essentially. So, first things first, let me just see what Google Sheet account I have access to here, because we're going to be adding a row to a Google Sheet. Okay, so this is going to be on my other email. I have so many emails open here, it's bonkers. We're going to call this Instagram Phantom Buster Scraping, and I'm not going to fill in any of the fields right now. I'm just going to share this with an email address. This is sort of the less glamorous part of doing a lot of these automations: in order to be able to access the correct sheet and all that stuff, you just need to make sure permissions are good. Okay, next up, I'm going to go into Phantom Buster. I'm going to assume that you've already signed up. When you sign up, you get a page that looks something like this, and essentially you see a bunch of phantoms. This is just, I guess, a cool branding quirk; a phantom is just a bot, they're just calling it a phantom instead of a bot. If you click on Use a New Phantom and then type Instagram up in the top left, you'll see that there are a variety of phantoms, a variety of bots, that we can use to get our data. The data returned by each of these phantoms is quite different, so pay close attention to the sorts of things that we're doing.
There's a follower collector, which will extract the followers of an Instagram account, and if you click More, you'll see what you give and what you get. You can see that we get the profile URL, the username, the image URL, full name, ID, is private, is verified. This is probably what I would use if I were attempting to do cold outreach for people in a particular niche: I would find a followee, an authority or influencer that's sort of definitive in that niche. I don't know, maybe it's Marques Brownlee or something like that for tech products. Then I would extract or collect their followers just based off the little followers button, so it's going to go through and find all that information. Then what I'd do is feed every profile into another phantom, which helps me get image data, post data, and that sort of thing. Because I'm doing this just on my own profile, I'm just going to go one at a time. You can see there's an auto commenter, auto follow, auto like; you can do a lot of active things with this as well. If you guys want me to do a video on any of that, just let me know, I'm happy to do so. I figured I'd keep it simple here, though. We're going to do, okay, profile post extractor. That might be the one. So I'm just going to, oops, I don't actually want to use this. I'm going to go back here and just click More. We'll do profile scraper. This is probably the one that we're going to use. We can do story auto watcher, that's pretty interesting. Story extractor, story viewers export, tag post extractor. So I'm pretty sure the one that I want to use is called the profile post extractor. Yeah. We're not going to use the photo liker, obviously. Let me just see what we can get with the profile scraper. Yeah, you can get the bio, the website, all that stuff, but no, we're going to want to use the profile post extractor. I've actually already taken one of the phantoms here, so I'm just gonna use the phantom that I have right over here, Instagram photo extractor. But yeah, you know, just find the Instagram photo extractor and use that. Okay, great. So after we're done with that, let me just call this YouTube Instagram Photo Extractor. Once you're done with that, you're gonna want to click into the phantom, and you'll see a page like this. This just shows you your run history. You can see I ran this on my profile a while ago, probably when I was thinking about doing this video, and it showed me information like the comment count and the like count. You get a ton of information, which is pretty nice. Next, you've got to click on this little setup, and what it's going to do is ask you for your Instagram session cookie. Now, if you're unfamiliar with what a session cookie is: you know when you log into a service and you put in your username and your password? Well, basically, in order to prevent you from having to resend your username and password every time you want to do anything on the platform, like any time you want to like a post, right?
Instead of having to type in your username and password to authenticate, Instagram will basically do some big mathematical operation and give you a very secure password, like a secret phrase, and that's what this cookie is. Then, for the next X hours, maybe the next 12 hours, you can use this secret cookie, or the session cookie, I guess I should say, to gain access. Every time you make a request to Instagram's servers, every time data changes hands, this secret passphrase is also passed between you two, and that's just how Instagram ensures that you are the person that logged in before. Now, the really cool part about Phantom Buster is that they automatically grab your session cookie for you using a little Chrome extension. So if you see, I have a little Chrome extension up here with this cute, super kawaii ghost, and that's Phantom Buster. Essentially, it just goes on my Instagram account and grabs my session cookie for me so that I can do all of the browser automation shit that we looked at a second ago. So I've got my cute little Chrome extension up there, got my Instagram session cookie, and all I have to do is click a button that says Connect to Instagram. It'll go grab my session cookie, and then I'm good to go. And I should note that this is pretty sensitive information. The fact that I'm exposing this session cookie to you right now, basically everybody would say, Nick, this is stupid, you shouldn't be doing this. But I know the session cookie turns over every so often, so I'm not personally worried about anybody using mine. If you're using this in a company or something, where you maybe have a personal Instagram account, just make sure that you hide this and that you're not showing it to other people in the business. Odds are they're not gonna know what to do with it anyway, but it's something to keep in mind. Session cookies are the secret code that Instagram gives you as a result of putting your username and password in. It's not like anyone can recreate your username and password from it, but anybody that has this can log into your Instagram account and do shit with it.
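If you're curious what the cookie actually does at the HTTP level, here's a minimal Python sketch. The cookie name (sessionid) and the bare-request approach are assumptions for illustration only; Instagram aggressively flags raw requests like this, which is exactly why Phantom Buster drives a real logged-in browser instead.

```python
import requests

# Illustration only: a session cookie standing in for username + password.
# The "sessionid" cookie name is an assumption; treat the value like a
# password, since anyone holding it is effectively logged in as you.
SESSION_COOKIE = "paste-your-sessionid-value-here"

response = requests.get(
    "https://www.instagram.com/",
    cookies={"sessionid": SESSION_COOKIE},  # passed along with every request
    headers={"User-Agent": "Mozilla/5.0"},  # look like a normal browser
    timeout=30,
)
# With a valid cookie you get a logged-in page; with a stale one, a login wall.
print(response.status_code)
```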
So, anywho, I'm going to click Login, and then we'll put in the URL of my profile on instagram.com. When we scale this up, we're going to do this on multiple profiles, so we're not actually going to be using this one; we're just going to feed things in profile by profile. You'll see, if you scroll down a bit, it says you can give your profile URLs in one of the following formats: the URL of a single Instagram profile, the URL of a Google Sheet containing a list of Instagram profile URLs, or the URL of a CSV file containing a list of Instagram profile URLs. So we can scale this up any way that we want. We could run this once per Instagram profile, but that's probably a little inefficient. What we're probably going to do is use a Google Sheet or a CSV file; you can generate a Google Sheet every time you want to do this and then use that Google Sheet as the URL. Pretty straightforward stuff. You have some other things you can define over here, but since we're not using a Google Sheet, that's not a big deal. You can set the number of posts to extract per profile. Just to keep automation limits as low as humanly possible, I'm going to extract three, and then I'm going to have AI tell me something about these three. Then I'm going to combine their descriptions, I guess, and use that to say something. The number of profiles to process per launch is going to be one. There are some advanced settings down here, but usually you don't have to worry about those. So we're going to click Save. I'm going to run it once just for the purposes of this example, and I'm going to say no thanks here. Okay, and then I'm going to launch. Now it's actually going to go out, log in to my Instagram by feeding in the session cookie, navigate to the URL, and basically scrape the JSON from that whole page and feed it to me in a format that I can use. That took not even five seconds, now that I'm looking at it. Yeah, it's a successful authentication. It says no new posts found, just because I ran this a few days ago when I was figuring out what I wanted to talk about in this video. But the end result is you get this information: the post URL, description, comment count, like count, location, location ID. I guess, you know, Port Moody, British Columbia is a specific location ID. Publication date; you can see I haven't posted anything in like five years. Jesus. Liked by viewer; is sidecar, I don't know what that means, but it means something. Type, caption, profile URL, username, full name, image URL, post ID, query, timestamp, tagged full name 1, and tagged username 1. So what I'm going to do here, just for the purposes of this demonstration, is download this file. It's going to be a CSV. And then, remember how earlier I... let's see, this Google Sheet. Well, now I'm just going to upload what I just downloaded into this Google Sheet, and it's just going to recreate it. Okay, great. Now I have the data in some format that I can realistically work with. I did three images here, but realistically, just thinking about the quantity of data coming in, we might want to do one and have it tell us something about the last image. That's probably smarter. We could loop over three, but token costs would maybe be a bit much. So why don't I cut these other two out, and we'll just use one. The reason I'm doing this, to be clear, is because I want the data in a format that I can then use to call a Make scenario, and Google Sheets are just the simplest and easiest, and I think most people here intuitively understand them. So I'm just using this as an example; we may not actually have a Google Sheet at the end of it. Okay, great. So now let's actually go out and do something with this. Excuse me, I think I've shared this to the correct place. Yes, I have. Let's go to Make.com, and I'm just going to use Google Sheets as the source. For the purposes of this, I'm gonna use Search Rows, although in the future maybe we're not gonna use Search Rows, I don't really know. It looks like there's some issue validating my account, so let me see if I can find this spreadsheet here. Should just be Instagram. Yeah, Instagram Phantom Buster Scraping, beautiful. The sheet is called results; that's our wonderfully descriptive sheet name. Then we're going to run this puppy just as a test to get the data. You should always be running things before you build out your full flow, just to make sure that everything you're doing is necessary and you're getting all the data that you need.
So you see I got my post URL, I got the description, I got the comment count. There's an image URL here. If I copy and paste this image URL, I get "URL signature expired." That's kind of weird. I might have pasted it incorrectly or something like that. Oh, you know what? I think I know why this happened. Because I haven't done this in a couple of days, right? So I imagine when it scraped, it didn't pull a fresh URL. Instagram will update its URLs all the time, with what are called URL signatures, just to make sure that people aren't scraping data, storing it, and then using it, I don't know, six months later or something. They're very, very careful about this. And when I ran this, in order to preserve my resources, Phantom Buster used sort of a cache: it said, oh, we've already scraped this image, so I'm not going to scrape it again. So we may just need to scrape the image one more time. In order to do so I've, yeah, okay, so this should work; this is one new post extracted. If you guys remember, last time it said no new posts found, but if I go to the image URL that it pulled now, I bet you it'll work. Yeah, there you go, that's me absolutely demolishing a Subway sandwich. Okay, so I'm just going to download this new results file, and then I'm gonna go back over here, just so we're all clear on what you would do if you were testing this out on your own. You just drag and drop that here. I'm going to replace the spreadsheet, because this is the old data and I want the new data. Okay, cool. And now I have an image URL with a valid, accessible signature. The signature, I believe, is just in the URL after the question mark, stp, and then this whole long string; it takes the current date and time into account and makes sure you're not using the URL six months from now or whatever. I don't know if you guys are familiar with how that works, but there were a bunch of services out there that tried scraping Instagram and hosting Instagram alternatives for a while, and this is just how they shut all that stuff down.
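As an aside, if you're storing these image URLs for later, a quick liveness check saves you from feeding dead links into the AI step. This is a hypothetical helper based on the behavior described above, not part of the Phantom Buster setup:

```python
import requests

def image_url_is_live(url: str) -> bool:
    """Return True if a signed Instagram CDN URL still resolves.

    Expired signatures (the long token after the '?' in the URL)
    typically come back as "URL signature expired" errors.
    """
    try:
        resp = requests.head(url, timeout=15, allow_redirects=True)
        return resp.ok
    except requests.RequestException:
        return False

# e.g. if not image_url_is_live(row_image_url): re-run the phantom first
```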
So now that we have the data, why don't we do something cool with it? Let's feed it into AI. In order to do that, I'm gonna go to Add, type OpenAI, and pick Create a Completion; we'll do Analyze Images. The image that we're gonna supply is gonna be an image URL, and the image URL in particular is going to be this one; it's literally called image URL. Then I'm going to say: the following is an Instagram image. That shouldn't be a problem, but sometimes when you feed stuff like this in, like with brand names, GPT-4 or GPT-3 or whatever is doing the image analysis will be like: I'm sorry, I can't scrape Instagram. I'm sorry, Dave, or whatever that HAL 9000 meme is. So we're just going to say: the following is an Instagram image. Then: I've included a caption below as well. Then we'll just say caption, and then: use the image and the caption to write a customized one-line introduction, use the following format. We could do JSON. Yeah, you know what? Let's just see how it goes without actually doing JSON formatting and stuff like that. I'm doing this off the cuff; I haven't run this specific example before, so we can sort of learn together and maybe iterate, and hopefully this will give you guys at least some insight into my thought process when I'm designing these things. I'm providing it some examples because I want to steer it in a direction where I'm just not going to hate the first output. Yeah, that looks pretty good; we'll just roll with that. We've got the image URL mapped. Max tokens: yeah, 300 is probably overkill, we probably need more like 150. Are there any advanced settings that I need? Let's decrease the temperature; top P looks good. So we'll give that a little run. Unable to parse a range. Oh, right, because I just re-uploaded this file, so I've got to go back to the sheet name and point at the new results file. This should be okay. Okay, so we're gonna run this puppy again. We've got the image URL here, so it's currently reading the image, then it's also taking the caption, and it's telling us something about it. Let's see what the result was. "Your latest snap is epic, taking a big bite with those cool shades on." Okay, that's kind of lame. Let's just add: use a no-frills tone of voice. Let's run this puppy. "Going all in on that burger, I see, hope it tasted as good as it looks." That's a pretty good output, which looks nice. Now, there are a couple of different ideas here. You can use JSON, like I've done in the past. If I use JSON, I'll say something like: write in JSON format, then give an example, have it output JSON, and then parse the JSON. Usually I'll do that when it's outputting multiple things. So if it's outputting, I don't know, an icebreaker, but then a reason, and then maybe a heading and a subheading or something, I'll do it in JSON, because it's really easy to index afterwards. In this case, all I'm really doing is generating a one-line icebreaker, so I don't really know if I need anything else. But yeah, that's just to walk you guys through my thought process there. It looks like it generates quotes, and the quotes seem syntactically important; I provided quotes in the examples here just to disambiguate them from the previous text. So I'll leave the quotes for now, and then I'm just going to trim the quotes as necessary, or maybe just remove them entirely. And then, yeah, I think that's all I need to do.
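For reference, here's roughly what that Make OpenAI module is doing under the hood, sketched with the OpenAI Python SDK. The model name, prompt wording, and parameter values are stand-ins for whatever you configure in the module, not exact values from this build:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def write_icebreaker(image_url: str, caption: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # any vision-capable model works here
        max_tokens=150,        # a one-liner doesn't need 300
        temperature=0.4,       # lower temperature, steadier tone
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "The following is an Instagram image. I've included the "
                    "caption below as well. Use the image and the caption to "
                    "write a customized one-line introduction, using a "
                    f"no-frills tone of voice.\n\nCaption: {caption}"
                )},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    # Trim the quotes the model tends to wrap around its output.
    return resp.choices[0].message.content.strip().strip('"')
```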
Then what I'm going to do is go back into this Google Sheet. There are a couple of different ways you can do this. You could hypothetically create a new Google Sheet called, like, Instagram Phantom Buster DM List or something like that. But I think anybody I get to do this task, like a virtual assistant, can use this extra information to improve the quality of the message if they need to. So what I'm going to do is add a new column, all the way at the end, and it's just going to be called DM Icebreaker. Then I'm going to copy a template that I've set up here to sell my service and make it very easy to copy and paste. If I could go back in time, I might do this in Airtable, but I know that not everybody here has access to Airtable, so maybe this is a better call. Airtable is just a lot easier to build these formulas and stuff like that in, and it's also a lot more straightforward for virtual assistants, but that's okay. So for this column, I'm just going to write DM. Maybe we'll go Dm; no, we can't camel-case DM, so I'll just say DM. Then I'm gonna go back to the integration in particular, and we're not gonna add a row; I think we need to update a row or some shit like that. Let me see. Yeah, it's probably Update a Row. So the first thing we've got to do is run this again, because we have a new structure here with a new column. I'm gonna connect this, and I need to use a different account; I just have so many of these, and I haven't really gone through and swapped them. And then what am I doing here? I'm looking for Instagram. No, this might be under Shared With Me. So we'll do Instagram. Okay, great. The sheet name is going to be whatever the new results file is, and for the row number, in our case, we're just going to take the row number from the previous Google Sheets module. What we're updating here is the DM. It's going to be "hi," and then we're going to grab the first name. The way I'm going to do the first name is, I mean, really hacky: I'm just going to take the full name, split it based on the presence of a space, I think, and then get the first result of that. DMs are usually one line, so I don't actually need a new line. I want to keep the DM as short as possible, because I think there's a DM character limit on Instagram. Let me see what it is. 1,000 characters; that's actually pretty long, so we're probably not going to need to worry about it. Then I'm going to feed in the icebreaker, and we should just replace all of the quotes with an empty string. That should work. So this will be our one-line DM. Let me just check whether my one-line DM is going to include punctuation: we've got an exclamation point there, a period here, and exclamation points. It's probably going to include punctuation. Then the template: I thought you might be interested in X. We do Y and offer Z. Let me know if this might be worth a chat. Okay, cool. So now I'm going to run this puppy again and test it out, make sure it updated that row. Since it's all one line, we don't really need to capitalize that first letter. But anyway, yeah, that seems pretty good. Then what we can do is populate this Instagram Phantom Buster scraping sheet with a massive list of people and posts and that sort of thing. We have a profile URL column here and a username, so in the future we should be able to use this information to index any other database we set up with Instagram information. If it's a list of profiles, we can use this as a primary key and use that to sift through; if it's anything else, we can make it pretty simple. And now your SOP for a VA or somebody like that: you might have, I don't know, another column here called status, and they might just do a check mark or something. Basically, they go through this one by one: copy it, paste it into Instagram DMs, send it, check it off as done, and move on to the next one, that sort of deal.
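Pulling those string operations together, here's the same DM assembly as plain Python, so you can see the logic outside of Make's formula syntax. The field names and the pitch text are assumptions:

```python
def build_dm(full_name: str, icebreaker: str, pitch: str) -> str:
    first_name = full_name.split(" ")[0] if full_name else ""
    icebreaker = icebreaker.replace('"', "")       # strip all quotes
    dm = f"hi {first_name} {icebreaker} {pitch}".strip()
    return dm[:1000]                               # Instagram's DM limit

print(build_dm(
    "Nick Saraev",
    '"Going all in on that burger, I see, hope it tasted as good as it looks."',
    "I thought you might be interested in X. We do Y and offer Z. "
    "Let me know if this might be worth a chat.",
))
```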
And in this way, instead of paying... well, really, there are a couple of benefits to this approach, if you guys are interested in how to systematize this more generally. A lot of the time, virtual assistants live in much lower cost-of-living countries, like the Philippines, where the average wage is maybe five or six dollars an hour. Actually, I think the average is substantially lower than that; five or six dollars an hour is more like the going virtual assistant wage. So you can pay a significantly lower wage on an hourly basis, but you also don't really have to suffer the typical drawbacks of offshoring, which are usually some type of language incompatibility, or a slight difference in culture or attitudes towards work. In this way, it's a very simple, straightforward, streamlined process where you get to leverage AI for the customization aspect, and all the virtual assistant's time goes into copy and paste, copy and paste. And because we're using humans for this step, we don't have to suffer anywhere near the same rate-limit sorts of issues with Instagram that we would if we were using bots. Presumably, if you're selling something via DM, it's some type of high-ticket offer; a lot of people do coaching via DM, for instance. So the five or six dollars an hour might net you several hundred DMs, with a conversion rate of maybe half a percent to one percent. You can usually convert at least a sales conversation or two for four or five dollars. And just for your guys' understanding, being able to generate a sales opportunity, or a booked meeting, or whatever you want to call it, for four or five dollars is insane; most businesses will spend anywhere between 30 to 40 times that for qualified opportunities. Now, there are some issues with Instagram DMing, for sure; cold DMs now land in the requests pile, I believe. So I'm not necessarily recommending that everybody drop what you're doing and switch to Instagram DMs or anything like that. I found a little bit of success with it for my own business, a very tiny bit, and I've also built systems like this for other businesses that have found substantially more success, particularly in the coaching niche. But I just wanted to give you guys a sense of the order-of-magnitude value in a system like this, even one that's not fully automated, and even one that the Instagram API has sort of tried to stamp down. Very, very potentially profitable.
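To make that back-of-the-envelope math explicit (all numbers assumed, matching the ranges above):

```python
hourly_rate = 5.50    # USD/hr, assumed VA wage
dms_per_hour = 250    # "several hundred" copy-paste DMs per hour
conversion = 0.005    # 0.5% of DMs become a sales conversation

cost_per_conversation = (hourly_rate / dms_per_hour) / conversion
print(f"~${cost_per_conversation:.2f} per sales conversation")  # ~$4.40
# Versus the 30-40x that figure many businesses pay per qualified opportunity.
```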
So the question now is: okay, I showed you how to do this manually; how do you do it on an automatic basis? Well, there are a couple of caveats. I think I mentioned this before: because of the whole DM and Instagram API thing, we have a little problem to solve first. The session cookie lasts a certain amount of time, and what that means is that for the system to run completely autonomously, you need to update the session cookie every so often. Everybody making videos on this stuff will not tell you this part, because it looks really glamorous if you can just run a Phantom Buster loop over and over and over again, right? But reality is always more complicated than fiction. So, in reality, you need to take a couple of additional steps to make sure this works. Think about it: let's say the Instagram session cookie expires after two hours. That means every two hours that you want this thing to run, you need to make sure the session cookie is right, and if it's not, you need to go and update it. How do you update it? Well, you can either manually go to this page and click Connect to Instagram, or you can use a third-party browser automation tool, like what I was mentioning earlier, to log into Instagram, retrieve the session cookie from among the cookies, and use that to update some resource. So why do I say this? I say this because, if I were building this out in practice, what I would do is have a Google Sheet, and I would call it authentication. I would have a list of the profiles that you're going to be using to scrape, maybe using some scraper app, and I would build it out so that we have a column here called session cookie, and then some date last accessed. If I go back to my new results file, I think I'm just going to pull a random timestamp here. Okay. What I'll do is, at the beginning of every automatic execution of this Phantom Buster scenario, which we can do through Make, I will look up the session cookie for the profile I'm using to access, and I will attempt to run it using that session cookie. If the session cookie is valid, it'll just continue down the flow. If the session cookie is invalid, there'll be some error handler, and using that error handler, I'll run another scenario. That other scenario will be for my browser automation software, which will go out to instagram.com, spin up a web page, and try to log in. When it logs in, it's not even going to do anything; it's just going to log in, pull the session cookie, and use it to update this sheet. It'll then call back to the original Instagram scraper, which will run the Phantom Buster module again with the fresh session cookie. That's how all of this looks from a high level, at least on the Make.com side of things.
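Here's a rough sketch of what that refresh scenario's browser automation step could look like, using Playwright as one example of the tooling mentioned above. The selectors and the sessionid cookie name are assumptions, and real Instagram logins regularly throw challenges (2FA, captchas) that this doesn't handle:

```python
from playwright.sync_api import sync_playwright

def fetch_session_cookie(username: str, password: str) -> str | None:
    """Log in headlessly and pull a fresh session cookie to write back
    to the authentication sheet. Sketch only: selectors are assumed."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://www.instagram.com/accounts/login/")
        page.fill('input[name="username"]', username)
        page.fill('input[name="password"]', password)
        page.click('button[type="submit"]')
        page.wait_for_load_state("networkidle")
        cookie_value = None
        for cookie in page.context.cookies():
            if cookie["name"] == "sessionid":  # assumed cookie name
                cookie_value = cookie["value"]
        browser.close()
        return cookie_value
```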
Now, if you want to do this in practice, what you do is go down to Phantom Buster here, and you'd have a module called Launch a Phantom. This is gonna be a multi-scenario sort of deal, so you'd have one scenario that launches the phantom. If I go to Map, you'll see there's Instagram Photo Extractor; I believe this is the one I was using. You'll see that the fields here are session cookie, spreadsheet URL (the profile URLs), column name, number of posts per profile, number of profiles per launch, and CSV name, right? That last one is the output file. So what we'd do is have a Google Sheet. We can just copy this Search Rows module, make it the trigger, and run it on some schedule: maybe every 60 minutes; no, let's just do 120 minutes. We'd connect this. And if you remember, I added a new page here called authentication. Now, I only have one element in it, which makes this really easy, but you may have several profiles that you're going to be using. Anyway, we're going to run this puppy, and you see there's a big session cookie. So we'd feed in that session cookie. I have to reconnect this, and then I have to cancel this and open it again; Make can be sort of finicky with this sort of thing, so just keep that in mind. Then I'm going to pick my YouTube Instagram Photo Extractor, go down to session cookie, and feed in the session cookie. If we had a big spreadsheet URL of profiles, I would use that; for the purposes of this demonstration, I can just use this profile URL, and it should function similarly. The column name only applies if you're using a spreadsheet URL. Number of posts per profile, we're just gonna do one. Number of profiles per launch, one. CSV name, I'm just not gonna touch. And then I'm gonna run this to show you guys how it works. You can only get the first post? Oh, that's interesting; I guess that's an additional option they offer via the API. There's watcher mode, filter results (you can add some filters here), save argument, manual launch. That's kind of funny. Whatever, we don't need that. Okay, so it's gone through, it's gotten our authentication, and it's run the phantom. But as you can see, it didn't provide anything in the output. The reason they do this is that phantoms take a certain amount of time to execute, right? We don't actually know when the phantom is going to be done. We can't just wait for it to finish, because it may take five minutes, it may take 40, depending on how many profiles you want to go through. So they separate it into a two-step design pattern: there's a first scenario that launches the phantom, and then there are a bunch of other modules, for new scenarios, that watch an output. Notice how I can't actually connect this to my flow, because it's a trigger only. So basically, what we have to do to make this scenario work is make it multi-step. Number one is: launch the Phantom Buster Instagram scraper. I'm just gonna save this, go back to my example builds, and create a new one. This two-step design pattern, in case you guys are interested in developing these sorts of things more often, is something you'll use basically every time you work with a scraping or browser automation application, just because, again, there's a variable amount of time between when you launch something and when you'll be able to pull its output. Extremely common. So we're gonna go back to example builds here. We have Launch Phantom Buster Instagram Scraper. I'm gonna create a new scenario and call it Watch Output of Phantom Buster Instagram Scraper. I'm just going to copy all this stuff in here. I think there's something preventing me from clipping; continue allowing this. This is weird, I don't know why it isn't letting me paste. Oh, it might be because I have two things that I'm copying. Yeah, there you go; I was trying to copy and paste the title as well. Anywho, then we're going to feed in this Watch an Output module. Notice what happens when you connect this: you just have the phantom ID, and that's all you have to put in. All you're doing is watching to see if that phantom runs. You'll also notice that this is not an instant trigger, which means it actually has to poll every so often to check whether or not your Phantom Buster scrape has concluded. That's pretty annoying, and it usually leads to much higher operations usage.
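For the curious, the Make module is wrapping Phantom Buster's REST API. A sketch of the launch half of this two-step pattern might look like the following; the endpoint, header, and argument names reflect my understanding of their v2 API, so verify against the current docs before relying on them:

```python
import requests

API_KEY = "your-phantombuster-api-key"   # from the Phantom Buster dashboard
PHANTOM_ID = "your-phantom-id"

resp = requests.post(
    "https://api.phantombuster.com/api/v2/agents/launch",
    headers={"X-Phantombuster-Key": API_KEY},
    json={
        "id": PHANTOM_ID,
        "argument": {                     # mirrors the module's fields
            "sessionCookie": "sessionid-from-the-auth-sheet",
            "spreadsheetUrl": "https://www.instagram.com/some.profile/",
            "numberOfPostsPerProfile": 1,
            "numberOfProfilesPerLaunch": 1,
        },
    },
    timeout=30,
)
container_id = resp.json().get("containerId")  # keep this for the fetch step
print(container_id)
```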
What that means in practice is that I'd run this at the same interval as the first scraper. So if that one's set up at 120 minutes, I'd set mine up at 120 minutes too, but offset a few minutes later, so that if the thing finishes within the window, I catch it here. You can also use webhooks, but for that to happen, you'd need to know how long the scrape takes. If you wanted to use webhooks, you'd make an HTTP request down here, connect this, and on the other end you would use, actually, I think you'd use Phantom Buster's Get an Output. Now that I'm thinking about it: Get a Phantom, Download a Result most likely, or Get an Output. Yeah, okay, I take that back. We actually don't need Watch an Output; we can just use Get an Output. So we can use a webhook here, set up our own, and name it phantom buster instagram scrape completed. We can call this webhook from our other scenario, which is pretty neat, and when we receive that request, we use it to trigger the Get an Output module. And to feed this in, well, actually, we don't even need the phantom ID, because we know what we want to get: the Instagram Photo Extractor. Obviously we're not going to call this immediately; we need some type of sleep or wait first. I wish there were a clean way to scale this, but you could use a sleep rule of thumb, where maybe you multiply the number of rows by some number of seconds; that might be a good proxy for how long it's going to take. In our case, I'm just going to use a flat 60-second wait, and that should be fine, because I'm only doing an example on one profile. So now I'm going to make a request. Okay, what am I doing? I'm calling this URL. Fantastic. This is the webhook, so I'm triggering it after 60 seconds. Then I'm going to catch this and feed it into Phantom Buster's Get an Output. Actually, why don't we just run this first and see how it goes. So I'm going to run this, and I'm just going to change the CSV name, because I know that will make it run anew. Okay, great, I just triggered this, and now we're sleeping for 60 seconds. I'm going to go back to my home page here, and you'll see this is actually running now. Because we're only doing a single page, it should run fairly quickly; it's probably not going to take 60 seconds, probably more like 10, but I just want to be really safe. Okay, great, it says one post extracted, and the file is just result.csv. Oh, maybe I need to refresh this. Looks like it's caching results or something, so that's all good. Anywho, this looks like, what, 15 seconds, 30 seconds left? I don't actually know what proportion of the 60-second timer this represents; I think it goes down a quarter at a time. Okay, cool. So we're about halfway done. What we're going to do next is catch that webhook and use it to trigger the Phantom Buster output.
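The hop between the two scenarios, sketched outside of Make. The webhook URL is a placeholder for whatever Make generates for you, and passing container_id as a query parameter matches the design we wire up shortly:

```python
import time
import requests

MAKE_WEBHOOK = "https://hook.make.com/your-webhook-id"  # placeholder
container_id = "container-id-from-the-launch-response"  # placeholder

time.sleep(60)  # flat wait; scale with row count on bigger scrapes
requests.get(MAKE_WEBHOOK, params={"container_id": container_id}, timeout=30)
```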
I love screwing around with these much more sophisticated APIs, if I'm being completely honest. Well, not APIs, but services like Instagram or TikTok or whatever, because there are so many safeguards they put in to prevent you from accessing their information that, in order to do it successfully, you need to develop very creative strategies and applications. And when you do, the results are very outsized, because nobody else has the ability to execute at that level. So, very, very cool. Anyway, what ended up happening is we got an output. It looks like there was sort of a text output, container whatever; yeah, it just gave us what looks like a log, which is nice. Then there's a container ID, from which you get the output. We should be able to download a results file; I don't know exactly where the results file would be. Yeah, we might just do that. So I'm gonna go to container ID and feed it in manually for now, just to verify the format of the output. Okay, it's a buffer. So we need to convert this buffer into text. I think we need to do, like, Download a File or something; I've since completely forgotten how to download the file. So maybe we'll try HTTP. This gets a file from a given URL, but we need to convert the binary back into JSON. Maybe there's a Parse JSON module? Parse binary? Let me see here. Maybe one of these, but I'm not entirely sure, if I'm honest. This is a binary string, so I doubt it. Let me check whether it accepts any object. Yeah, I had a flow here that was pretty similar; I've just forgotten how to dump this into a format that works. Let me just check to see if there's any... I remember there was a download module of some kind. So yeah, this is an example of a similar system that was built with Airtable. Oh, okay. I guess you could just parse it. Yeah, you know what? I guess you could just parse it. So we'll just drop a Parse JSON in here for the purposes of this. Let me check how long the output bundle is; I think that's probably going to be okay. I'm not entirely sure, so we're just going to run this puppy with this JSON string. No, source is not valid JSON. We're just gonna feed this in and do some manual testing. Then I'm gonna feed in the data field here, because I think it was misformatted before. Yeah, it was misformatted. Okay, great, so now I have the result object. Looks like it parsed it, but it didn't parse it into JSON, which is interesting. I wonder if doing this would be enough. Let's run this puppy again and ignore the hell out of these warnings. Okay, no, it did not work. We may just have to parse this twice, honestly. Let me see what happens if I feed the same thing in a second time, because Phantom Buster doesn't really like strings. Okay, we'll feed in 10.resultObject, not 10.object. Yeah, okay, now we have access to everything. So it's sort of annoying that we have to parse this twice; I remember I once came up with a solution that made it so we didn't have to, but whatever. Anyway, now we have a bunch of values here that we can obviously use, and we can use them to update the Google Sheet. I mean, now that we have access to all this stuff, I don't even know if we need to. Well, yeah, we should probably dump this into a Google Sheet regardless. So why don't we run the OpenAI step first? We'll do the image URL, which is right here. And then, instead of updating a row, we're going to add a new row to the sheet entirely.
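The parse-it-twice quirk is easier to see outside of Make: the fetch endpoint returns JSON whose resultObject field is itself a JSON string, so one parse yields a string and the second yields the actual records. Endpoint and field names again reflect my understanding of the v2 API, so treat this as a sketch:

```python
import json
import requests

API_KEY = "your-phantombuster-api-key"
CONTAINER_ID = "container-id-from-the-launch-step"

resp = requests.get(
    "https://api.phantombuster.com/api/v2/containers/fetch-result-object",
    headers={"X-Phantombuster-Key": API_KEY},
    params={"id": CONTAINER_ID},
    timeout=30,
)
outer = resp.json()                        # parse #1: {"resultObject": "..."}
posts = json.loads(outer["resultObject"])  # parse #2: the scraped records
print(posts[0]["imageUrl"])                # field name assumed from the CSV
```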
And you know what? I have to go and do the annoying Sheets authentication again. So we'll add a new row here. We'll go through our Shared With Me, I believe, and then it'll be Instagram scrape. Yeah. And then we're just going to basically split this out. Okay, actually, we shouldn't call this sheet new results file either; we should call it profiles. Let's do that. Yeah, we'll go over here, and we're gonna have to refresh this, because we just changed the name of the sheet; if you don't change the name of the sheet in the Google Sheets module, the API call will be misformatted. Then we'll just go through and enumerate over all of these: the location, the location ID, the pub date, liked by, sidecar, type, caption, profile URL, username, image URL, post ID, query, timestamp, and then the DM will just be the result. And you know what? No, it's not just going to be the result. Oh, shit. Actually, you know, I could have just used this. What the hell? Silly me. Okay, we're gonna get the full name. The result here is this result. Okay, great, that should be good. Yeah, that should be fine. Now we're gonna test this just once on this flow, to make sure it works. We should add a new row to our sheet with basically the same information. "It seems you're enjoying your burger in your latest photo." Yeah, so the copy isn't the best in the whole wide world, but I still think it's reasonable, and that's probably more than enough, if I'm being honest. I've seen much shittier campaigns run extremely, extremely well. Now that we have this, obviously we need a way to do multiple profiles. The way I'd probably do this is to set up another sheet, called to scrape, and then, to fill to scrape, I'd do a Phantom Buster run. If I go back to this dashboard, there should be a way to, let's just delete all this stuff. Annoying. We'll make a new one and call it Instagram Follower Collector; we'll use this phantom. I'm going to do this as an example first so you guys can see. Now we're actually going to go out and get a giant list of followers. So: Marques Brownlee's Instagram. I'm just going to use this as an example. It is, unsurprisingly, MKBHD. For this purpose, I'm going to do 500. No, actually, let's just do 100. You'll see that it allows you to extract a large number of followers; you can do 5,000, even 9,000, every 15 minutes. So if you do the math at 5,000, that's 24 times 4, so 96 runs a day, times 5,000: you could probably do 400k-plus a day, which is pretty huge. And then number of profiles to process per launch: you can have it rerun over and over and over again. We're not going to do that, because that's specific to a Google Sheet, and then we're good. Again, I'm also going to run this puppy manually. Let me rename this so it's simpler; we'll call it Bulk Instagram Follower Collector, and give this thing a run. And yeah, as I mentioned, it's going to extract a big list of followers, and it does it pretty quick, because basically what it's doing is going to the profile, clicking followers, and scrolling through really, really fast. So, pretty straightforward. The result is you get this file here with a bunch of profile URLs, which is nice. What we can do then is download this as a CSV and go to our to scrape sheet. We can import this just so we understand the data format.
And then I'm just going to click replace. Actually, I'm not going to replace the spreadsheet; I'm going to append to the current sheet, so I don't wipe the name and everything like that. And, you know, one thing you're probably noticing, if you're still with me, is that when I'm doing these designs, I'm starting at the end and working my way back. So what was the end that we wanted? Well, we wanted the Instagram information; sorry, we wanted the AI-generated DM. We start with that, and then we reverse engineer everything back up the chain. From that I knew: okay, in order to get that DM, I need a single profile. Okay, in order to get a bunch of profiles, I need a list somewhere that I'm referencing. In order to do that, I need to store a Google Sheet. This is basically first-principles thinking, and it's probably the number one thing to lean on, I would say, when you're building these sorts of systems out. It looks like you also get the profile URL of some of these people, which is pretty cool. Sorry, the profile image URL, which is pretty cool. It's pretty blurry, though; I don't know if this is going to be enough for AI to do anything with. Yeah, probably not. I mean, it's just a picture of a tree, and it's like 160 pixels by 160 pixels. Regardless, from here we get the username, and we also get the profile URL. And if you remember, the one thing we need in order to query a profile is just the profile URL. So we have the profile URL, the username, and the full name. Not all of these have full names, though, so we probably need some type of exception for when they don't. It also looks like not all of them are first-name-last-name like me. I'm clearly a square, because I call myself Nick Saraev; these people are "Karthigood," "Incredible India" (that's a hell of a name), "May I have a jar of Americano, please? Sure, my love." Yeah, so that's kind of annoying. We could set a character limit if we wanted to; that's probably a simple and easy way to handle this at scale. So we could say: if the result of splitting the full name and taking the first part is over some limit, maybe 10 characters, then we don't want to use it; we just use an empty space, so the greeting is just "hi" instead of "hi yeshota_meharai_1234". That's probably what I'd do. But anyway.
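That guard is a one-liner in any language; here it is in Python, with the 10-character cutoff as the example value:

```python
def greeting(full_name: str | None, max_len: int = 10) -> str:
    if not full_name:
        return "hi"                 # some profiles have no full name at all
    first = full_name.split(" ")[0]
    if len(first) > max_len:
        return "hi"                 # "hi" beats "hi yeshota_meharai_1234"
    return f"hi {first}"
```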
Okay, great. So this is us launching the scraper for a single sheet, I think. What we need to do now is have another scenario. Let's call this one three; let's call this one four. We'll call these the individual Phantom Buster scrapers. Then we're going to make a new scenario, actually two, probably two new scenarios now that I'm thinking about it, and we need to make a Google Sheet. We might need to do one, we might need to do two. So what we need to do here is, I'm going to say Launch Bulk Phantom Buster Instagram Scraper, and I'm going to copy some of the stuff over, just because we have some authentication pre-created for us. Just be careful you're not also copying the name, because essentially what happens when you copy a module under the hood is that you're copying the JSON of that flow; if you add a name to it, the JSON breaks and Make doesn't allow you to paste it in. But okay, great. Now, we know that we don't want to just, we don't need to reference a Google Sheet. Actually, we do need to reference a Google Sheet. What am I saying? We need to get the authentication, the cookie. So I'm going to paste this in; this is going to be my trigger. You might run this, I don't know how often, it depends on how big you're going to have the scrape be. I think I set mine to 100 just as an example, so we're not going to run it very often. Maybe this runs, I don't know, once a day or something; maybe it runs in the morning, and then we just go through the list of results later on. Basically, you'd search, you'd get the session cookie, you'd then launch the phantom here, and you'd pull the session cookie from this. For the profile URLs, you can either store that information somewhere else, or, if you're like me and you're just using Marques (I really don't know how to say that name, Marques Brownlee), you would feed that in here. Actually, we need to make sure we're launching the right phantom here; I think I'm using the wrong one. So let me go to the right one: we're doing Bulk Instagram Follower Collector. That's right. Okay, so the session cookie would go here. We don't need a spreadsheet URL. Oh, it doesn't actually just allow me to do one? Hmm, that's interesting. I don't know why it doesn't just allow me to do a single profile. I wonder what happens if I just feed this into the spreadsheet URL, or maybe queries. My intuition is that this would work, but we should probably try it out, because my intuition has been known to be completely fucking wrong before. We need a session cookie too, so why don't I go back to authentication and steal this session cookie. Let's run this puppy. Let's go back here and just see if the run is okay, or if we really need a Google Sheet; if we need a Google Sheet, that's really annoying. Okay, yeah, no, I was right. A lot of the time, what will happen with these scraping services is that they'll be developed by people without a very strong grasp of English, and so it's just a little bit more annoying for them to pick field names and stuff like that. I've found there's usually a little bit of give in the titles of some of these fields. What happened in this specific scenario is that instead of calling this, I don't know, input or something like that, they called it spreadsheet URL, so you'd think you need to supply a Google spreadsheet. Instead, you can just supply, say, the URL of an Instagram profile. That's cool, man. If I were working in French or whatever, I'd probably have no fucking idea what I was doing, so it's completely understandable. But anyway, in our case, I just thought back to: hey, inside of Phantom Buster you don't just need a spreadsheet URL, you can also supply a link directly. So what's happening under the hood, I imagine, is they have options: if it's a Google Sheet, then run the Google Sheet logic; if it's an Instagram profile, then only scrape that one Instagram profile using their Instagram profile scraper. That's sort of what happened here. So yeah, anyway, we can run this, we can then sleep 60, and once we've launched the phantom, we're going to have the same design pattern that we had before, where we get the results and use them to update some sheet. So I'm going to create a second scenario here.
This is going to be, let's see, Watch Output of Bulk Phantom Buster Instagram Scraper. What we need to do here is set up a webhook again. This will be a custom webhook; drop that puppy in, and then we're going to create a new one called bulk instagram phantom buster scraper. Let me just be consistent in my formatting and terminology here. Then we need to feed this in as the URL that we're going to be calling. This one is probably going to take a little while longer, so the sleep 60 probably makes sense. Basically, the flow that's going to happen is: once a day, at, I don't know, 8:37am, we call our Google Sheet. We use that to pull our session cookie. We then launch the bulk Phantom Buster scraper on MKBHD's profile. I'm using this as a static resource, but you can imagine how in the Google Sheet you might have another column, or another sheet, called profiles to scrape or something. In our case, I'm just doing this because Marques Brownlee probably has more than enough followers to keep us occupied for several lifetimes. And then, before my microphone runs out of battery here: I'm sleeping for 60 seconds, then making an HTTP request to the next step in my flow. That next step is the second scenario, which catches that webhook and uses it to download the result. Now, what I want to do is select the name of the container. The name of the container is going to be Bulk Instagram Follower Collector. So I'm now going to download this. It looks like, containers created on March 30, oh, I get it: every time you run it, there's a new container. So what we actually need to do is pass the ID of the container in the call here. You can just say container_id equals, and then click that in; that'll work. Then we can just grab it here: the container ID will be container_id. Okay, great. So now we select this, we download the result, and we'll parse the JSON a couple of times: that'll be the first parse, and this will be the second parse. I don't know if it's called resultObject or whatever it'll be called, but then all we'll do is go into our Google Sheets module and add rows. I mean, because of the way this is set up, we're probably going to have to do this one row at a time. There are ways you can add a thousand rows, or a hundred rows, to a Google Sheet without consuming 100 operations, but that's sort of beyond the purposes of this video, so I'm not going to cover it right now. But yeah, what's going to happen is we're going to have a big list of profiles... and this is way more work than I thought it would be to delete all these things, Jesus. We're going to have a big list of profiles under to scrape here, with profile URLs. Then, once every two hours or something, we're going to go through that profile list and scrape every profile in it using the individual scraper, which is over here. Then we'll again launch another phantom to do the individual side. We're sleeping 60 here, but we don't necessarily need to. And then we're going to make a request to our other scenario, to actually go out and download all of those results.
What we're then going to do is feed this container ID into the download result module. We're then going to parse it a couple of times, feed it into AI, and at the very end add it as a row under a sheet called Profiles, where there's also a status column and then a DM column with a message they can just copy and paste into the DM. And this is sort of how you automate several hundred hours of work per month, I would imagine, to actually go and do the scraping and that sort of thing, and in doing so save a lot of money, because you can then just have a VA or somebody like that actually be responsible for the sending. So that's my idea behind this. Obviously there's still a little bit more that needs to be fleshed out, so let me just make sure this all looks good. We're going to call the container ID here. We're going to run this, and then we're going to wait 60 seconds, so let me see how much I can do in 60 seconds. I think we're doing Sheets here. What we actually need to do is add a sheet. Yeah, so I need to add a sheet. I don't think I'll be able to do this before it runs, so I'm just going to set a block here. Well, I guess it looks about halfway done. Basically, every time this runs, we're going to add a new sheet. The sheet is going to be called "to scrape." That's where we're going to generate that list of a hundred profiles. And then after we're done scraping them, we're going to delete the sheet and just regenerate it every time. That's probably the simplest way to do this at scale. Sort of annoying to have to create a new sheet every time, but that's okay. Okay, great. I blocked this. So what do we do here? We got the Phantom. We made the request to the other webhook. We got it with the container ID. We then download the result. Now, because this is multiple results... I don't know if the data or something was messed up, so I'm not really sure what's going on there. The file name is result.json. Let's just get this container ID and do this manually. The fact that it's empty is weird because... oh, maybe the data on the Phantom Buster side of things is empty because I'm running the same search twice. Fuck, I hate when that happens. Yeah, "input already processed," so that's pretty annoying. I think there's a way to disable this setting so that every run is fresh. Let me just see... Yeah, this is "watch remote," "only look for new followers." So no, we don't want that. "If you rename your results file at any point between launches, the Phantom will create a new results file." So I think the way to do this is to add a different name every single time, just to make sure that every run is working off new data. Alternatively, we could just say: if this isn't new and the results file is null... yeah, actually, that's probably what we should do. We should say if the length of the buffer is less than some threshold, then don't proceed, because it isn't actually doing a new scrape; it's not dumping any new followers in. That's probably what we would do in reality. However, I don't really care about that right now, because I still want to get the data of the last container. So let me see if I can do this manually so we can test for now, and then we can do a full run later.
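Since the AI step is the end goal here, a minimal sketch of what that "customize the first line" call might look like, assuming the official OpenAI Python SDK. The model name and the caption field name are assumptions; swap in whatever your Phantom actually returns.

```python
# Sketch of generating the first line of a DM from scraped post data.
# Field names ("description") and the model are assumptions, not the
# exact Make module configuration used in the video.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def first_line(profile: dict) -> str | None:
    caption = profile.get("description")  # hypothetical field name
    if not caption or caption == "No posts found":
        return None  # skip private or empty profiles rather than erroring
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption; any chat model works here
        messages=[
            {"role": "system", "content": (
                "Write the first line of a casual Instagram DM. "
                "Format: 'Hey X, saw your post about Y, wanted to talk about Z.' "
                "Sound like a friend texting, not a salesperson."
            )},
            {"role": "user", "content": f"Their latest post: {caption}"},
        ],
    )
    time.sleep(2)  # a couple of seconds between calls helps on lower rate tiers
    return resp.choices[0].message.content
```

The skip-on-empty guard and the short sleep both come back up later in the build, when the "no posts found" results and rate limits show up.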
This is the bulk Instagram follower collector. And then there should be a couple of runs. We ran it recently, at 45, and we also ran it at 35. So let me just check to see if this one is short. No, this is pretty nice and long. So we're going to run that puppy. And then what do we get? We get a list of profile URLs. Great. What do we get here? We parse that, and then we get a bunch of bundles. Great. So every one of these bundles is a profile, and we could just add... yeah, okay. What we should do is actually add the sheet before we do any of this, then reference that previous sheet, and then use this to add profiles. So I'm going to move this over here. We're not going to have a block anymore. We're going to create a "create to scrape sheet" step and add this puppy in. This is going to be the same spreadsheet, I believe, over and over and over again, so we should be able to just feed the spreadsheet ID in. And I guess we can just select it from the list. It's probably going to be under "shared with me," and we're probably going to need to re-authenticate. Okay, we'll do "Instagram Phantom Buster Scrape." I'm going to call this "To Scrape," and we'll create a new one basically every time this scenario runs. Actually, we should probably do this after we do the Phantom Buster download result. What am I doing? Or maybe we even do it after this parse JSON, because that way we'll know whether or not this scenario worked, and if it didn't work, it's not going to run. Then we can parse the second JSON afterwards. Okay, after this we have a bunch of bundles, so now we can just do "add row." And then we have to, unfortunately, go through the rigmarole of selecting the spreadsheet ID again. It's probably the... man, if I had five seconds for every time I tried to find a spreadsheet ID. Okay, and then the sheet name... it looks like we can only select the sheet name manually. That's kind of annoying. So we're just going to call this "to scrape," and the spreadsheet ID we feed in here is going to be the same as this. Column range, I'm just going to say A to Z, and then I imagine the data is going to be profile URL, username, full name, image URL, ID, is private, is verified, query, timestamp. Okay, this should be good. What I'm going to do as a test is delete this. We're probably not going to have a header, now that I'm thinking about it, but that's okay. Then I'm going to run this on the same container. We're going to parse the JSON. If this JSON is a long string, it'll continue. Otherwise, it'll error out, which is nice. We could have another check. We could say "does not exist"... actually, that's not going to be correct. Let's just do "length of whatever is greater than," let's say, 100 characters. That's a good hack for now. Then the "to scrape" sheet: we'll create the sheet called "to scrape," we'll parse the rest of the JSON, and then we'll iterate through every bundle and add it to the sheet. That looks pretty good. Okay, great. Let's run this puppy and see what happens. So it's now adding to the Google Sheet. If we go to "to scrape," you'll see that it's populating quite nicely, and reasonably quickly, I would say. We set it to 50, I think, so we're only doing 50.
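Here's what that Sheets leg looks like outside of Make, sketched with the gspread library under a few assumptions: a service account with access to the spreadsheet, and the same field names the collector writes out. It also shows the >100-character sanity check, plus the batch-append trick mentioned earlier that adds many rows in a single API operation.

```python
# Sketch of the "create to scrape sheet, then bulk-add rows" step.
# SPREADSHEET_ID and the profile field names are assumptions.
import json
import gspread

SPREADSHEET_ID = "your-spreadsheet-id"  # placeholder

def dump_profiles(raw_result: str) -> None:
    if len(raw_result) < 100:
        # Empty or "input already processed" runs come back tiny;
        # bail out instead of recreating the sheet with nothing in it.
        return
    profiles = json.loads(raw_result)
    gc = gspread.service_account()  # credentials from the default config path
    sh = gc.open_by_key(SPREADSHEET_ID)
    # Recreate the "to scrape" sheet fresh on every run, as in the video.
    try:
        sh.del_worksheet(sh.worksheet("to scrape"))
    except gspread.WorksheetNotFound:
        pass
    ws = sh.add_worksheet(title="to scrape", rows=200, cols=10)
    # One append_rows call is one API request, however many rows it adds,
    # unlike adding rows one bundle at a time in a scenario loop.
    ws.append_rows([
        [p.get("profileUrl", ""), p.get("username", ""), p.get("fullName", "")]
        for p in profiles
    ])
```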
Awesome, so that part of the flow looks good. Assuming the container we get is the right one, everything should work there. And then what we want to do is... well, we can't select any of this stuff manually now; we have to do it all automatically, using IDs. So let me just see what the spreadsheet ID is here. The spreadsheet here says "Instagram Phantom Buster scraping," so we're going to need to do this manually. Yeah, this looks to be the same sheet, which is nice. The sheet name is going to be "to scrape," because we just created one called "to scrape." Awesome, it's going to run, and then I'm just going to run this, and we see a bunch of results here, which is nice. Now, actually, I don't even know if we need this anymore, to be honest, because we can just feed in the spreadsheet URL. And then... I wonder what the Google Sheets spreadsheet URL here is going to be. Well, it needs to be public. So we can probably just feed this in, right? We do that, and then this is going to be the spreadsheet ID. We're not going to feed in profile URLs anymore. We're going to do one post per profile... one profile per launch? Actually, no, we don't want one profile per launch; we want more. Then I'm just going to empty all of this. Sleep 60... we're probably going to do more than 60. For testing purposes, we've got 50 here, but why don't we just do, like, five? Anything that works with five should work with 50. And then we may need headers. That's really my only question. I don't know. If we need headers, we're screwed, because then we have to go through and add a line first with the headers before we do everything else, which is super freaking annoying. "The URL of a Google Sheet containing a list of..." yeah. Oh, okay, maybe we don't need this. Maybe we don't need anything except the profile URLs. Yeah, we might only need this. Okay, yeah, I actually feel a little bit more confident about this now. I feel more competent, too. Probably should, huh? Okay, let's do this. Sleep: we're not going to sleep 60; we're probably going to need to sleep substantially more than that. And this is where you start getting into the question of, should I use a third-party service to store my hooks? I'm not just doing this for one profile anymore; I'm doing this for 100, or 500, or 5,000. The amount of time it takes to scrape is going to be so variable that it would be ludicrous for me to try and do it on a simple flat wait time. Okay. Anyway, so we fed in the "to scrape" sheet. What I'm going to do is run this puppy and then see how the Phantom Buster run looks for the individual scraper, this one here, the photo extractor I should say. So we're starting here, and then I'm just going to run this. Yeah, so actually we don't need to do this at all. We don't need to feed this in even a tiny bit. Oh, right, right, right. Yeah. What we can do is just click this and run it with the ID of the spreadsheet of interest; we can just hard-code that ID in. Oh yeah, sorry, I lied. We actually do need that, because we need the session cookie, don't we? I lied, I lied. Okay, we'll go back here, and we're not going to do any of that.
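On the flat-wait-time problem: one alternative to a fixed sleep is polling the container until the Phantom actually finishes. A sketch, assuming Phantom Buster's v2 containers/fetch endpoint; the exact status string is an assumption, so check a real response before relying on it.

```python
# Sketch: poll a container instead of sleeping a fixed 60 seconds.
# The "finished" status value is an assumption about the API response.
import time
import requests

def wait_for_container(container_id: str, api_key: str, timeout_s: int = 1800) -> None:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = requests.get(
            "https://api.phantombuster.com/api/v2/containers/fetch",
            headers={"X-Phantombuster-Key": api_key},
            params={"id": container_id},
            timeout=30,
        )
        resp.raise_for_status()
        if resp.json().get("status") == "finished":  # assumed status value
            return
        time.sleep(30)  # scrape time is highly variable; poll, don't guess
    raise TimeoutError(f"container {container_id} still running after {timeout_s}s")
```

Within Make itself you're limited to sleep modules and webhooks, which is why a third-party service for holding hooks starts to look attractive at 5,000 profiles.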
We're just going to scrape authentication and then grab the session cookie. And the syntax of this is just the item reference, so I think I can just use 11.1 and that should be the same thing. Excellent. Okay, so this is maybe scraping right now. I don't know. Yeah, missing cookie, because we didn't supply one. So let's run this again with the correct cookie. Let's go back here. It's running, which is nice. I love how you get instant feedback with this tool. We're now successfully authenticating. It's now going out and pulling in photos for each of these, I would think. It may still be running, or it may have fed in just my profile instead of the Google Sheet. Let's see what happened here. So we fed in the spreadsheet URL, and we fed in "number of posts per profile: one." Yeah, that should be working. Let's go back here, now that we don't have my profile hard-coded in, and run it again to see what happens. Okay. Saying that it's running? It's saying the input is already processed, though. Hmm. Let's try hard-coding this, maybe? See how that works? Okay. Yeah. So that seems to be working okay. Uh, no. It seems like it's getting no posts from these profiles. It's possible that none of these profiles have any posts, but I consider that unlikely. Oh, right, because they're private. Not all of them are going to be private, though. Yeah. I mentioned that this works primarily with... like, you know, the profiles need to be public, of course. In my experience, probably about 30% of profiles are public, so the other 70% are not going to work. But that's okay, because we're working with such big numbers here that it doesn't really matter if they're public or private or whatever. If you're running 10,000 followers or whatever, you still have 3,000 to go with. It just looks like none of the ones I supplied back there had a public profile, which is sort of annoying. Yeah, that's freaking annoying, man. Maybe we'll do Mixerif. I don't know. Do I have another profile? I think I have another profile. I'm not entirely sure. Maybe I should just source all of the people that have DMed me on Instagram and just use you guys. That'd be pretty fun. No, I don't think I'm going to do that. We'll just put these two in for testing purposes and then run this anyway. I think one of those might be accessible, maybe not both. So let's run this again. We're then going to go back over here. Mmm, it's saying "input already processed," I think because this is just my name. So I'm going to define a different output name. I'm just going to call this X. We're going to run this again, and I'm going to connect that to my sleep 60. So it's run this; it's now running and getting new input. Successful authentication. Looks like we got a couple of files here, which is nice. Okay, and now we're waiting. What we're going to do on this end is... this is number three, right? So now we're going to turn on number four, and number four is going to catch that output, download the result like we did before, do the parsing, and then dump it into the OpenAI module to get that custom DM. And then, yeah, I believe that's it. I think that's all we need to do. Clearly, it works with the individual URLs. Let me just double-check, see if there's anything else. Oh, the times. We should make sure the scheduling is correct. Yes.
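Given that only public profiles return posts, it may be worth filtering the follower list before launching the photo extractor at all, rather than burning launches on private accounts. A small sketch; the "isPrivate" field name is an assumption based on the columns the collector wrote to the sheet (profile URL, username, ..., is private, is verified).

```python
# Sketch: drop private profiles before feeding the photo extractor.
# The isPrivate/profileUrl field names are assumptions about the scraped data.
def public_only(followers: list[dict]) -> list[str]:
    return [
        f["profileUrl"]
        for f in followers
        if not f.get("isPrivate", True)  # treat unknown as private, to be safe
    ]

# e.g. with ~30% of profiles public, 10,000 followers still leaves ~3,000 to DM.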
So the bulk Instagram Phantom Buster scraper, let's just say it runs at, like, 8 a.m. These Phantoms... yeah, I mean, I've hard-coded in the name of the profile, but as I mentioned earlier, you can do that elsewhere. We're scraping a very fixed number of profiles with fixed wait times, which is unfortunate, but obviously it'll still work. We're checking, kind of hackily, to see if the results object is greater than 100 characters. If it's empty, it'll just be less than 100 characters, probably something tiny like this here. So that's just another hacky solution, but it works. We've been creating the "to scrape" sheet. Oh, we're currently dumping in all of the information. I don't think we need it. We can just do, oops, just the profile URLs, clearly, because that's the only thing that matters. And then I'm going to allow you to download this blueprint. There's the authentication there. The spreadsheet URL is obviously hard-coded to be my spreadsheet URL, so if you guys are screwing around with the blueprint, just keep that in mind. I'm adding a fake CSV name just called X here, but that's okay. And then I'm calling the container ID. Oh, and it looks like we had an error, but I'll cover that in a second. Then we're catching, and we're getting the output of this, and it looks like there is an output. Then, downloading the results, we're parsing this. It looks like the object we got was "no posts found." So, oh shit, I think we're pulling the old container. That's annoying, but presumably that won't run when you have a new result. And then there are three queries. One says "no posts found." Actually, all of them say "no posts found." One just says "post URL" for some reason. And when we feed it into OpenAI, obviously it's not going to work, because we're not feeding it an image URL. So yeah, there's that. But I know the reason why this is happening. If you think about it logically, the bulk run we ran at the very beginning, where we had the "to scrape" profiles, only includes nick.saraf and then nick.saraf here, which I think is not working. I thought I had an additional profile, but I don't think I do. And because we've already pulled nick.saraf, like, what, five times at this point, it doesn't want to keep doing it unless we keep providing a new name, which I'm not doing. You can obviously provide a new name every time. You can just randomize a number and use that as the name, or generate a new UUID or something like that. That's probably a good idea, now that I'm thinking about it. Why don't we use that as the name? Yeah, we'll generate a new UUID for the CSV name, and we'll also do this here. That should fix the problem, because the likelihood of a collision is so low as to be meaningless. Yeah, okay. Honestly, that should work now. We should probably put in some error handling so that if the JSON is empty and we run through this... we might also want to add a little bit of a sleep here. So why don't we call this and then just add a couple seconds of sleep? You don't have to do this, but it matters if you're on a relatively low OpenAI plan. I'm on the highest plan, so I can technically do, I don't know, five trillion queries a minute or some shit like that.
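For clarity, the UUID trick boils down to this: give every launch a fresh results-file name so the Phantom never skips the run as "input already processed." A tiny sketch; the csvName argument key is an assumption, so match it to whatever your Phantom calls that setting.

```python
# Sketch of the UUID trick for forcing a fresh results file per launch.
# "csvName" is an assumed argument key; check your Phantom's settings.
import uuid

def launch_arguments(base: dict) -> dict:
    # Collision odds for a random UUID are negligible, so every run is "new".
    return {**base, "csvName": str(uuid.uuid4())}

# usage: launch_arguments({"sessionCookie": "...", "spreadsheetUrl": "..."})
```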
The number of tokens permitted is honestly quite incredible. But if you're on the starter plan or something, you'll probably run into rate limits unless you sleep a couple of seconds between each of these. So I'm just going to say sleep two. And then we should have some type of error handling where, if the error is "no posts found," it just doesn't run, so it doesn't convert that into the next row. But yeah, aside from that, assuming you're using good data and the followers you're getting are from a profile that's big enough, you can use my hard-coded approach, or you can add some additional context if you'd like. But yeah, that's how you build an Instagram scraper. What I'll do now is download the blueprints of each of these scenarios and put them up on my "Make.com for People Who Want to Make Real Money" doc. I'm also going to create a Google Drive folder which contains all of the blueprints I've generated so far, because a lovely follower of mine told me that, as it is right now, it's sort of unintuitive: you have to open up a blueprint, which is then a bunch of JSON, and you've got to download it manually, which is stupid. So I'll try and make it as simple as humanly possible. But in short, this is how you build an automated Instagram scraping machine. As I mentioned earlier, there are a couple of caveats. The session cookie issue is one that you can solve, and I've solved it before for a bunch of people, but you need a third-party browser automation to basically sign into Instagram, or whatever service you want, every X hours. So if there's interest there, just let me know and I'm more than happy to record a video on it. Aside from that, yeah, I hope you guys enjoyed. That was pretty interesting, and hopefully I showed you my thought process when designing these things. I didn't really expect it to be a four-scenario flow, but I didn't do any of the mapping ahead of time, which usually makes this a little bit easier, and just as I was putting the pieces together I realized, yeah, we're going to need one to launch and one to watch, and then we're going to have a two-step, so we'll need to multiply all of that by two. Anywho, yeah, thanks so much for watching. If you guys have any specific requests, just leave them down below. Otherwise, like, comment, subscribe, do all that fun stuff. I will catch you all in the next video. Bye-bye.