We keep talking about malware during this training, so many times that it seems we cannot get enough of it. Well, this time we're going to talk about malware analysis. As you can probably remember, we've said several times already that the times when antivirus signatures were more than enough to detect malware and viruses are long gone. All right, so how do we detect malware nowadays?

Well, first we do have to start somewhere, and we're going to start with these classical, traditional signatures, because we first have to look for the things that we know for sure are bad. We need to look for those signatures that indicate known malware and make it easy for us to identify it in running processes, in files on our disk, even in network traffic. But there's new malware created every single day, every single hour, and that type of malware is nowadays created specifically to avoid this type of signature detection. So how do we keep up with this?

Well, we're going to start simple, and we're going to start with a community effort. It's very important that as many people as possible, from all over the world, work together to identify new and emerging types of malware. We can start with simple websites like VirusTotal, where you can simply submit your own files and they're going to run them through a number of antivirus engines. Some of them will return a clear verdict, positive or negative; some of them will be just a bit confused. But these kinds of submissions actually help from a community perspective, because it's not just VirusTotal that does this: most vendors out there, especially the vendors that make a living out of creating and updating security products, always want you to share as much content as you can, so that they have something to analyze, real data that is circulating right now all over the internet. So this is the first step:
contribute your own files, the ones that you are not 100% sure are malicious or not. And then, also as a community effort, or as a specific effort for each and every commercial company out there, we need to write some signatures; we need to update our current signatures in order to keep up with the emerging types of malware.

We do have some methods to generate standard naming conventions, so that everybody knows what type of malware we're talking about. This is one of those standardized naming conventions. You can see it dates back to 1991, which actually makes sense: malware has been around that long. This type of naming convention kind of looks like this: it starts with the family name, followed by the group name, the major and minor variant, and a couple of modifiers. You can read about it here; this type of standard naming convention is still used nowadays.

Another one here is MAEC, pronounced "mike". This one is a way to standardize not just the naming of malware but also to make it STIX compliant. Remember STIX? We talked about it a couple of videos ago; if you don't, go back and watch those again.

Another one here is YARA, and this one is not just about standardizing the naming conventions but also about standardizing the detection rules, or how to write those detection rules in a uniform manner. You can see some examples right here, like a very simple piece of code that simply looks for a couple of strings: if any of a, b, or c shows up in a binary (that's the condition right here), then that's taken as proof that a certain type of malware is present in there. If you're interested in writing malware signatures, there's also a very nice tutorial available for free on the same website, starting with a simple hello world, then followed by the basic structures and code blocks available in this language, like base types, structures, arrays,
dictionaries and such. It does require a bit of programming knowledge, or at least a couple of programming concepts, because we're actually writing code that searches for specific patterns. So there's programming involved here, but it's really high level.

All right, so we're pretty much back to square one about signatures: for each and every new threat out there, we need to try to understand its behavior. For some unknown piece of malware, sometimes the only solution is just to let it run and see what it does, because otherwise we know nothing about it. Analyze that information and perhaps, just perhaps, you'll be able to collect enough information to create a new signature for it.

This type of risky analysis, where we just let the malware do its thing, has to be done in a controlled environment, and we call this controlled environment a sandbox. Sandboxing is a technique that usually relies on virtualization, because that allows us to quickly spin up and destroy environments; it creates an isolated virtual environment where the malware is actually launched. Sometimes this is also called a detonation of that malware. This allows you to observe the malware as if through an experiment window, and you can look for things like any changes to the system files, or any changes to running processes and running services. On the network side, look for new listening ports being opened or new outbound connections being initiated. Those destinations might prove useful as well, so don't just look for open ports; have a look at what's on the other side as well: where does that port connect to? Look for low-level system calls. Programs get access to the hardware and to operating system resources through low-level system calls, so if a program tries to access the hardware and those resources in an abnormal manner, in a weird way, that's going to be identified by intercepting those system calls. Look for events created by new files, file changes, file deletion,
file archiving followed by file deletion, like CryptoLocker for example: anything that happens to the files on your disk during the execution of that malware. Also look for tasks that are being created inside the Task Scheduler on Windows or inside cron on Linux; look for tasks that are scheduled to run sometime in the future. Some malware might use this type of delay technique to avoid some sandbox analysis methods, so it doesn't detonate immediately but at some time in the future. Fortunately, anti-malware solutions implemented inside sandboxes are pretty smart as well, and they have a technique called time acceleration. It sounds really sci-fi, but time acceleration, or time warp, lets them see how the malware would behave tomorrow, next week, or next month. They're basically accelerating the time inside that virtual machine, just to see what the malware is about to do at some point in the future.

Of course, there are a lot of commercial solutions out there based on sandboxing; pretty much all the major security vendors create such solutions. But you also have a free alternative like the Cuckoo Sandbox, which you can freely download and run on your own machine to see how such a sandboxing solution actually behaves.

Now, since you're running in a virtual environment (as I said, the sandboxes are usually virtual machines), those are theoretically well-isolated environments, but you will still have to be careful about potential hypervisor vulnerabilities. They are not often encountered, but some malware might attempt to exploit the hypervisor itself. As you probably know, the hypervisor is the piece of software that runs either on top of an operating system or directly on the bare metal of a server. It takes care of slicing hardware resources and allocating them to your virtual machines, and also of mediating the communication between the virtual machine and the actual physical hardware. So it's basically just a minimal operating system which, if
compromised, might allow a virtual machine to break free of its confines and either infect or attack other virtual machines, or even start communicating with the outside world, outside the virtualized environment. Again, there are not many examples of hypervisor vulnerabilities that can be exploited, but you never know.

We do have to introduce the term reverse engineering. Reverse engineering in software is a process that attempts to analyze a program or an application in order to discover how it was built in the first place, and eventually how it is supposed to work and what it is supposed to do. For malware this process makes a lot of sense, because if we are able to deconstruct a piece of malware, then we are able to figure out exactly what the malware does without ever executing it, without ever allowing it to run. Unfortunately, this process is not always easy to perform; sometimes it's basically impossible, or it returns a result that is really tough for a human being to interpret. But we do have some solutions to this.

The first category of software that can help us with reverse engineering is called a decompiler. Ideally, this process would allow you to start from an executable file and determine the high-level source code that was used to build it, like C++ or Java or C# or whatever. Unfortunately, this is not always possible. For example, Java makes it just a bit easier due to the structure of the class files, but C or C++ might give you nothing. And some malware developers use code obfuscation techniques before compiling, which basically means randomizing function names and variable names, just to make that code extremely hard for humans to read and to follow. Pseudocode is also one of the potential results generated by a decompiler. Some decompilers are only able to generate pseudocode, which is a way of describing just a
program's logic, without the actual functions, variables, and such: just the program flow, not in a specific programming language. But it gives you the opportunity to see the generic decision statements, the loops, the assignments, the communication with the outside world, and so on. That might be just the intermediary step that you require in order to figure out what the malware actually does.

A disassembler is another type of software, one that attempts to take a binary, which is a machine code executable, and disassemble it: convert it into assembly code. Assembly is a low-level programming language made up of individual CPU instructions. It might be really hard for humans to follow, because for any basic operation that we see on our screen, the CPU actually does a lot of stuff in the background. Well, in assembly language you get to see all that background business happening line by line, and it might be just a bit difficult to follow.

One thing to remember here is that the raw binary contents of a file can be used to properly identify file types, which is much better than relying on file extensions, which can be changed by anyone. I mean, nobody stops you from renaming a docx file to mp3; it's still going to be the same file. The extension is just part of the file name and has nothing to do with the contents of the file. Hopefully you knew this already. Now, most file formats start with a sort of binary header, which is sometimes called the magic number, and this number is used by anti-malware solutions so that they're not easily fooled by fake file extensions. This magic number identifies the actual contents of the file, so this is where the operating system looks when it attempts to identify a file that can be executed. So for example, if I were to use this simple website here, filesignatures.net, and search for either a signature or an extension, like the exe extension, and click submit, it's going to tell me that for exe files, which are
Windows/DOS executable files, the signature looks like this: 4D 5A in hexadecimal. So it's basically two bytes long, and whatever file starts with this signature, I can be pretty sure that it is actually an executable file, no matter what the file extension or the file name says. Of course, malware creators are smart as well, and this header can be masked or faked by some smart malware. But even if it is changed, it has to be the correct one at the exact moment of file execution, because that's when it matters, and that's when you catch the JPEG file trying to execute itself. And finally, if this explanation doesn't make it clear enough, I would advise you to head over to this website and read this article about a comparison between decompilers and disassemblers.

Another method for malware analysis is just looking for strings: character strings, words basically. Looking for strings when analyzing malware might sound strange, but it's actually very useful. A string is just a sequence of characters, and static strings are often stored together in program code as one block. That is, strings are literally stored in the source code, hard-coded there, and often they're stored just like that in the resulting executable file. If the sequence is made up of printable, readable characters, then we just might have something interesting there, and such sequences are also easier to find in all that binary gibberish. Those strings might reveal things like variable names, names of files that the malware is looking for or trying to access, names of processes that the malware is trying to attack (which we can use to identify the malware in memory), and domain names and URLs that the malware is trying to contact, for example to reach its command and control center. There is a strings utility for Windows and for Linux, actually called strings, and it kind of works like this. Let's take a program, a binary executable file, like the date command, which simply prints the current date and time
on the screen. Now, if I were to identify where exactly this date executable is located, it's going to tell me that it's located at /bin/date, so I can now provide this as an argument to the strings command. So strings /bin/date is going to show me all the printable strings that it can find within the compiled date executable. As you can see, there's a lot of stuff in here, like the names of the months and the days of the week, the formats supported, error messages, and manual information; actually, it looks like the entire manual is in here. As you can see, that's basically the help output that the date program gives you if you launch it with a help parameter. So there it is: I did not decompile this, this is basically still binary information, but I can simply search for this information using the strings utility. Try it out on a couple of the programs on your computer; you might be surprised what you're going to find in there.

Another concept here is called program packers. A packed program is very similar to a self-extracting archive; remember those executable files created with WinZip or WinRAR? In the case of a packed program, we have an executable that is compressed and also a small piece of code that is uncompressed, because that's the code that actually extracts the rest of the program. When you double-click that file, the uncompressed part of the code is what gets executed. This practice can, first, reduce the file size, and you can also protect some intellectual property in there. Also, if you try to use the strings utility from before on such a packed binary, it's not going to work anymore, because the contents in there are not going to be in clear text anymore, since they're all packed up in an archive. But this technique is also used by malware, to make it more difficult to
detect, to match against a signature, or even to be deconstructed. Some malware even recompresses itself every single time it multiplies, with a slightly different password, a slightly different algorithm, or slightly different contents of the archive, making it practically impossible to match with static file signatures. So how do you analyze such a type of malware? Usually you detonate it inside a sandbox, let it unpack itself, and then you start your analysis.

Exploit techniques. An exploit technique is the actual method used by malware to infect a target, so it's about answering the question: how does the malware do it? From the old days, we still have viruses. Viruses usually infect existing files on the hard drive, so when we talk about viruses we're referring to some files being infected on our hard drives, on our SSDs. Viruses do this by attaching some secondary executable code to those files on the disk and then waiting for someone to open that infected file. For most types of viruses, we're going to fight them using static signatures in antivirus solutions.

We also have worms. Worms can run in memory before they attempt to use the network to propagate. Generally, a worm doesn't require user interaction, so a worm is going to propagate itself over the network by looking for specific vulnerabilities that it can exploit. It is programmed to look for those vulnerabilities: to perform some limited scanning, look for a specific weakness, and exploit it in order to jump from one host to the next and propagate itself. We also sometimes call worms fileless malware, because they don't need to rely on files on the disk to exist or to multiply. We also have signatures to identify them, but we're going to look for them in memory, not on the disk.

Newer types of threats are also called fileless malware, and they rely on some small piece of code executed by a script that is passed through a request or some other type of protocol. Now, that small piece of code, nowadays called shellcode
(remember this term for the exam), often acts as a dropper or a downloader, and its purpose is to download another piece of code from the internet. It works that way because it's much easier to trick a user into clicking a link that encodes a very small piece of downloader code than to trick both the user and the antivirus into explicitly downloading and running the actual malware file. The malware that gets downloaded by this type of dropper is often used to maintain access and is some form of RAT, a remote access Trojan. Just like the name says, a RAT (remember this for the exam as well) allows the attacker to remotely access your machine, and it disguises itself, hence the term Trojan. This is used to maintain access for future attempts, from the attacker's perspective of course, so the attacker doesn't have to go through the hacking process every single time in order to control your machine or to access your confidential data. Of course, nothing stops the RAT from coming with additional bonuses, like a keylogger that records all your keystrokes, including your passwords and your visited websites, and sends them to its master.

Another way shellcode can run is by attempting to execute some payload that attempts code injection against a valid system process. A system process is usually the target for code injection because, if it succeeds, the malware is able to run its own injected code with the permissions, the privileges, and the level of access of that system process. This could allow the malware to perform things like restricted file system operations or restricted operating system operations, to get access to restricted memory areas or restricted files, or to move freely through a firewall. Some methods for performing code injection are things like masquerading, which is about replacing usual application files, executable files for example, with a malicious one. DLL injections
are also widespread, and these can be done in a number of ways. Remember DLL side-loading, for example, or DLL replacement: this is going to replace a system DLL file that gets loaded by some system process. So it's enough for the malware to know that a specific application is going to load a specific DLL file and to replace that file with its own; then the application is going to load that code and simply execute it. Another one is phantom DLL hijacking. This is going to drop a DLL in place of a missing one that the application will attempt to locate and load. For example, say an application is looking to load all the drivers or plugins in a certain disk location, like an archive utility which has a plugin directory for all the archive formats that it supports, with one DLL file for each file type; that's the library that tells it how to create archives in that specific format. A piece of malware can target that folder and add its own DLL file in there, and the application is going to load them all, since that's the plugin folder and it's supposed to load everything it finds in there. Another one is process hollowing, a very interesting one. This one starts an empty-ish process without any malware in it, so that when it starts it doesn't trigger any antivirus alerts, and then dynamically changes its code, rewriting it with malware. One more thing here: code injection not only looks for vulnerable processes to perform privilege escalation, but also targets the antivirus programs themselves, or forensic tools, or decompilers, and attempts to disable those first so that it can continue to work undisturbed.

Another interesting technique here is called living off the land. It's more of a practice than a specific set of tools. Living off the land is malware behavior that relies on the existing tools on the infected machine in order to perform malicious tasks. Sometimes it's enough to be able to invoke a PowerShell instance, or Bash, or Python on an infected
machine to perform your tasks, without having to download and run even a single piece of executable code. All the operating systems have a lot of built-in tools which can be used for malicious purposes, so that's living off the land.

So these are pretty much all the malware analysis techniques that you should really know about for the CySA+ exam. Try to remember what types of tools we talked about, try to remember some of the exploit techniques out there, and if you found this informative, don't forget to like and subscribe. See you next time, when we'll be talking about the cloud. See ya, bye!
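
To make two of the static checks from this video a bit more concrete, here is a minimal Python sketch of the magic number check for Windows/DOS executables (the two bytes 4D 5A, which spell "MZ") and of a very simplified version of what the strings utility does. The function names, the sample bytes, and the placeholder URL are all made up for illustration; none of them come from the video or from a real tool, and the real strings utility supports more encodings and options than this.

```python
import re

# "MZ" is 4D 5A in hexadecimal: the two-byte signature at the start of
# Windows/DOS executable files.
MZ_MAGIC = b"MZ"


def looks_like_windows_executable(data):
    """Return True if the bytes start with the 4D 5A magic number."""
    return data[:2] == MZ_MAGIC


def printable_strings(data, min_length=4):
    """Return runs of printable ASCII characters at least min_length long.

    This is only a rough imitation of the strings utility: it handles plain
    ASCII and ignores everything else.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical contents of a file named "holiday.jpg": the extension says
# image, but the header says executable. The embedded URL is a placeholder.
fake_jpeg = (
    b"MZ\x90\x00\x03\x00"
    + b"\x00\x01"
    + b"http://c2.example.test/beacon"
    + b"\x00\xff"
    + b"abc"          # too short to be reported with min_length=4
    + b"\x00"
    + b"cmd.exe"
)

print(looks_like_windows_executable(fake_jpeg))  # True
print(printable_strings(fake_jpeg))  # ['http://c2.example.test/beacon', 'cmd.exe']
```

In practice you would read the bytes from a real file opened in binary mode instead of a hand-built sample. Keep in mind that on a packed binary the string search returns very little, for the reasons discussed in the part about program packers.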