XML: What Is It Good For?

This is the first post in a series about penetration testing web services. It gives a quick review of what XML is and then walks through a brief example of interacting with some XML data. Talking about XML may seem a bit elementary, but it gets you thinking about ways to store and share data, which is perfect for our purposes. I also wanted to start this series from the ground and work my way up. Here we go…

XML

eXtensible Markup Language (XML) is a markup language (it uses tags and looks a lot like HTML) used to describe data. What does that mean? Most applications of sufficient complexity need to save, transmit, or retrieve data in many places. That data has to be in a format that tells us what it is when we look at it again later. If you saved all of your TODO items in a text file called my-todo-08-2015.txt, you would know exactly what is in there, but an application needs a way to describe what each item in the file is. As a simple comparison, HTML has a set of predefined tags that tell a browser how to read our data. For instance, the h1 tag tells the browser that the text is a heading, and the browser knows to draw it a specific way every time it sees that tag. We might also want our storage format to be readable by different browsers, applications, and platforms. The beauty of XML is that it does all of these things and lets you invent whatever tags you need to describe the data you are serializing.

Now that we have talked about it for a bit, let’s look at an example. Let’s say we have an application that tracks our meals. We might want to save our data like this:

<?xml version="1.0" encoding="UTF-8"?>
<meal>
    <food category='legumes'>
        <name>beans</name>
        <calories>30</calories>
        <fats>2</fats>
        <carbs>2</carbs>
        <protein>2</protein>
    </food>
    <food category='veggie'>
        <name>carrots</name>
        <calories>30</calories>
        <fats>2</fats>
        <carbs>2</carbs>
        <protein>2</protein>
    </food>
</meal>

Great Example - What Am I Looking At?

A few key things to point out in the example above.

The Root The Tree

The first line is the XML declaration, which tells the world this is an XML document. Next up is the root element, ‘meal’. This is important: every XML document is required to have exactly one root element that contains everything else. Indented inside the meal tag is a food tag, which in turn has its own sub-elements (e.g. name, calories, etc.).

The Family

Another way to refer to these relationships is as parents, children, and siblings. You can probably guess that in the example above ‘food’ is a child of ‘meal’ and ‘name’ is a sibling of ‘calories’.
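
To make the family terms concrete, here is a minimal sketch that walks the same relationships in Ruby. It assumes REXML, the XML library that ships with Ruby, so there is nothing to install; the variable names are mine, and the document is a trimmed copy of the meal example.

```ruby
require 'rexml/document'

# A trimmed copy of the meal document (one food, two fields).
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <meal>
      <food category='legumes'>
          <name>beans</name>
          <calories>30</calories>
      </food>
  </meal>
XML

doc  = REXML::Document.new(xml)
root = doc.root                  # the single root element, <meal>
food = root.elements['food']     # first <food> child of the root
name = food.elements['name']     # first <name> child of that food

puts root.name                   # meal
puts food.parent.name            # meal (the parent of food)
puts name.next_element.name      # calories (the next sibling of name)
```

Each call only moves one step through the tree: down to a child, up to a parent, or sideways to a sibling.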

So What is an Element

Next up are elements. An element is everything from (and including) the element’s start tag to (and including) the element’s end tag. In our example above, <name>beans</name> is an element. A few other terminology points:

  • The ‘food’ element has element content, because it contains other elements.
  • The ‘food’ element also has an attribute, for example category='legumes'.
  • The ‘name’, ‘calories’, ‘fats’, ‘carbs’, and ‘protein’ elements have text content, because they contain text.
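
The three kinds of content in the list above can be read back out in code. This is only a sketch, again assuming Ruby's bundled REXML library and a trimmed copy of the meal document; the variable names are made up for illustration.

```ruby
require 'rexml/document'

xml = <<~XML
  <meal>
      <food category='legumes'>
          <name>beans</name>
          <calories>30</calories>
      </food>
  </meal>
XML

food = REXML::Document.new(xml).root.elements['food']

category    = food.attributes['category']      # the attribute value
child_names = food.elements.map(&:name)        # element content: the child elements
calories    = food.elements['calories'].text   # text content, always a String

puts category              # legumes
puts child_names.inspect   # ["name", "calories"]
puts Integer(calories)     # 30
```

Note that text content comes back as a string; XML itself has no idea that 30 is a number, so the application has to convert it.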

XML Namespaces

So XML is a useful way to store data and share it between applications, but not so great for displaying data. One issue we might run into is combining XML data from different places that are using the same tags in different ways. That’s where you might see namespaces. The definition from W3C:

XML Namespaces provide a method to avoid element name conflicts.

This isn’t always 100% intuitive from just reading the definition, so let’s look at an example. Please keep in mind that this example is a little oversimplified, but I think it will get the point across. Previously our example had a root element and a bunch of ‘food’ elements as children. Let’s say we want to look at multiple meals and add an element to reference individual foods. So we add an ‘id’ tag to both.

<meals>
    <meal>
        <id></id>
        <food><id></id></food>
    </meal>
</meals>

Now if we reference the ‘id’ tag, it is unclear whether we are looking at the meal or the food. This is where namespaces come into play. We give each of our types a prefix by declaring it with an ‘xmlns’ attribute in the start tag of an element. The attribute looks like this:

xmlns:prefix="URI"

Now a quick point here about the URI. Its effective purpose for the document is only to make the name unique, not to provide a link for looking up data. That being said, you will often find that the URI does in fact point to a web page with information about the namespace. OK, let’s look at an example of this in use for our silly meals tracker and its ‘id’ tag.

<meals xmlns:m="http://www.example-meals.com/meal"
xmlns:f="http://www.example-meals.com/food">
    <m:meal>
        <m:id>1</m:id>
        <f:food category='legumes'>
             <f:id>12</f:id>
             <f:name>beans</f:name>
             <f:calories>30</f:calories>
             <f:fats>2</f:fats>
             <f:carbs>2</f:carbs>
             <f:protein>2</f:protein>
        </f:food>
    </m:meal>
</meals>

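In the root element ‘meals’ you see two ‘xmlns’ attributes, one declaring the prefix ‘m’ and one declaring the prefix ‘f’. With the prefixes in place, ‘m:id’ and ‘f:id’ can no longer be confused.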
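
Here is a rough sketch of how code can tell the two ‘id’ elements apart once they are namespaced. It assumes Ruby's bundled REXML library and a shortened copy of the document above. The prefixes ‘meal’ and ‘food’ in the lookup hash are my own choice and deliberately differ from ‘m’ and ‘f’ in the document, to show that the URI is what identifies the namespace.

```ruby
require 'rexml/document'

xml = <<~XML
  <meals xmlns:m="http://www.example-meals.com/meal"
         xmlns:f="http://www.example-meals.com/food">
      <m:meal>
          <m:id>1</m:id>
          <f:food category='legumes'>
              <f:id>12</f:id>
          </f:food>
      </m:meal>
  </meals>
XML

doc = REXML::Document.new(xml)

# Prefix => URI map used only for this lookup.
ns = { 'meal' => 'http://www.example-meals.com/meal',
       'food' => 'http://www.example-meals.com/food' }

meal_id = REXML::XPath.first(doc, '//meal:id', ns)
food_id = REXML::XPath.first(doc, '//food:id', ns)

puts "#{meal_id.namespace} -> #{meal_id.text}"   # http://www.example-meals.com/meal -> 1
puts "#{food_id.namespace} -> #{food_id.text}"   # http://www.example-meals.com/food -> 12
```

Both elements are still named ‘id’, but each one now reports a different namespace URI.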

One last little note about namespaces is the default namespace. If an element declares a namespace with a plain ‘xmlns’ attribute (no prefix), that namespace applies to the element itself and to every unprefixed element inside it.

    <food xmlns="http://www.example-meals.com/food">
        <name>beans</name>
        <calories>30</calories>
        <fats>2</fats>
        <carbs>2</carbs>
        <protein>2</protein>
    </food>
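
As a quick check of the default namespace behaviour, the sketch below parses a shortened copy of that ‘food’ element and prints what each child reports. It assumes Ruby's bundled REXML library; nothing in it comes from a real API beyond REXML itself.

```ruby
require 'rexml/document'

xml = <<~XML
  <food xmlns="http://www.example-meals.com/food">
      <name>beans</name>
      <calories>30</calories>
  </food>
XML

food = REXML::Document.new(xml).root

# None of the children carry a prefix, yet all inherit the default namespace.
food.elements.each do |el|
  puts "#{el.name}: prefix=#{el.prefix.inspect} namespace=#{el.namespace}"
end
```

The children have an empty prefix but still report the URI declared on their parent.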

If my description is all screwed up, please take a moment to review this great write-up from InformIT for a more detailed description.

Parsing XML in Ruby

Now that we have a quick primer on XML, let’s take a look at parsing some. The end goal of this blog series is to cover the key elements of web services testing, so for this section I will be using a sample WSDL from http://www.w3.org/2001/04/wsws-proceedings/uche/wsdl.html. Also, I will be using Ruby and Nokogiri for the actual parsing.

A few quick notes before getting started. I won’t be going into specifics about WSDL, as I have already started writing a different post to cover that. Also, there are other libraries that are better suited to WSDL-specific parsing. This is only intended to be a quick demo to illustrate some of the points covered earlier in the post. Lastly, I won’t get into XPath here, just to keep the parsing simple and short.

If you don’t have Nokogiri installed, you will need that first. Check out this site for the official set of Nokogiri tutorials.

gem install nokogiri

I saved the sample WSDL content to my local disk as sample-wsdl.xml. The next step is to run pry and load up Nokogiri.

pry(main)> require 'nokogiri'
=> true

Now let’s load our XML document.

pry(main)> doc = File.open('sample-wsdl.xml') { |f| Nokogiri::XML(f) }  # block form closes the file for us

Great, now we have loaded up our XML document and have tons of ways to interact with it via Nokogiri. Since we just finished talking about namespaces, let’s start with something easy and take a look at the namespaces for this document.

pry(main)> doc.collect_namespaces
=> {"xmlns"=>"http://schemas.xmlsoap.org/wsdl/",
"xmlns:soap"=>"http://schemas.xmlsoap.org/wsdl/soap/",
"xmlns:esxsd"=>"http://schemas.snowboard-info.com/EndorsementSearch.xsd",
"xmlns:es"=>"http://www.snowboard-info.com/EndorsementSearch.wsdl"}

Above we see the default namespace (the plain ‘xmlns’ entry) plus three prefixed namespaces: soap, esxsd, and es.

Now when we see these prefixes on a tag we will have a better idea what they are and we could also search our document based on namespaces.

Nokogiri is a really neat gem and provides a lot of ways to interact with an XML document. Let’s run through a few simple examples pertinent to our WSDL document. A great first step might be to get the root element from the document:

pry(main)> root = doc.root

Nokogiri lets you read a node’s attributes much like the keys of a hash. Let’s get the name from the root element of the WSDL document using Ruby’s bracket notation.

pry(main)> doc.root['name']
=> "EndorsementSearch"

In this command we are returning the value assigned to the key ‘name’ for the document’s root element. If you take a look at the WSDL file you will see that name="EndorsementSearch" is an attribute of the root element.

Another thing we might want to do with a WSDL document is parse the message tags. We can do this by looping through all of the root element’s children and creating an array of those whose name matches ‘message’.

pry(main)> messages = root.element_children.select { |node| node.name == 'message' }

This is pretty straightforward. The ‘element_children’ method gets the list of child elements for the node (the root in our case). Then ‘select’ loops through the results and keeps only the nodes where node.name is ‘message’. Let’s take a quick look at the results.

pry(main)> puts messages
<message name="GetEndorsingBoarderRequest">
<part name="body" element="esxsd:GetEndorsingBoarder"/>
</message>
<message name="GetEndorsingBoarderResponse">
<part name="body" element="esxsd:GetEndorsingBoarderResponse"/>
</message>
=> nil

Success! I hope this effectively covered some of the basic pieces of XML and gave a useful intro to parsing XML with Nokogiri. For a slightly outdated but great tutorial on getting started with Nokogiri, check out The Bastards Book of Ruby.

2015-08-04 12:34 -0700