How to connect to an API and parse XML (and why you would want to)

Many beginner programmers see the acronym API all over the place. Why are API's everywhere? What do you do with them? How do they work? At the same time, many beginner programmers encounter XML. Why is XML everywhere? How do you turn XML into the integers or strings that you already know how to deal with? These are excellent questions for aspiring programmers to ask. For me it was difficult to grasp the big picture and see exactly why these two acronyms were talked about so often in the programming world. In this article I'll explain what API's are, why XML is so often associated with them, and at the end give a short example of how to "connect" to an API, grab some XML from it, and parse it into the integers or strings that you probably manipulate on a regular basis.

So what is an API (besides an Application Programming Interface)?

Imagine you worked for a large company named Word Co. that organized words, specifically English language words. Perhaps your company scanned a bunch of textbooks and collected all of the words, counted the words, and created a big database full of useful information related to words. Basically you have a big set of information and one day your company (Word Co.) decides it wants to make all of that data available for other companies or allow individuals to see or access it. What are your options?

  • Give people the actual database
  • Make a website that pulls from the database
  • Make an API that allows programmers to interact with the database

The first option is probably not a good one because the database can be huge (potentially gigabytes or terabytes of information), you may be using a proprietary database (such as Google's BigTable) or software, or maybe you just spent millions of dollars collecting this information and you want to charge people for accessing it.

The second option may be a really neat idea but might not work if you wanted a mobile device or app to access it, or if you wanted to present the information in a different way other than a chart or web form. Imagine if someone wanted to make a Hangman game where you try to guess a random word (maybe a random word that was pulled from the big database of English language words) before a stick figure is "hung". This is something the website cannot directly perform.

An API allows people to grab information from (or use services built on) a huge data set in ways that might not have been imagined by the people who created that data set. If Word Co. organized English words and created an API to access those words, other people could build things on top of it, such as the Hangman game described above.

These are applications or tools that Word Co. doesn't have the time or desire to create. API's are usually intended to allow third parties to create awesome things using existing data that a company has already harvested and collected. What are some other services that might have API's?

  • Weather services usually have API's
  • Google has a ton of API's (like their Maps API, their search engine, and just about everything else)
  • Facebook allows third-party developers to interact with the Facebook data
  • Twitter
  • Nearly everything else

So how do I "connect to" or use an API?

Although API's differ, using one often boils down to making a request and getting some data back. Some API's give you a bunch of code or libraries that you add to your project and then use to make the requests, but many other API's are quite simple. If you are new to programming, I'd suggest looking for REST or so-called "RESTful" API's. Other ways to access API's, such as SOAP, also exist, but in my opinion they are a little harder to get started with. Fortunately many API's that used to be SOAP based are now REST based. Let's outline how you would use a typical REST based API:

  1. Make an HTTP request to a web server. Usually you'll include a variable or two that is passed in through the URL
  2. Get some data back (typically XML)
  3. Parse the XML (the XML is just a big character stream and you'll want to grab certain pieces of it and turn it into other data types or create an object)
  4. Use that data to do neat things! (Like create a Hangman game with a random word you just grabbed)
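The four steps above can be sketched in a few lines of Python. Treat this as a rough sketch under assumptions: the Word Co. endpoint `http://api.wordco.example/random`, its `length` parameter, and the `<word data="..."/>` reply format are all invented for illustration, and a canned reply stands in for the real network call so the example runs offline.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: Word Co. and this URL are invented for illustration.
BASE_URL = "http://api.wordco.example/random"


def build_url(length):
    """Step 1: pass a variable in through the URL's query string."""
    return BASE_URL + "?" + urllib.parse.urlencode({"length": length})


def fetch(url):
    """Steps 1-2: make the HTTP request and return the raw XML bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def parse_word(xml_bytes):
    """Step 3: pull the one piece we care about out of the XML."""
    root = ET.fromstring(xml_bytes)
    return root.find("word").get("data")


# A canned reply in an assumed format, so the sketch runs without a network.
sample_reply = b'<reply><word data="giraffe"/></reply>'

url = build_url(7)
secret = parse_word(sample_reply)  # with a real API: parse_word(fetch(url))

# Step 4: use the data, e.g. show Hangman blanks for the secret word.
blanks = " ".join("_" for _ in secret)
print(url)
print(blanks)
```

With a real service you would replace `sample_reply` with the result of `fetch(url)`; everything after that stays the same.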

Notice that the data that comes back from an API is typically XML. Why XML? Because it's a great intermediary "language". Imagine if you wrote your Hangman game in Java and the Random Word API gave you Python code back. That wouldn't be very useful. Or if you wrote something in C/C++ and an API gave you a serialized Java object.

What makes XML so popular (especially with API's) is that it allows you to use whichever language you want, and gives you data that is both human readable and computer readable. Just about any programming language comes with standard libraries to parse XML quickly and easily. If you're an advanced programmer, it also allows you to build objects or data structures (for example, if you're dealing with A TON of data) exactly how you want them instead of forcing you to accept whatever the API gives you.

A concrete example in Java

Let's make something! Imagine you wanted to create your own Android weather app. Since we aren't meteorologists, we'll get all of the weather information from someone else: Google's Weather API. Other options are the National Weather Service (in the U.S.) or maybe Weather Underground. Most of the API's out there are well documented and tell you how you should connect to, use, or interface with them. Google's Weather API is a little weird in that there is no documentation. I think it's sort of a secret API. But here's how you use it:

  1. Make an HTTP request to http://www.google.com/ig/api?weather=Location where Location is whatever you want (A postal code or city).
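One detail worth sketching: a location such as "Seattle WA" contains a space, which has to be encoded before it goes into a URL. Here is a minimal Python sketch of building the request URL. The base URL comes from the step above; the helper name `weather_url` is just an illustrative choice.

```python
import urllib.parse

# Base URL taken from the step above; the helper name is our own invention.
URL_SOURCE = "http://www.google.com/ig/api"


def weather_url(location):
    """Build the request URL, encoding characters that are unsafe in a URL."""
    return URL_SOURCE + "?" + urllib.parse.urlencode({"weather": location})


print(weather_url("Seattle WA"))
print(weather_url("98101"))
```

`urlencode` turns the space into a `+`, so the first call prints a URL ending in `weather=Seattle+WA`.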

That's it! You'll get a bunch of XML back with the current weather and forecast information. You can even try it out in your web browser (since your web browser makes HTTP requests on a very regular basis).  Let's see what happens when we use Seattle WA as an example (from http://www.google.com/ig/api?weather=Seattle+WA):

<xml_api_reply version="1">  
<weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0">
<forecast_information>
<city data="Seattle, WA"/>
<postal_code data="Seattle WA"/>
<latitude_e6 data=""/>
<longitude_e6 data=""/>
<forecast_date data="2011-09-29"/>
<current_date_time data="2011-09-29 17:53:00 +0000"/>
<unit_system data="US"/>
</forecast_information>
<current_conditions>
<condition data="Clear"/>
<temp_f data="62"/>
<temp_c data="17"/>
<humidity data="Humidity: 62%"/>
<icon data="/ig/images/weather/sunny.gif"/>
<wind_condition data="Wind: N at 4 mph"/>
</current_conditions>
<forecast_conditions>
<day_of_week data="Thu"/>
<low data="56"/>
<high data="72"/>
<icon data="/ig/images/weather/sunny.gif"/>
<condition data="Clear"/>
</forecast_conditions>
<forecast_conditions>
<day_of_week data="Fri"/>
<low data="56"/>
<high data="70"/>
<icon data="/ig/images/weather/mostly_sunny.gif"/>
<condition data="Partly Sunny"/>
</forecast_conditions>
<forecast_conditions>
<day_of_week data="Sat"/>
<low data="49"/>
<high data="65"/>
<icon data="/ig/images/weather/rain.gif"/>
<condition data="Showers"/>
</forecast_conditions>
<forecast_conditions>
<day_of_week data="Sun"/>
<low data="54"/>
<high data="65"/>
<icon data="/ig/images/weather/chance_of_rain.gif"/>
<condition data="Chance of Rain"/>
</forecast_conditions>
</weather>
</xml_api_reply>

And let's imagine we want to extract the highs and lows in this XML so we can use them in our Android weather app. As mentioned, many programming languages have built in libraries that allow you to parse the XML. Since XML is so popular, there are even multiple approaches to parsing it, even within a given language. Java has both a DOM parser and a SAX parser built in. Python also has a DOM parser and a SAX parser built in. What are DOM and SAX parsers?

  • SAX (Simple API for XML) parsers are stream oriented parsers and typically use less memory and are faster
  • DOM (Document Object Model) parsers are tree traversal parsers and can consume more memory if you're dealing with large amounts of XML

When should you use one over the other? The choice mostly matters when you are dealing with HUGE amounts of data, where a SAX parser's lower memory use pays off. Most of the time (such as right now) you don't need to worry and can use whichever one you're comfortable with. I'll be using the Java SAX parser in this example.
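To make the difference between the two styles concrete, here is a small side-by-side sketch in Python (chosen only to keep it short). It runs both of Python's built-in parsers over a trimmed copy of the reply shown earlier. The function and class names are illustrative, not part of any API.

```python
import xml.dom.minidom
import xml.sax

# A trimmed copy of the reply shown earlier (two forecast days only).
SAMPLE = b"""<xml_api_reply version="1"><weather>
<forecast_conditions><day_of_week data="Thu"/><low data="56"/><high data="72"/></forecast_conditions>
<forecast_conditions><day_of_week data="Fri"/><low data="56"/><high data="70"/></forecast_conditions>
</weather></xml_api_reply>"""


def highs_with_dom(xml_bytes):
    """DOM: load the whole document into a tree, then query the tree."""
    document = xml.dom.minidom.parseString(xml_bytes)
    return [int(node.getAttribute("data"))
            for node in document.getElementsByTagName("high")]


class HighCollector(xml.sax.ContentHandler):
    """SAX: react to each element as it streams past, keeping only what we need."""

    def __init__(self):
        super().__init__()
        self.highs = []

    def startElement(self, name, attrs):
        if name == "high":
            self.highs.append(int(attrs["data"]))


def highs_with_sax(xml_bytes):
    handler = HighCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.highs


print(highs_with_dom(SAMPLE))
print(highs_with_sax(SAMPLE))
```

Both calls give the same list of highs. The DOM version holds the entire tree in memory while it works, whereas the SAX version only ever keeps the values it has collected.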

Remember the steps to do this? 1) Make an HTTP request to the API, typically passing in a URL variable, 2) Get the data back and then parse it, and finally 3) Do neat things! Let's see what that looks like in Java code:

Weather.java (first draft)

import java.io.IOException;  
import java.io.InputStream;  
import java.net.URL;

public class Weather  
{

    public static final String URL_SOURCE = "http://www.google.com/ig/api?weather=";


    public static void main(String[] args)
    {
        /*** Create the request ***/
        // Let's pick a location:
        String location = "Seattle, WA";
        // Create the URL:
        String query = URL_SOURCE + location;
        // Replace blanks with their URL-encoded equivalent:
        query = query.replace(" ", "%20");

        /***
         * Make the request (This needs to be in a try-catch block because things can go wrong)
         ***/
        try
        {
            // Turn the string into a URL object
            URL urlObject = new URL(query);
            // Open the stream (which returns an InputStream):
            InputStream in = urlObject.openStream();

            /** Now parse the data (the stream) that we received back ***/
            // Coming shortly since we need to set up a parser

        }
        catch(IOException ioe)
        {
            ioe.printStackTrace();
        }
    }
}

So at this point we have some simple Java code that connects to the Google Weather API and receives some data back. In the above case, we are getting our data (the XML) in the form of an InputStream. In other languages you'll still probably be receiving the data as a stream. Streams and I/O are a pretty big part of programming, so if you're not sure how to work with these, now is a good time to start. Anyway, we now need to set up the XML parser. As mentioned, I am picking the SAX parser for this example, and as the SAX project explains on its website, you need to create a handler for handling the XML. In other words, you need to tell it what to do when it encounters specific parts of the XML. In this case we'll look for <low>, <high>, and <day_of_week> tags. To define this behavior we'll extend SAX's DefaultHandler (meaning we give it more functionality than the default functionality). Let's see what this looks like:

GoogleHandler.java

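import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class GoogleHandler extends DefaultHandler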
{

    // Create three array lists to store the data
    public ArrayList<Integer> lows = new ArrayList<Integer>();
    public ArrayList<Integer> highs = new ArrayList<Integer>();
    public ArrayList<String> days = new ArrayList<String>();


    // Make sure that the code in DefaultHandler's
    // constructor is called:
    public GoogleHandler()
    {
        super();
    }


    /*** Below are the three methods that we are overriding ***/

    @Override
    public void startDocument()
    {
        System.out.println("Start document");
    }


    @Override
    public void endDocument()
    {
        System.out.println("End document");
    }


    // This is where all the work is happening:
    @Override
    public void startElement(String uri, String name, String qName, Attributes atts)
    {
        // XML element names are case-sensitive, so compare with equals().
        // Each of these elements carries its value in a "data" attribute.
        if(qName.equals("day_of_week"))
        {
            String day = atts.getValue("data");
            System.out.println("Day: " + day);
            this.days.add(day);
        }
        if(qName.equals("low"))
        {
            int low = Integer.parseInt(atts.getValue("data"));
            System.out.println("Low: " + low);
            this.lows.add(low);
        }
        if(qName.equals("high"))
        {
            int high = Integer.parseInt(atts.getValue("data"));
            System.out.println("High: " + high);
            this.highs.add(high);
        }
    }
}

And now that we have defined how the XML parser should behave, let's add in our GoogleHandler to the Weather code:

Weather.java (final draft)

import java.io.IOException;  
import java.io.InputStream;  
import java.net.URL;

import org.xml.sax.InputSource;  
import org.xml.sax.SAXException;  
import org.xml.sax.XMLReader;  
import org.xml.sax.helpers.XMLReaderFactory;

public class Weather  
{

    public static final String URL_SOURCE = "http://www.google.com/ig/api?weather=";


    public static void main(String[] args)
    {
        /*** Create the request ***/
        // Let's pick a location:
        String location = "Seattle, WA";
        // Create the URL:
        String query = URL_SOURCE + location;
        // Replace blanks with their URL-encoded equivalent:
        query = query.replace(" ", "%20");

        /***
         * Make the request (This needs to be in a try-catch block because things can go wrong)
         ***/
        try
        {
            // Turn the string into a URL object
            URL urlObject = new URL(query);
            // Open the stream (which returns an InputStream):
            InputStream in = urlObject.openStream();

            /** Now parse the data (the stream) that we received back ***/

            // Create an XML reader
            XMLReader xr = XMLReaderFactory.createXMLReader();

            // Tell that XML reader to use our special Google Handler
            GoogleHandler ourSpecialHandler = new GoogleHandler();
            xr.setContentHandler(ourSpecialHandler);

            // We have an InputStream, but let's just wrap it in
            // an InputSource (the SAX parser likes it that way)
            InputSource inSource = new InputSource(in);

            // And parse it!
            xr.parse(inSource);

        }
        catch(IOException ioe)
        {
            ioe.printStackTrace();
        }
        catch(SAXException se)
        {
            se.printStackTrace();
        }
    }
}

Doesn't look so bad, does it? If you go ahead and compile the two files (both Weather.java and GoogleHandler.java) you should be able to run the program without any problems. Here's the output when I ran it:

Start document
Day: Thu
Low: 56
High: 72
Day: Fri
Low: 56
High: 70
Day: Sat
Low: 49
High: 65
Day: Sun
Low: 54
High: 65
End document

In the GoogleHandler there are System.out.println() commands, but it also adds the integers and strings into their own array lists which you can now access in a more familiar way (such as calling days.get(0) to get the first day of the week in that array list).

A concrete example in Python 3

And finally let's take a quick look at how to do this in Python, again using a SAX parser. As you can see, Python does quite a bit of heavy lifting for you (such as making the HTTP request and getting the XML, which is one line of code). Go ahead and copy/modify this code for any of your projects. It was built and tested with Python 3.2.2 in October 2011.

Weather.py

import urllib.request  
import xml.sax

# Create some lists to store the data:
lows = []  
highs = []  
days = []

# Define our special Google Handler that extends
# what the default content handler does
class GoogleHandler(xml.sax.ContentHandler):  
    def startElement(self, name, attrs):
        if name=="day_of_week":
            print("Day:", attrs['data'])
            days.append(attrs['data'])
        if name=="low":
            print("Low:", attrs['data'])
            lows.append(attrs['data'])
        if name=="high":
            print("High:", attrs['data'])
            highs.append(attrs['data'])

# Make an HTTP request at the specified URL
# and get back a bunch of XML
xmlResponse = urllib.request.urlopen('http://www.google.com/ig/api?weather=Seattle+WA')

# Create a SAX Parser
parser = xml.sax.make_parser()  
# Tell the parser to use our special handler
parser.setContentHandler(GoogleHandler())  
# And parse the XML!
parser.parse(xmlResponse)

# Print out the lists:
print("Days:", days)  
print("Lows:", lows)  
print("Highs:", highs)

And let's see what sort of output we get when we run it:

Day: Thu
Low: 56
High: 72
Day: Fri
Low: 56
High: 70
Day: Sat
Low: 49
High: 65
Day: Sun
Low: 54
High: 65
Days: ['Thu', 'Fri', 'Sat', 'Sun']
Lows: ['56', '56', '49', '54']
Highs: ['72', '70', '65', '65']
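Notice that the Python lists hold strings ('56'), not integers. A short follow-up sketch of turning them into numbers you can do arithmetic on, starting from the lists exactly as printed above. The variable names here are illustrative.

```python
# The lists exactly as printed above: every value is still a string.
days = ['Thu', 'Fri', 'Sat', 'Sun']
lows = ['56', '56', '49', '54']
highs = ['72', '70', '65', '65']

# Convert the temperatures to integers so we can do arithmetic on them.
low_temps = [int(value) for value in lows]
high_temps = [int(value) for value in highs]

# Pair each day with its low and high temperature.
forecast = list(zip(days, low_temps, high_temps))

# Find the entry with the largest high temperature.
warmest_day, _, warmest_high = max(forecast, key=lambda entry: entry[2])
print("Warmest day:", warmest_day, warmest_high)
print("Average low:", sum(low_temps) / len(low_temps))
```

You could also call `int()` inside the handler itself, the way the Java version uses Integer.parseInt.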

I hope this tutorial was helpful. If you have questions please ask away. I'll also add that our fictional Word Co. (as mentioned at the top of this article) API isn't just a made up concept to explain API's. It actually exists!

Stephen
